GC race: concurrent addTempRoot ignored during deletion (port NixOS/nix#15469) #395
Description
Summary
Determinate Nix contains a GC race condition in which `deleteReferrersClosure` can delete store paths that a concurrent evaluator has just registered as temporary roots via `addTempRoot`. This causes intermittent `error: path '/nix/store/...' is not valid` failures during Nix evaluation, particularly in CI environments with persistent Nix stores.
The fix is available upstream as NixOS/nix#15469 (by @domenkozar) but has not been ported to Determinate Nix yet.
Affected Code
Three locations need the fix (all present in nix-src as of current main):
1. `src/libstore/gc.cc`: deletion loop missing `tempRoots` re-check
The `deleteReferrersClosure` BFS phase checks `tempRoots` when visiting paths, but the deletion loop (`for (auto & path : topoSortPaths(visited))`) does not re-check before deleting. A concurrent `addTempRoot` that arrives after the BFS but before deletion is silently ignored:
```cpp
// Current code (vulnerable), around line 795:
for (auto & path : topoSortPaths(visited)) {
    if (!dead.insert(path).second)
        continue;
    if (shouldDelete) {
        try {
            invalidatePathChecked(path); // ← no tempRoots re-check!
            deleteFromStore(path.to_string());
            // ... (rest of the loop elided)
```
The fix adds a `tempRoots` re-check plus `pending` synchronization before each deletion.
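The ordering the fix relies on can be illustrated with a minimal, self-contained model. It assumes a single in-process `GcShared` struct standing in for the collector's shared state (in nix-src, a client reaches a running collector through a socket instead), and every name in it (`GcShared`, `addTempRoot`, `rootThenCheck`, `deleteIfUnrooted`) is hypothetical, not nix-src API. The collector re-checks `tempRoots` under the lock immediately before deleting and publishes the path it is about to delete in `pending`. A concurrent `addTempRoot` for that path therefore either lands before the re-check (the path is kept) or waits until the deletion finishes, in which case the client's subsequent validity check reports the path as missing instead of the path vanishing after the check.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <set>
#include <string>

// Hypothetical, simplified model of the collector/temp-root handshake.
// None of these names are nix-src APIs; the real client talks to the GC over a socket.
struct GcShared {
    std::mutex mutex;
    std::condition_variable wakeup;
    std::set<std::string> tempRoots;     // hash parts registered by clients
    std::optional<std::string> pending;  // hash part the collector is deleting right now
};

// Client side: register a temp root. If the collector is in the middle of
// deleting this exact path, wait until it finishes, so the caller's next
// validity check observes the deletion instead of racing with it.
void addTempRoot(GcShared & shared, const std::string & hashPart)
{
    std::unique_lock lock(shared.mutex);
    shared.wakeup.wait(lock, [&] { return shared.pending != hashPart; });
    shared.tempRoots.insert(hashPart);
}

// Client side, the ordering items 2 and 3 ask for: root first, then check validity.
template<typename IsValidFn>
bool rootThenCheck(GcShared & shared, const std::string & hashPart, IsValidFn isValid)
{
    addTempRoot(shared, hashPart);
    return isValid(hashPart);
}

// Collector side: re-check tempRoots under the lock right before deleting,
// rather than trusting the result computed earlier during the BFS.
template<typename DeleteFn>
bool deleteIfUnrooted(GcShared & shared, const std::string & hashPart, DeleteFn deleteFn)
{
    {
        std::lock_guard lock(shared.mutex);
        if (shared.tempRoots.count(hashPart))
            return false;            // rooted after the BFS ran: keep the path
        shared.pending = hashPart;   // make addTempRoot for this path wait
    }
    deleteFn(hashPart);              // slow filesystem work runs outside the lock
    {
        std::lock_guard lock(shared.mutex);
        shared.pending.reset();
    }
    shared.wakeup.notify_all();      // release any addTempRoot waiting on this path
    return true;
}
```

Keeping `deleteFn` outside the lock means slow deletions do not block unrelated `addTempRoot` calls in this model; only a caller rooting the exact path being deleted waits.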
2. `src/libexpr/eval-cache.cc`: missing `addTempRoot` before `isValidPath`
```cpp
// Line ~577: no addTempRoot before validity check
if (!path || !root->state.store->isValidPath(*path)) {
```
3. `src/libfetchers/fetchers.cc`: missing `addTempRoot` before `ensurePath`
```cpp
// Line ~380: no addTempRoot before ensurePath
store.ensurePath(*storePath);
```
Symptoms
- `error: path '/nix/store/h9lc1dpi14z7is86ffhl3ld569138595-audit-tmpdir.sh' is not valid` during evaluation
- Affects stdenv setup hooks, nixpkgs patches, and other derivation inputs
- Flaky: ~5-15% CI failure rate on runners with persistent Nix stores
- The path IS available on `cache.nixos.org` and CAN be fetched with `nix-store --realise`
- Cannot be reproduced by simulating static store corruption; it is a timing-dependent race
Environment
- Determinate Nix 3.17.1 (Nix 2.33.3)
- Namespace.so Linux runners, GitHub-hosted Ubuntu runners, Namespace macOS runners
- Triggered during `devenv shell` evaluation (`derivationStrict`)
References
- Upstream fix: NixOS/nix#15469
- devenv tracking: cachix/devenv#2530
- Our tracking: overengineeringstudio/effect-utils#201
Filed by an AI assistant on behalf of @schickling