ShellSnapshot refresh can delete the active snapshot file via Drop on the previous generation #14906

@nantas

What version of Codex CLI is running?

codex-cli 0.115.0

What subscription do you have?

Unknown / source-level investigation

Which model were you using?

N/A (source-level investigation)

What platform is your computer?

Darwin 25.3.0 arm64 arm

What terminal emulator and version are you using (if applicable)?

ghostty

What issue are you seeing?

During source-level investigation of multi-agent and shell environment behavior, I found a likely bug in ShellSnapshot lifecycle management.

ShellSnapshot::try_new always finalizes the snapshot to a stable per-session path like:

$CODEX_HOME/shell_snapshots/{session_id}.sh

Later, ShellSnapshot::refresh_snapshot creates a new snapshot and sends the new Arc<ShellSnapshot> through the watch channel. Once the last reference to the previous ShellSnapshot instance is released, that instance is dropped, and its Drop implementation unconditionally deletes self.path.

Because both generations use the same final path, the old generation's Drop can run after the new generation has written its file and delete the newly refreshed snapshot.
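
The lifecycle described above can be sketched with the standard library alone. This is a minimal illustration, not the Codex implementation: the Snapshot type, its create function, and active_file_survives_refresh are hypothetical stand-ins, and a plain variable swap replaces the watch channel.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical stand-in for ShellSnapshot: owns a path and removes it on Drop.
struct Snapshot {
    path: PathBuf,
}

impl Snapshot {
    /// Writes the snapshot to a stable per-session path (assumed layout: {session_id}.sh).
    fn create(dir: &Path, session_id: &str, body: &str) -> std::io::Result<Arc<Self>> {
        fs::create_dir_all(dir)?;
        let path = dir.join(format!("{}.sh", session_id));
        fs::write(&path, body)?;
        Ok(Arc::new(Self { path }))
    }
}

impl Drop for Snapshot {
    fn drop(&mut self) {
        // Unconditional delete: nothing checks that the file still belongs to this generation.
        let _ = fs::remove_file(&self.path);
    }
}

/// Simulates one refresh and reports whether the active file still exists afterward.
fn active_file_survives_refresh(dir: &Path) -> std::io::Result<bool> {
    let mut current = Snapshot::create(dir, "session", "export GEN=1\n")?;
    // The refreshed generation overwrites the same path.
    let refreshed = Snapshot::create(dir, "session", "export GEN=2\n")?;
    // Publishing the new generation releases the last reference to the old one.
    let previous = std::mem::replace(&mut current, refreshed);
    drop(previous); // The old generation's Drop removes the shared path here.
    Ok(current.path.exists())
}
```

Under these assumptions, active_file_survives_refresh returns Ok(false): the handle held in current still points at the session path, but the file behind it is gone.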

Relevant source references:

  • codex-rs/core/src/shell_snapshot.rs
    • stable final path: lines around 123-125
    • refresh publishes new snapshot: lines around 90-106
    • Drop removes self.path: lines around 183-190
  • codex-rs/core/src/tools/runtimes/mod.rs
    • shell wrapper silently no-ops when snapshot.path.exists() is false: lines around 78-84

This means snapshot loss can become silent runtime drift instead of an explicit failure.
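
To show why the loss goes unnoticed, here is a rough sketch of a wrapper that sources the snapshot only when the file exists. The function name wrap_with_snapshot and the exact command string are assumptions for illustration; the real wrapper in tools/runtimes/mod.rs may be structured differently, and the quoting here is deliberately simplified.

```rust
use std::path::Path;

/// Hypothetical wrapper: prefixes the command with a `.` (source) of the snapshot,
/// but only when the snapshot file exists. Otherwise the command runs unwrapped.
fn wrap_with_snapshot(snapshot_path: &Path, command: &str) -> String {
    if snapshot_path.exists() {
        // Simplified quoting: assumes the path contains no single quotes.
        format!(". '{}' && {}", snapshot_path.display(), command)
    } else {
        // Silent fallback: no error and no log, so the caller cannot tell
        // that the snapshot environment was skipped.
        command.to_string()
    }
}
```

With a missing file, the caller gets the original command back unchanged, so nothing signals that the snapshot environment was not applied.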

I searched for similar issues before opening this one. The closest one I found was #10883, but that issue is about shell incompatibility when sourcing a snapshot, not about refresh/delete of the active snapshot file.

What steps can reproduce the bug?

This looks reproducible from source logic alone:

  1. Start a session with shell snapshots enabled.
  2. Ensure a snapshot exists at $CODEX_HOME/shell_snapshots/{session_id}.sh.
  3. Trigger a snapshot refresh for the same session.
  4. Let the new generation replace the old generation in the watch channel.
  5. Observe that the previous ShellSnapshot's Drop implementation removes the same stable file path.

A concrete runtime repro should be possible by:

  1. Starting a CLI session with shell snapshots enabled.
  2. Causing an initial snapshot to be created.
  3. Triggering a refresh path for the same session.
  4. Checking whether the active snapshot file disappears afterward.
  5. Running a shell command that would normally be wrapped with snapshot sourcing and observing that Codex silently stops sourcing the snapshot.

What is the expected behavior?

Refreshing a shell snapshot should not delete the active snapshot file for the session.

A new generation should either:

  • use a distinct final filename, or
  • only delete files it can prove belong to its own generation, or
  • otherwise avoid deleting the current active shared path from Drop.
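
The first option could look roughly like the sketch below, where each generation writes to its own file so that Drop only ever removes a file that generation created. The GenSnapshot type, the NEXT_GENERATION counter, and the {session_id}.{generation}.sh naming are all hypothetical; the actual fix may choose a different scheme.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical process-wide counter used to make each generation's filename unique.
static NEXT_GENERATION: AtomicU64 = AtomicU64::new(0);

/// Hypothetical snapshot type whose path is unique per generation.
struct GenSnapshot {
    path: PathBuf,
}

impl GenSnapshot {
    fn create(dir: &Path, session_id: &str, body: &str) -> std::io::Result<Arc<Self>> {
        fs::create_dir_all(dir)?;
        let generation = NEXT_GENERATION.fetch_add(1, Ordering::Relaxed);
        // Assumed naming scheme: {session_id}.{generation}.sh
        let path = dir.join(format!("{}.{}.sh", session_id, generation));
        fs::write(&path, body)?;
        Ok(Arc::new(Self { path }))
    }
}

impl Drop for GenSnapshot {
    fn drop(&mut self) {
        // Safe to delete: no other generation shares this path.
        let _ = fs::remove_file(&self.path);
    }
}

/// Returns (old file exists, new file exists) after the old generation is dropped.
fn refresh_keeps_active_file(dir: &Path) -> std::io::Result<(bool, bool)> {
    let old = GenSnapshot::create(dir, "session", "export GEN=1\n")?;
    let old_path = old.path.clone();
    let new = GenSnapshot::create(dir, "session", "export GEN=2\n")?;
    drop(old); // Removes only the old generation's file.
    Ok((old_path.exists(), new.path.exists()))
}
```

In this sketch the old file is cleaned up and the new one survives, which is also the property a regression test for the refresh path would need to assert.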

Additional information

Why this matters:

  • It can explain intermittent shell environment drift after refresh.
  • It is especially confusing because the wrapper path silently falls back when snapshot.path.exists() is false.
  • This may affect both parent sessions and subagents that inherit a snapshot handle.

Potential fix directions:

  • stop deleting the shared final path from Drop
  • use generation-specific filenames
  • add a regression test that refreshing a snapshot does not remove the newly published active file

Metadata

Labels

  • agent: Issues related to the core agent loop
  • bug: Something isn't working