Skip to content

fix: avoid ACP apply_patch deadlock#214

Closed
osolmaz wants to merge 1 commit into
zed-industries:mainfrom
osolmaz:codex/fix-apply-patch-acp-deadlock
Closed

fix: avoid ACP apply_patch deadlock#214
osolmaz wants to merge 1 commit into
zed-industries:mainfrom
osolmaz:codex/fix-apply-patch-acp-deadlock

Conversation

@osolmaz
Copy link
Copy Markdown

@osolmaz osolmaz commented Mar 29, 2026

Clanker generated report, let me know if it's not clear enough:

Summary

  • move the ACP I/O loop onto its own dedicated thread/runtime instead of running it on the same local executor as tool execution
  • add a regression test for apply_patch verification over ACP-backed filesystem reads
  • keep the fix narrow to the local ACP runtime path

Reliable repro

This PR now includes a deterministic repro in src/local_spawner.rs:

  • test: apply_patch_verification_does_not_deadlock_over_acp_fs
  • child entrypoint: apply_patch_verification_child

What the repro does:

  1. Start a fake ACP client/server pair over pipes.
  2. Serve file contents through AcpFs rather than the normal local filesystem.
  3. Call codex_apply_patch::maybe_parse_apply_patch_verified(...) on a patch that updates an existing file.
  4. Fail the parent test if the child does not finish within 5 seconds.

Why this is the right repro:

  • it does not depend on acpx
  • it exercises the actual problematic path in codex-acp
  • it uses the same ACP-backed file-read roundtrip that apply_patch uses in real sessions

Behavior:

  • before this fix: the child times out with apply_patch ACP fs roundtrip deadlocked
  • after this fix: the child exits successfully and the test passes

User-visible symptom

When codex-acp is connected to an ACP host that serves file contents over ACP, an apply_patch call that needs to verify an existing file can hang forever.

From the host side, the symptom looks like this:

  1. codex-acp requests fs/read_text_file.
  2. The host replies successfully with the file contents.
  3. codex-acp never finishes the apply_patch turn.
  4. No patch is applied, and the session appears stuck at the tool boundary.

That is the behavior I first saw in the wild. The host-specific details are not important to the bug; the important part is that ACP FS reads were working, but codex-acp stopped making progress after the response.

Root cause

ApplyPatchHandler uses codex_apply_patch::maybe_parse_apply_patch_verified(...), which can synchronously call back into AcpFs::read_to_string() while verifying the patch against current file contents.

Before this change, the agent-side ACP I/O loop was running on the same local executor as tool execution. That means:

  • the tool handler can block waiting for an ACP FS response
  • but the ACP I/O task that must receive and deliver that response is sharing the same execution lane
  • so the tool can end up waiting on work that the runtime is no longer able to drive

In practice, that deadlocks apply_patch verification at the ACP filesystem boundary.

Fix

Run the ACP I/O loop on its own dedicated OS thread with its own Tokio runtime instead of awaiting it directly on the main LocalSet.

That keeps ACP request/response handling alive even while a tool handler is inside a blocking ACP FS roundtrip.

Validation

  • cargo test apply_patch_verification_does_not_deadlock_over_acp_fs -- --nocapture
  • cargo test
  • cargo fmt --check

@benbrandt
Copy link
Copy Markdown
Member

Error case should be bypassed in current main. Will do a release shortly

@benbrandt benbrandt closed this Mar 31, 2026
@osolmaz osolmaz deleted the codex/fix-apply-patch-acp-deadlock branch April 4, 2026 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants