Skip to content

Update Codex ACP to support most of Codex's latest features and improve overall UX#206

Draft
sam-ulrich1 wants to merge 10 commits intozed-industries:mainfrom
sam-ulrich1:main
Draft

Update Codex ACP to support most of Codex's latest features and improve overall UX#206
sam-ulrich1 wants to merge 10 commits intozed-industries:mainfrom
sam-ulrich1:main

Conversation

@sam-ulrich1
Copy link
Copy Markdown

Summary

This PR makes the ACP bridge more complete and more trustworthy for editor clients. The changes fall into four main buckets: session lifecycle support, ACP-native command handling, richer event/UI surfacing, and reliability hardening around long-running interactions.

The first few commits were made by Claude Opus 4.6 (via the claude acp in zed) to bootstrap past the annoying Codex ACP bugs I kept running into. After I got past the majority of the bugs I switched back to Codex using 5.4-xhigh for the remainder. This PR summary was written by Codex to make review as easy as possible. I apologize for the very large PR but there was a lot of stuff blocking my normal Codex use in zed so this brings it much closer to the current state. I am opening this first as a draft to get feedback and while I test drive it in Zed. I expect to find bugs and UX problems as I use it which I will fix before making this a real PR. This gives the maintainers the opportunity to fully reject this PR if they are not interested in it or to provide critiques that will allow me to patch before making a full PR attempt.

Many of the decisions were made cross referencing the codex repo, agent protocol sdk, and the implementation designs of the claude acp repo. There are some places where I opted for UX over SDK support like in the auto compaction handling. I did not want to have 2 outstanding PRs at once so that is definitely a hot point where the implementation diverged intentionally for better UX.

I have not tested every single fix in a real workflow yet which is a major reason this is starting as a draft. I will test them all but it will take me a few days.

Why

  • Several Codex protocol events were still ignored or only partially surfaced in ACP, which meant the UI could miss important state like hooks, image generation, sub-agent activity, rollback, and compaction.
  • Session handling still had a few editor-facing rough edges, especially around forked sessions, renamed threads, and list_sessions calls where the client omits cwd.
  • Some slash commands were going through awkward fallback paths instead of using structured ACP operations or ACP-side handling.
  • A few bridge paths could fail silently or panic under poisoned/orphaned conditions, which is the wrong failure mode for an adapter layer.

What Changed

  • Adds ACP session forking end-to-end by advertising fork capability during initialize and implementing fork_session() from stored rollout history.
  • Tightens session discovery so list_sessions now targets VS Code-backed sessions, falls back to the most recent known root when cwd is missing, normalizes paths before filtering, and prefers persisted thread names over first-message-derived titles.
  • Makes /diff ACP-native by running git locally and returning staged, unstaged, and untracked changes directly.
  • Makes /status ACP-native by formatting current session configuration locally instead of routing through the model.
  • Makes /rename use Op::SetThreadName and immediately confirms the rename back to the client.
  • Makes /mcp and /skills use structured backend ops and formats their async responses in ACP.
  • Adds /stop support for cleaning up background terminals.
  • Adds a first-class plan session mode backed by Codex collaboration mode rather than treating it like a normal approval preset.
  • Keeps the final command surface intentionally smaller by removing placeholder-style builtins like /mention, /feedback, and /debug-config from the branch’s final state.
  • Surfaces hook start/completion events into ACP output.
  • Surfaces image generation begin/end as tool-call lifecycle updates.
  • Surfaces raw reasoning text and incremental plan deltas.
  • Surfaces rollback, background, deprecation, and thread-title updates.
  • Improves sub-agent visibility by keeping spawn/interact/wait/close activity attached to a single in-progress tool call instead of scattered text.
  • Surfaces live auto-compaction progress in the active prompt UI so context compaction no longer happens “silently”.
  • Adds per-turn usage bookkeeping and enriches UsageUpdate notifications with USD cost when credits balance data is available.
  • Hardens orphaned tool-call handling so unexpected output/end events do not panic.
  • Supports multiple concurrent web searches instead of assuming only one active search.
  • Replaces silent drop(tx.send(...)) paths with logged handling.
  • Replaces poison-sensitive mutex unwrap() calls with recovery via into_inner().
  • Updates agent-client-protocol to the SDK main branch to match the unstable ACP surface used by these changes.

Review Guide

  1. Review src/codex_agent.rs first for capability advertisement, list_sessions cleanup, and fork_session().
  2. Review src/thread.rs next for slash-command flow, plan mode handling, and MCP/skills response routing.
  3. Review the larger src/thread.rs event-handling changes after that for reliability, usage updates, and sub-agent lifecycle surfacing.
  4. Review the auto-compaction routing last, since it builds on the same event plumbing but is conceptually separate.
  5. Review Cargo.toml and Cargo.lock at the end for the ACP SDK source change.

Commit Guide

  • 63cefc5 lays the reliability and event-handling foundation.
  • 9301035 adds session forking plus usage/cost plumbing.
  • 7e40fd8 expands coverage around the new flows and edge cases.
  • 2aa4948 refines session behavior and slash-command handling into the current ACP-facing shape.
  • 6104e28 finishes the UI story by surfacing live auto-compaction progress.

Testing

  • cargo test passes locally.
  • Current result: 60 passed, 0 failed.
  • Coverage now includes slash-command behavior, plan mode, orphaned tool-call regressions, concurrent web searches, usage/cost edge cases, sub-agent lifecycle events, and auto-compaction routing.

sam-ulrich1 and others added 5 commits March 24, 2026 08:43
P0 reliability fixes:
- Add warning logs for orphaned tool calls in exec_command_output_delta,
  exec_command_end, and terminal_interaction when call_id is not found
- Support multiple concurrent web searches by converting
  active_web_search from Option<String> to HashMap<String, ()>
- Replace all drop(tx.send(...)) with logged error handling throughout
  thread.rs (25+ call sites)
- Replace panicking .expect()/.unwrap() on mutexes with
  .unwrap_or_else(|e| e.into_inner()) in thread.rs, codex_agent.rs,
  and local_spawner.rs

P2 event surfacing (12 previously-ignored event types now handled):
- HookStarted/HookCompleted: surfaced as agent text with hook name and
  completion status
- ImageGenerationBegin/End: surfaced as tool call lifecycle
- AgentReasoningRawContent: forwarded as agent thought chunk
- ThreadRolledBack: surfaced as agent text with turn count
- BackgroundEvent: forwarded as agent text
- DeprecationNotice: forwarded as bold warning text with details
- PlanDelta: forwarded as agent thought chunk for incremental updates
- CollabAgentSpawnBegin/End: surfaced as tool call lifecycle
- CollabAgentInteractionBegin/End: surfaced as tool call lifecycle
- CollabWaitingBegin/End: surfaced as agent text status messages

P3 slash command additions (9 new commands):
- /diff: show git diff including untracked files
- /status: show session configuration and token usage
- /stop: Op::CleanBackgroundTerminals to stop background terminals
- /rename: rename current thread
- /mention: file mention as @path
- /feedback: collect and send diagnostic logs
- /mcp: list configured MCP tools
- /skills: list available skills
- /debug-config: show config layers for debugging

Tests: 16 new tests added (37 total, all passing):
- 5 slash command tests (stop, diff, status, rename, mention)
- 9 event surfacing tests (hooks, image gen, background, deprecation,
  rollback, reasoning, plan delta, collab spawn, collab interaction)
- 2 reliability tests (concurrent web searches, builtin command list)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fork session:
- Implement fork_session() in codex_agent.rs using codex-core's
  ThreadManager::fork_thread() API
- Advertise SessionForkCapabilities in initialize response
- Fork creates a new thread with the source session's full history,
  assigns a new session ID, and returns modes/models/config_options
- No SDK upgrade needed — the Rust ACP SDK 0.10.2 already has
  ForkSessionRequest/Response behind the "unstable" feature flag

Usage accumulation:
- Add AccumulatedUsage struct to PromptState tracking input_tokens,
  output_tokens, cached_input_tokens, and reasoning_output_tokens
- Accumulate from TokenCountEvent.info.last_token_usage on each event
- Enrich UsageUpdate notifications with Cost when credits balance is
  available from rate_limits.credits.balance
- Cost is reported in USD using the ACP SDK's Cost type

Tests: 1 new test (38 total, all passing):
- test_usage_accumulation_with_cost: verifies two TokenCount events
  produce two UsageUpdate notifications with correct context window
  size, cost attached to first (with credits), no cost on second

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
18 new tests (56 total) covering every gap identified in the
coverage audit. All tests pass with zero warnings.

Missing slash command tests (4):
- test_slash_feedback: verifies Op and notification
- test_slash_mcp: verifies Op and notification
- test_slash_skills: verifies Op and notification
- test_slash_debug_config: verifies Op and notification

Edge case tests for commands requiring arguments (3):
- test_slash_rename_no_arg_falls_through: /rename with no arg
  falls through to default handler producing UserInput
- test_slash_mention_no_arg_falls_through: same for /mention
- test_slash_review_branch_no_arg_falls_through: same for
  /review-branch

Orphaned tool call regression test (1):
- test_orphaned_tool_call_events_handled_gracefully: sends
  ExecCommandOutputDelta, ExecCommandEnd, and
  TerminalInteraction for a call_id that was never registered
  via ExecCommandBegin — verifies no panic, clean completion

CollabWaiting event test (1):
- test_collab_waiting_events_surfaced: verifies exact message
  text for both WaitingBegin and WaitingEnd

Usage accumulation edge cases (3):
- test_token_count_with_none_info: TokenCount with info=None
  sends no UsageUpdate notification
- test_token_count_with_no_context_window: TokenCount with
  model_context_window=None sends no UsageUpdate
- test_token_count_with_unparseable_balance: credits balance
  "not-a-number" gracefully results in no cost field

Tightened event notification assertions (5):
- test_hook_events_exact_format: verifies hook started/completed
  message format including hook id and status message
- test_image_generation_exact_content: verifies ToolCall title,
  kind (Other), status (InProgress), and completed update content
- test_background_event_exact_format: exact string match
- test_deprecation_notice_exact_format: verifies markdown bold
  format with summary and details
- test_thread_rollback_exact_format: verifies plural "turns"

Slash command notification verification (1):
- test_slash_commands_generate_notifications: parametrized test
  covering all 8 UserInput-based slash commands, verifying both
  the Op text content and that AgentMessageChunk notifications
  are actually sent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sam-ulrich1
Copy link
Copy Markdown
Author

One important note, I made this patch so I could use Codex in Zed the way I use it in the terminal. If this PR is too big, ugly, or just not the direction y'all want things to go that is fine. No hard feelings. If y'all do want it I am willing to put in effort to bring it up to spec

@Kareemkiller
Copy link
Copy Markdown

Am sorry for changing too much accounts the problems by fore getting emaill before. Now needed add all email in one application

@sam-ulrich1
Copy link
Copy Markdown
Author

Am sorry for changing too much accounts the problems by fore getting emaill before. Now needed add all email in one application

What???

@sam-ulrich1
Copy link
Copy Markdown
Author

Maintainers, are y'all interested in this or should I close? If it makes any difference to you it took Codex in Zed from practically unusable to my daily driver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants