Skip to content

Update#3

Merged
shahbajlive merged 103 commits intoshahbajlive:mainfrom
openai:main
Mar 20, 2026
Merged

Update#3
shahbajlive merged 103 commits intoshahbajlive:mainfrom
openai:main

Conversation

@shahbajlive
Copy link
Copy Markdown
Owner

External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.

jif-oai and others added 30 commits March 17, 2026 12:03
)

## Summary
- make `report_agent_job_result` atomically transition an item from
running to completed while storing `result_json`
- remove brittle finalization grace-sleep logic and make finished-item
cleanup idempotent
- replace blind fixed-interval waiting with status-subscription-based
waiting for active worker threads
- add state runtime tests for atomic completion and late-report
rejection

## Why
This addresses the race and polling concerns in #13948 by removing
timing-based correctness assumptions and reducing unnecessary status
polling churn.

## Validation
- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs`
- `cd codex-rs && cargo test`
- fails in an unrelated app-server tracing test:
`message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children`
timed out waiting for response

## Notes
- This PR supersedes #14129 with the same agent-jobs fix on a clean
branch from `main`.
- The earlier PR branch was stacked on unrelated history, which made the
review diff include unrelated commits.

Fixes #13948
Show effective model after the full config layering for the sub agent
…#14838)

## Description

This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.

Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_startup_timeout_ms` in config.toml
- `turn/started` is sent immediately on `turn/start` even if the
websocket is still connecting
- `turn/interrupt` can be used to cancel a turn that is still waiting on
the websocket warmup
- the turn task will wait for the full 15s websocket warming timeout
before falling back

## Why

The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.

## Configuring the websocket startup timeout
Can set it in config.toml like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
…14859)

### Summary
The goal is for us to get the latest turn model and reasoning effort on
thread/resume is no override is provided on the thread/resume func call.
This is the part 1 which we write the model and reasoning effort for a
thread to the sqlite db and there will be a followup PR to consume the
two new fields on thread/resume.

[part 2 PR is currently WIP](#14888)
and this one can be merged independently.
## Problem

When the TUI connects to a **remote** app-server (via WebSocket), resume
and fork operations lost all conversation history.
`AppServerStartedThread` carried only the `SessionConfigured` event, not
the full `Thread` snapshot. After resume or fork, the chat transcript
was empty — prior turns were silently discarded.

A secondary issue: `primary_session_configured` was not cleared on
reset, causing stale session state after reconnection.

## Approach: TUI-side only, zero app-server changes

The app-server **already returns** the full `Thread` object (with
populated `turns: Vec<Turn>`) in its `ThreadStartResponse`,
`ThreadResumeResponse`, and `ThreadForkResponse`. The data was always
there — the TUI was simply throwing it away. The old
`AppServerStartedThread` struct only kept the `SessionConfiguredEvent`,
discarding the rich turn history that the server had already provided.

This PR fixes the problem entirely within `tui_app_server` (3 files
changed, 0 changes to `app-server`, `app-server-protocol`, or any other
crate). Rather than modifying the server to send history in a different
format or adding a new endpoint, the fix preserves the existing `Thread`
snapshot and replays it through the TUI's standard event pipeline —
making restored sessions indistinguishable from live ones.

## Solution

Add a **thread snapshot replay** path. When the server hands back a
`Thread` object (on start, resume, or fork),
`restore_started_app_server_thread` converts its historical turns into
the same core `Event` sequence the TUI already processes for live
interactions, then replays them into the event store so the chat widget
renders them.

Key changes:
- **`AppServerStartedThread` now carries the full `Thread`** —
`started_thread_from_{start,resume,fork}_response` clone the thread into
the struct alongside the existing `SessionConfiguredEvent`.
- **`thread_snapshot_events()`** walks the thread's turns and items,
producing `TurnStarted` → `ItemCompleted`* →
`TurnComplete`/`TurnAborted` event sequences that the TUI already knows
how to render.
- **`restore_started_app_server_thread()`** pushes the session event +
history events into the thread channel's store, activates the channel,
and replays the snapshot — used for initial startup, resume, and fork.
- **`primary_session_configured` cleared on reset** to prevent stale
session state after reconnection.

## Tradeoffs

- **`Thread` is cloned into `AppServerStartedThread`**: The full thread
snapshot (including all historical turns) is cloned at startup. For
long-lived threads this could be large, but it's a one-time cost and
avoids lifetime gymnastics with the response.

## Tests

- `restore_started_app_server_thread_replays_remote_history` —
end-to-end: constructs a `Thread` with one completed turn, restores it,
and asserts user/agent messages appear in the transcript.
- `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies
`thread_snapshot_events` produces the correct event sequence for
completed and interrupted turns.

## Test plan

- [ ] Verify `cargo check -p codex-tui-app-server` passes
- [ ] Verify `cargo test -p codex-tui-app-server` passes
- [ ] Manual: connect to a remote app-server, resume an existing thread,
confirm history renders in the chat widget
- [ ] Manual: fork a thread via remote, confirm prior turns appear
## What is flaky
`codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently
loses the expected `fuzzyFileSearch/sessionUpdated` and
`fuzzyFileSearch/sessionCompleted` notifications when multiple
fuzzy-search sessions are active and CI delivers notifications out of
order.

## Why it was flaky
The wait helpers were keyed only by JSON-RPC method name.

- `wait_for_session_updated` consumed the next
`fuzzyFileSearch/sessionUpdated` notification even when it belonged to a
different search session.
- `wait_for_session_completed` did the same for
`fuzzyFileSearch/sessionCompleted`.
- Once an unmatched notification was read, it was dropped permanently
instead of buffered.
- That meant a valid completion for the target search could arrive
slightly early, be consumed by the wrong waiter, and disappear before
the test started waiting for it.

The result depended on notification ordering and runner scheduling
instead of on the actual product behavior.

## How this PR fixes it
- Add a buffered notification reader in
`codex-rs/app-server/tests/common/mcp_process.rs`.
- Match fuzzy-search notifications on the identifying payload fields
instead of matching only on method name.
- Preserve unmatched notifications in the in-process queue so later
waiters can still consume them.
- Include pending notification methods in timeout failures to make
future diagnosis concrete.

## Why this fix fixes the flakiness
The test now behaves like a real consumer of an out-of-order event
stream: notifications for other sessions stay buffered until the correct
waiter asks for them. Reordering no longer loses the target event, so
the test result is determined by whether the server emitted the right
notifications, not by which one happened to be read first.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
### Why
i'm working on something that parses and analyzes codex rollout logs,
and i'd like to have a schema for generating a parser/validator.

`codex app-server generate-internal-json-schema` writes an
`RolloutLine.json` file

while doing this, i noticed we have a writer <> reader mismatch issue on
`FunctionCallOutputPayload` and reasoning item ID -- added some schemars
annotations to fix those

### Test

```
$ just codex app-server generate-internal-json-schema --out ./foo
```

generates an `RolloutLine.json` file, which i validated against jsonl
files on disk

`just codex app-server --help` doesn't expose the
`generate-internal-json-schema` option by default, but you can do `just
codex app-server generate-internal-json-schema --help` if you know the
command

everything else still works

---------

Co-authored-by: Codex <noreply@openai.com>
## Summary
This is PR 2 of the Windows sandbox runner split.

PR 1 introduced the framed IPC runner foundation and related Windows
sandbox infrastructure without changing the active elevated one-shot
execution path. This PR switches that elevated one-shot path over to the
new runner IPC transport and removes the old request-file bootstrap that
PR 1 intentionally left in place.

After this change, ordinary elevated Windows sandbox commands still
behave as one-shot executions, but they now run as the simple case of
the same helper/IPC transport that later unified_exec work will build
on.

## Why this is needed for unified_exec
Windows elevated sandboxed execution crosses a user boundary: the CLI
launches a helper as the sandbox user and has to manage command
execution from outside that security context. For one-shot commands, the
old request-file/bootstrap flow was sufficient. For unified_exec, it is
not.

Unified_exec needs a long-lived bidirectional channel so the parent can:
- send a spawn request
- receive structured spawn success/failure
- stream stdout and stderr incrementally
- eventually support stdin writes, termination, and other session
lifecycle events

This PR does not add long-lived sessions yet. It converts the existing
elevated one-shot path to use the same framed IPC transport so that PR 3
can add unified_exec session semantics on top of a transport that is
already exercised by normal elevated command execution.

## Scope
This PR:
- updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner
with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`,
and collect framed `Output`/`Exit` messages
- removes the old `--request-file=...` execution path from
`windows-sandbox-rs/src/elevated/command_runner_win.rs`
- keeps the public behavior one-shot: no session reuse or interactive
unified_exec behavior is introduced here

This PR does not:
- add Windows unified_exec session support
- add background terminal reuse
- add PTY session lifecycle management

## Why Windows needs this and Linux/macOS do not
On Linux and macOS, the existing sandbox/process model composes much
more directly with long-lived process control. The parent can generally
spawn and own the child process (or PTY) directly inside the sandbox
model we already use.

Windows elevated sandboxing is different. The parent is not directly
managing the sandboxed process in the same way; it launches across a
different user/security context. That means long-lived control requires
an explicit helper process plus IPC for spawn, output, exit, and later
stdin/session control.

So the extra machinery here is not because unified_exec is conceptually
different on Windows. It is because the elevated Windows sandbox
boundary requires a helper-mediated transport to support it cleanly.

## Validation
- `cargo test -p codex-windows-sandbox`
#14952)

## Summary
- add device-code ChatGPT sign-in to `tui_app_server` onboarding and
reuse the existing `chatgptAuthTokens` login path
- fall back to browser login when device-code auth is unavailable on the
server
- treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state
during onboarding
- add a local ChatGPT auth loader for handing local tokens to the app
server and serving refresh requests
- handle `account/chatgptAuthTokens/refresh` instead of marking it
unsupported, including workspace/account mismatch checks
- add focused coverage for onboarding success, existing auth handling,
local auth loading, and refresh request behavior

## Testing
- `cargo test -p codex-tui-app-server`
- `just fix -p codex-tui-app-server`
It now supports:

- Connectors that are from installed and enabled plugins that are not
installed yet
- Plugins that are on the allowlist that are not installed yet.
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.

There were multiple independent sources of nondeterminism in that setup:

- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.

So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.

## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.

## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.

---------

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
The permissions popup tests in the TUI are flaky, especially on Windows.
They assume the popup opens on a specific row and that a fixed number of
`Up` or `Down` keypresses will land on a specific preset. They also
match popup text too loosely, so a non-selected row can satisfy the
assertion.

## Why it was flaky
These tests were asserting incidental rendering details rather than the
actual selected permission preset. On Windows, the initial selection can
differ from non-Windows runs. Some tests also searched the entire popup
for text like `Guardian Approvals` or `(current)`, which can match a row
that is visible but not selected. Once the popup order or current preset
shifted slightly, a test could fail even though the UI behavior was
still correct.

## How this PR fixes it
This PR adds helpers that identify the selected popup row and selected
preset name directly. The tests now assert the current selection by
name, navigate to concrete target presets instead of assuming a fixed
number of keypresses, and explicitly set the reviewer state in the cases
that require `Guardian Approvals` to be current.

## Why this fix fixes the flakiness
The assertions now track semantic state, not fragile text placement.
Navigation is target-based instead of order-based, so
Windows/non-Windows row differences and harmless popup layout changes no
longer break the tests. That removes the scheduler- and
platform-sensitive assumptions that made the popup suite intermittent.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
CI even though the approval logic is unchanged, because the test
delegates the file write and readback to shell parsing instead of
deterministic file I/O.

## Why it was flaky
The test generated a command shaped like `printf ... > file && cat
file`. That means the scenario depended on shell quoting, redirection,
newline handling, and encoding behavior in addition to the approval
system it was actually trying to validate. If the shell interpreted the
payload differently, the test would report an approval failure even
though the product logic was fine.

That also made failures hard to diagnose, because the test did not log
the exact generated command or the parsed result payload.

## How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
`python3 -c` script that writes the file with `Path.write_text(...,
encoding='utf-8')` and then reads it back with the same UTF-8 path. It
also logs the generated command and the resulting exit code/stdout for
the approval scenario so any future failure is directly attributable.

## Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection
semantics. The file contents are produced and read through explicit
UTF-8 file I/O, so the approval test is measuring approval behavior
instead of shell behavior. The added diagnostics mean a future failure
will show the exact command/result pair instead of looking like a
generic intermittent mismatch.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`

Testing
- Not run (not requested)
CXC-410 Emit Env Var Status with `/feedback` report

Add more observability on top of #14611 

[Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)

[Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
<img width="1063" height="610" alt="image"
src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e"
/>


###### Summary
- Adds auth-env telemetry that records whether key auth-related env
overrides were present on session start and request paths.
- Threads those auth-env fields through `/responses`, websocket, and
`/models` telemetry and feedback metadata.
- Buckets custom provider `env_key` configuration to a safe
`"configured"` value instead of emitting raw config text.
- Keeps the slice observability-only: no raw token values or raw URLs
are emitted.

###### Rationale (from spec findings)
- 401 and auth-path debugging needs a way to distinguish env-driven auth
paths from sessions with no auth env override.
- Startup and model-refresh failures need the same auth-env diagnostics
as normal request failures.
- Feedback and Sentry tags need the same auth-env signal as OTel events
so reports can be triaged consistently.
- Custom provider config is user-controlled text, so the telemetry
contract must stay presence-only / bucketed.

###### Scope
- Adds a small `AuthEnvTelemetry` bundle for env presence collection and
threads it through the main request/session telemetry paths.
- Does not add endpoint/base-url/provider-header/geo routing attribution
or broader telemetry API redesign.

###### Trade-offs
- `provider_env_key_name` is bucketed to `"configured"` instead of
preserving the literal configured env var name.
- `/models` is included because startup/model-refresh auth failures need
the same diagnostics, but broader parity work remains out of scope.
- This slice keeps the existing telemetry APIs and layers auth-env
fields onto them rather than redesigning the metadata model.

###### Client follow-up
- Add the separate endpoint/base-url attribution slice if routing-source
diagnosis is still needed.
- Add provider-header or residency attribution only if auth-env presence
proves insufficient in real reports.
- Revisit whether any additional auth-related env inputs need safe
bucketing after more 401 triage data.

###### Testing
- `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
- `cargo test -p codex-core
collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
- `cargo test -p codex-core
models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_api_request_auth_observability --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_connect_auth_observability
-- --nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_request_transport_observability
-- --nocapture`
- `cargo test -p codex-core --no-run --message-format short`
- `cargo test -p codex-otel --no-run --message-format short`

---------

Co-authored-by: Codex <noreply@openai.com>
## Problem

The `/mcp` command did not work in the app-server TUI (remote mode). On
`main`, `add_mcp_output()` called `McpManager::effective_servers()`
in-process, which only sees locally configured servers, and then emitted
a generic stub message for the app-server to handle. In remote usage,
that left `/mcp` without a real inventory view.

## Solution

Implement `/mcp` for the app-server TUI by fetching MCP server inventory
directly from the app-server via the paginated `mcpServerStatus/list`
RPC and rendering the results into chat history.

The command now follows a three-phase lifecycle:

1. Loading: `ChatWidget::add_mcp_output()` inserts a transient
`McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This
gives immediate feedback that the command registered.
2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that
calls `fetch_all_mcp_server_statuses()` over an app-server request
handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded {
result }`.
3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell
and renders either `new_mcp_tools_output_from_statuses(...)` or an error
message.

This keeps the main app event loop responsive, so the TUI can repaint
before the remote RPC finishes.

## Notes

- No `app-server` changes were required.
- The rendered inventory includes auth, tools, resources, and resource
templates, plus transport details when they are available from local
config for display enrichment.
- The app-server RPC does not expose authoritative `enabled` or
`disabled_reason` state for MCP servers, so the remote `/mcp` view no
longer renders a `Status:` row rather than guessing from local config.
- RPC failures surface in history as `Failed to load MCP inventory:
...`.

## Tests

- `slash_mcp_requests_inventory_via_app_server`
- `mcp_inventory_maps_prefix_tool_names_by_server`
- `handle_mcp_inventory_result_clears_committed_loading_cell`
- `mcp_tools_output_from_statuses_renders_status_only_servers`
- `mcp_inventory_loading_snapshot`
Remote skills/remote/xxx as they are not in used for now.
- thread the realtime version into conversation start and app-server
notifications
- keep playback-aware mic gating and playback interruption behavior on
v2 only, leaving v1 on the legacy path
- route realtime startup, input, and transport failures through a single
shutdown path
- emit one realtime error/closed lifecycle while clearing session state
once

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
## Problem
Ubuntu/AppArmor hosts started failing in the default Linux sandbox path
after the switch to vendored/default bubblewrap in `0.115.0`.

The clearest report is in
[#14919](#14919), especially [this
investigation
comment](#14919 (comment)):
on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or
vendored `bwrap` binary fails with errors like `bwrap: setting up uid
map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR:
Operation not permitted`.

The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict`
profile, which grants `userns` access specifically to `/usr/bin/bwrap`.
Once Codex started using a vendored/internal bubblewrap path, that path
was no longer covered by the distro AppArmor exception, so sandbox
namespace setup could fail even when user namespaces were otherwise
enabled and `uidmap` was installed.

## What this PR changes
- prefer system `/usr/bin/bwrap` whenever it is available
- keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is
missing
- when `/usr/bin/bwrap` is missing, surface a Codex startup warning
through the app-server/TUI warning path instead of printing directly
from the sandbox helper with `eprintln!`
- use the same launcher decision for both the main sandbox execution
path and the `/proc` preflight path
- document the updated Linux bubblewrap behavior in the Linux sandbox
and core READMEs

## Why this fix
This still fixes the Ubuntu/AppArmor regression from
[#14919](#14919), but it keeps the
runtime rule simple and platform-agnostic: if the standard system
bubblewrap is installed, use it; otherwise fall back to the vendored
helper.

The warning now follows that same simple rule. If Codex cannot find
`/usr/bin/bwrap`, it tells the user that it is falling back to the
vendored helper, and it does so through the existing startup warning
plumbing that reaches the TUI and app-server instead of low-level
sandbox stderr.

## Testing
- `cargo test -p codex-linux-sandbox`
- `cargo test -p codex-app-server --lib`
- `cargo test -p codex-tui-app-server
tests::embedded_app_server_start_failure_is_returned`
- `cargo clippy -p codex-linux-sandbox --all-targets`
- `cargo clippy -p codex-app-server --all-targets`
- `cargo clippy -p codex-tui-app-server --all-targets`
## TL;DR
WIP esp the examples

Thin the Python SDK public surface so the wrapper layer returns
canonical app-server generated models directly.

- keeps `Codex` / `AsyncCodex` / `Thread` / `Turn` and input helpers,
but removes alias-only type layers and custom result models
- `metadata` now returns `InitializeResponse` and `run()` returns the
generated app-server `Turn`
- updates docs, examples, notebook, and tests to use canonical generated
types and regenerates `v2_all.py` against current schema
- keeps the pinned runtime-package integration flow and real integration
coverage

  ## Validation
  - `PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests`
- `GH_TOKEN="$(gh auth token)" RUN_REAL_CODEX_TESTS=1
PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests -rs`

---------

Co-authored-by: Codex <noreply@openai.com>
…14993)

- Add shared Product support to marketplace plugin policy and skill
policy (no enforced yet).
- Move marketplace installation/authentication under policy and model it
as MarketplacePluginPolicy.
- Rename plugin/marketplace local manifest types to separate raw serde
shapes from resolved in-memory models.
Reject websocket requests that carry an `Origin` header
Adds an environment crate and environment + file system abstraction.

Environment is a combination of attributes and services specific to
environment the agent is connected to:
File system, process management, OS, default shell.

The goal is to move most of agent logic that assumes environment to work
through the environment abstraction.
## Summary
- stop `codex sandbox` from forcing legacy `sandbox_mode` when active
`[permissions]` profiles are configured
- keep the legacy `read-only` / `workspace-write` fallback for legacy
configs and reject `--full-auto` for profile-based configs
- use split filesystem and network policies in the macOS/Linux debug
sandbox helpers and add regressions for the config-loading behavior


assuming "codex/docs/private/secret.txt" = "none"
```
codex -c 'default_permissions="limited-read-test"' sandbox macos -- <command> ...

codex sandbox macos -- cat codex/docs/private/secret.txt >/dev/null; echo EXIT:$?
cat: codex/docs/private/secret.txt: Operation not permitted
EXIT:1
```

---------

Co-authored-by: celia-oai <celia@openai.com>
pakrym-oai and others added 28 commits March 19, 2026 18:25
Add a config and attempt to start the server.
## Why

To date, the argument-comment linter introduced in
#14651 had to be built from source
to run, which can be a bit slow (both for local dev and when it is run
in CI). Because of the potential slowness, I did not wire it up to run
as part of `just clippy` or anything like that. As a result, I have seen
a number of occasions where folks put up PRs that violate the lint, see
it fail in CI, and then have to put up their PR again.

The goal of this PR is to pre-build a runnable version of the linter and
then make it available via a DotSlash file. Once it is available, I will
update `just clippy` and other touchpoints to make it a natural part of
the dev cycle so lint violations should get flagged _before_ putting up
a PR for review.

To get things started, we will build the DotSlash file as part of an
alpha release. Though I don't expect the linter to change often, so I'll
probably change this to only build as part of mainline releases once we
have a working DotSlash file. (Ultimately, we should probably move the
linter into its own repo so it can have its own release cycle.)

## What Changed
- add a reusable `rust-release-argument-comment-lint.yml` workflow that
builds host-specific archives for macOS arm64, Linux arm64/x64, and
Windows x64
- wire `rust-release.yml` to publish the `argument-comment-lint`
DotSlash manifest on all releases for now, including alpha tags
- package a runnable layout instead of a bare library

The Unix archive layout is:

```text
argument-comment-lint/
  bin/
    argument-comment-lint
    cargo-dylint
  lib/
    libargument_comment_lint@nightly-2025-09-18-<target>.dylib|so
```

On Windows the same layout is published as a `.zip`, with `.exe` and
`.dll` filenames instead.

DotSlash resolves the package entrypoint to
`argument-comment-lint/bin/argument-comment-lint`. That runner finds the
sibling bundled `cargo-dylint` binary plus the single packaged Dylint
library under `lib/`, then invokes `cargo-dylint dylint --lib-path
<that-library>` with the repo's default lint settings.
Stacked PR 2/3, based on the stub PR.

Adds the exec RPC implementation and process/event flow in exec-server
only.

---------

Co-authored-by: Codex <noreply@openai.com>
## Summary

- log guardian-reviewed tool approvals as `source=automated_reviewer` in
`codex.tool_decision`
- keep direct user approvals as `source=user` and config-driven
approvals as `source=config`

## Testing

-
`/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-fmt-quiet.sh`
-
`/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-test-quiet.sh
-p codex-otel` (fails in sandboxed loopback bind tests under
`otel/tests/suite/otlp_http_loopback.rs`)
- `cargo test -p codex-core guardian -- --nocapture` (original-tree run
reached Guardian tests and only hit sandbox-related listener/proxy
failures)

Co-authored-by: Codex <noreply@openai.com>
## Problem

When multiple Codex sessions are open at once, terminal tabs and windows
are hard to distinguish from each other. The existing status line only
helps once the TUI is already focused, so it does not solve the "which
tab is this?" problem.

This PR adds a first-class `/title` command so the terminal window or
tab title can carry a short, configurable summary of the current
session.

## Screenshot

<img width="849" height="320" alt="image"
src="https://github.com/user-attachments/assets/8b112927-7890-45ed-bb1e-adf2f584663d"
/>

## Mental model

`/statusline` and `/title` are separate status surfaces with different
constraints. The status line is an in-app footer that can be denser and
more detailed. The terminal title is external terminal metadata, so it
needs short, stable segments that still make multiple sessions easy to
tell apart.

The `/title` configuration is an ordered list of compact items. By
default it renders `spinner,project`, so active sessions show
lightweight progress first while idle sessions still stay easy to
disambiguate. Each configured item is omitted when its value is not
currently available rather than forcing a placeholder.

## Non-goals

This does not merge `/title` into `/statusline`, and it does not add an
arbitrary free-form title string. The feature is intentionally limited
to a small set of structured items so the title stays short and
reviewable.

This also does not attempt to restore whatever title the terminal or
shell had before Codex started. When Codex clears the title, it clears
the title Codex last wrote.

## Tradeoffs

A separate `/title` command adds some conceptual overlap with
`/statusline`, but it keeps title-specific constraints explicit instead
of forcing the status line model to cover two different surfaces.

Title refresh can happen frequently, so the implementation now shares
parsing and git-branch orchestration between the status line and title
paths, and caches the derived project-root name by cwd. That keeps the
hot path cheap without introducing background polling.

## Architecture

The TUI gets a new `/title` slash command and a dedicated picker UI for
selecting and ordering terminal-title items. The chosen ids are
persisted in `tui.terminal_title`, with `spinner` and `project` as the
default when the config is unset. `status` remains available as a
separate text item, so configurations like `spinner,status` render
compact progress like `⠋ Working`.

`ChatWidget` now refreshes both status surfaces through a shared
`refresh_status_surfaces()` path. That shared path parses configured
items once, warns on invalid ids once, synchronizes shared cached state
such as git-branch lookup, then renders the footer status line and
terminal title from the same snapshot.

Low-level OSC title writes live in `codex-rs/tui/src/terminal_title.rs`,
which owns the terminal write path and last-mile sanitization before
emitting OSC 0.

## Security

Terminal-title text is treated as untrusted display content before Codex
emits it. The write path strips control characters, removes invisible
and bidi formatting characters that can make the title visually
misleading, normalizes whitespace, and caps the emitted length.

References used while implementing this:

- [xterm control
sequences](https://invisible-island.net/xterm/ctlseqs/ctlseqs.html)
- [WezTerm escape sequences](https://wezterm.org/escape-sequences.html)
- [CWE-150: Improper Neutralization of Escape, Meta, or Control
Sequences](https://cwe.mitre.org/data/definitions/150.html)
- [CERT VU#999008 (Trojan Source)](https://kb.cert.org/vuls/id/999008)
- [Trojan Source disclosure site](https://trojansource.codes/)
- [Unicode Bidirectional Algorithm (UAX
#9)](https://www.unicode.org/reports/tr9/)
- [Unicode Security Considerations (UTR
#36)](https://www.unicode.org/reports/tr36/)

## Observability

Unknown configured title item ids are warned about once instead of
repeatedly spamming the transcript. Live preview applies immediately
while the `/title` picker is open, and cancel rolls the in-memory title
selection back to the pre-picker value.

If terminal title writes fail, the TUI emits debug logs around set and
clear attempts. The rendered status label intentionally collapses richer
internal states into compact title text such as `Starting...`, `Ready`,
`Thinking...`, `Working...`, `Waiting...`, and `Undoing...` when
`status` is configured.

## Tests

Ran:

- `just fmt`
- `cargo test -p codex-tui`

At the moment, the red Windows `rust-ci` failures are due to existing
`codex-core` `apply_patch_cli` stack-overflow tests that also reproduce
on `main`. The `/title`-specific `codex-tui` suite is green.
So we can find and filter spans by `turn.id`.

We do this for the `turn/start`, `turn/steer`, and `turn/interrupt`
APIs.
- Move core/src/terminal.rs and its tests into a standalone
terminal-detection workspace crate.
- Update direct consumers to depend on codex-terminal-detection and
import terminal APIs directly.

---------

Co-authored-by: Codex <noreply@openai.com>
updated Windows shell/unified_exec tool descriptions:

`exec_command`
```text
Runs a command in a PTY, returning output or a session ID for ongoing interaction.

Windows safety rules:
- Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
- Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
```

`shell`
```text
Runs a Powershell command (Windows) and returns its output. Arguments to `shell` will be passed to CreateProcessW(). Most commands should be prefixed with ["powershell.exe", "-Command"].

Examples of valid command strings:

- ls -a (show hidden): ["powershell.exe", "-Command", "Get-ChildItem -Force"]
- recursive find by name: ["powershell.exe", "-Command", "Get-ChildItem -Recurse -Filter *.py"]
- recursive grep: ["powershell.exe", "-Command", "Get-ChildItem -Path C:\\myrepo -Recurse | Select-String -Pattern 'TODO' -CaseSensitive"]
- ps aux | grep python: ["powershell.exe", "-Command", "Get-Process | Where-Object { $_.ProcessName -like '*python*' }"]
- setting an env var: ["powershell.exe", "-Command", "$env:FOO='bar'; echo $env:FOO"]
- running an inline Python script: ["powershell.exe", "-Command", "@'\nprint('Hello, world!')\n'@ | python -"]

Windows safety rules:
- Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
- Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
```

`shell_command`
```text
Runs a Powershell command (Windows) and returns its output.

Examples of valid command strings:

- ls -a (show hidden): "Get-ChildItem -Force"
- recursive find by name: "Get-ChildItem -Recurse -Filter *.py"
- recursive grep: "Get-ChildItem -Path C:\\myrepo -Recurse | Select-String -Pattern 'TODO' -CaseSensitive"
- ps aux | grep python: "Get-Process | Where-Object { $_.ProcessName -like '*python*' }"
- setting an env var: "$env:FOO='bar'; echo $env:FOO"
- running an inline Python script: "@'\nprint('Hello, world!')\n'@ | python -"

Windows safety rules:
- Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
- Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
```
adding full path to TUI so image is open-able in the TUI after being
generated. LImited to VSCode Terminal for now.
## Summary

Some background. We're looking to instrument GA turns end to end. Right
now a big gap is grouping mcp tool calls with their codex sessions. We
send session id and turn id headers to the responses call but not the
mcp/wham calls.

Ideally we could pass the args as headers like with responses, but given
the setup of the rmcp client, we can't send as headers without either
changing the rmcp package upstream to allow per request headers or
introducing a mutex which break concurrency. An earlier attempt made the
assumption that we had 1 client per thread, which allowed us to set
headers at the start of a turn. @pakrym mentioned that this assumption
might break in the near future.

So the solution now is to package the turn metadata/session id into the
_meta field in the post body and pull out in codex-backend.

- send turn metadata to MCP servers via `tools/call` `_meta` instead of
assuming per-thread request headers on shared clients
- preserve the existing `_codex_apps` metadata while adding
`x-codex-turn-metadata` for all MCP tool calls
- extend tests to cover both custom MCP servers and the codex apps
search flow

---------

Co-authored-by: Codex <noreply@openai.com>
…15220)

Exposes the legacy `codex/event/mcp_startup_update` event as an API v2
notification.

The legacy event has this shape:
```
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct McpStartupUpdateEvent {
    /// Server name being started.
    pub server: String,
    /// Current startup status.
    pub status: McpStartupStatus,
}

#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
#[serde(rename_all = "snake_case", tag = "state")]
#[ts(rename_all = "snake_case", tag = "state")]
pub enum McpStartupStatus {
    Starting,
    Ready,
    Failed { error: String },
    Cancelled,
}
```
saving image gen default save directory to
codex_home/imagegen/thread_id/
For each feature we have:
1. Trait exposed on environment
2. **Local Implementation** of the trait
3. Remote implementation that uses the client to proxy via network
4. Handler implementation that handles PRC requests and calls into
**Local Implementation**
Alternative approach, we use rusty_v8 for all platforms that its
predefined, but lets build from source a musl v8 version with bazel for
x86 and aarch64 only. We would need to release this on github and then
use the release.
- Move the auth implementation and token data into codex-login.
- Keep codex-core re-exporting that surface from codex-login for
existing callers.

---------

Co-authored-by: Codex <noreply@openai.com>
- [x] Auth MCPs when installing plugins.
Treat [] as no product allowed, empty as all products allowed.
- Split the feature system into a new `codex-features` crate.
- Cut `codex-core` and workspace consumers over to the new config and
warning APIs.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
- match the exec-process structure to filesystem PR #15232
- expose `ExecProcess` on `Environment`
- make `LocalProcess` the real implementation and `RemoteProcess` a thin
network proxy over `ExecServerClient`
- make `ProcessHandler` a thin RPC adapter delegating to `LocalProcess`
- add a shared local/remote process test

## Validation
- `just fmt`
- `CARGO_TARGET_DIR=~/.cache/cargo-target/codex cargo test -p
codex-exec-server`
- `just fix -p codex-exec-server`

---------

Co-authored-by: Codex <noreply@openai.com>
## Why
The argument-comment lint now has a packaged DotSlash artifact from
[#15198](#15198), so the normal repo
lint path should use that released payload instead of rebuilding the
lint from source every time.

That keeps `just clippy` and CI aligned with the shipped artifact while
preserving a separate source-build path for people actively hacking on
the lint crate.

The current alpha package also exposed two integration wrinkles that the
repo-side prebuilt wrapper needs to smooth over:
- the bundled Dylint library filename includes the host triple, for
example `@nightly-2025-09-18-aarch64-apple-darwin`, and Dylint derives
`RUSTUP_TOOLCHAIN` from that filename
- on Windows, Dylint's driver path also expects `RUSTUP_HOME` to be
present in the environment

Without those adjustments, the prebuilt CI jobs fail during `cargo
metadata` or driver setup. This change makes the checked-in prebuilt
wrapper normalize the packaged library name to the plain
`nightly-2025-09-18` channel before invoking `cargo-dylint`, and it
teaches both the wrapper and the packaged runner source to infer
`RUSTUP_HOME` from `rustup show home` when the environment does not
already provide it.

After the prebuilt Windows lint job started running successfully, it
also surfaced a handful of existing anonymous literal callsites in
`windows-sandbox-rs`. This PR now annotates those callsites so the new
cross-platform lint job is green on the current tree.

## What Changed
- checked in the current
`tools/argument-comment-lint/argument-comment-lint` DotSlash manifest
- kept `tools/argument-comment-lint/run.sh` as the source-build wrapper
for lint development
- added `tools/argument-comment-lint/run-prebuilt-linter.sh` as the
normal enforcement path, using the checked-in DotSlash package and
bundled `cargo-dylint`
- updated `just clippy` and `just argument-comment-lint` to use the
prebuilt wrapper
- split `.github/workflows/rust-ci.yml` so source-package checks live in
a dedicated `argument_comment_lint_package` job, while the released lint
runs in an `argument_comment_lint_prebuilt` matrix on Linux, macOS, and
Windows
- kept the pinned `nightly-2025-09-18` toolchain install in the prebuilt
CI matrix, since the prebuilt package still relies on rustup-provided
toolchain components
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` to
normalize host-qualified nightly library filenames, keep the `rustup`
shim directory ahead of direct toolchain `cargo` binaries, and export
`RUSTUP_HOME` when needed for Windows Dylint driver setup
- updated `tools/argument-comment-lint/src/bin/argument-comment-lint.rs`
so future published DotSlash artifacts apply the same nightly-filename
normalization and `RUSTUP_HOME` inference internally
- fixed the remaining Windows lint violations in
`codex-rs/windows-sandbox-rs` by adding the required `/*param*/`
comments at the reported callsites
- documented the checked-in DotSlash file, wrapper split, archive
layout, nightly prerequisite, and Windows `RUSTUP_HOME` requirement in
`tools/argument-comment-lint/README.md`
…15215)

### Preliminary /plugins TUI menu
- Adds a preliminary /plugins menu flow in both tui and tui_app_server.
- Fetches plugin list data asynchronously and shows loading/error/cached
states.
  - Limits this first pass to the curated ChatGPT marketplace.
  - Shows available plugins with installed/status metadata.
- Supports in-menu search over plugin display name, plugin id, plugin
name, and marketplace label.
- Opens a plugin detail view on selection, including summaries for
Skills, Apps, and MCP Servers, with back navigation.

### Testing
  - Launch codex-cli with plugins enabled (`--enable plugins`).
  - Run /plugins and verify:
      - loading state appears first
      - plugin list is shown
      - search filters results
- selecting a plugin opens detail view, with a list of
skills/connectors/MCP servers for the plugin
      - back action returns to the list.
- Verify disabled behavior by running /plugins without plugins enabled
(shows “Plugins are disabled” message).
- Launch with `--enable tui_app_server` (and plugins enabled) and repeat
the same /plugins flow; behavior should match.
We'll verify a bit later that all of this works correctly and re-enable
For early users who have already enabled apps, we should enable plugins
as part of the initial setup.
## Summary
- add a short guardian follow-up developer reminder before reused
reviews
- cache prior-review state on the guardian session instead of rescanning
full history on each request
- update guardian follow-up coverage and snapshot expectations

---------

Co-authored-by: Codex <noreply@openai.com>
Restore image generation items in resumed thread history
start with git clone, fallback to http.
@shahbajlive shahbajlive merged commit 89c48e1 into shahbajlive:main Mar 20, 2026
6 of 12 checks passed
shahbajlive pushed a commit that referenced this pull request Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.