Skip to content

Commit c5071a9

Browse files
committed
Add exec-server process RPC implementation
Restore the process RPC client/server implementation on top of the initialize-only base.\n\nCo-authored-by: Codex <noreply@openai.com>
1 parent 16ff474 commit c5071a9

18 files changed

Lines changed: 4736 additions & 418 deletions

codex-rs/Cargo.lock

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

codex-rs/exec-server/Cargo.toml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,20 +13,24 @@ workspace = true
1313

1414
[dependencies]
1515
base64 = { workspace = true }
16+
clap = { workspace = true, features = ["derive"] }
1617
codex-app-server-protocol = { workspace = true }
1718
codex-utils-pty = { workspace = true }
19+
futures = { workspace = true }
1820
serde = { workspace = true, features = ["derive"] }
1921
serde_json = { workspace = true }
2022
thiserror = { workspace = true }
2123
tokio = { workspace = true, features = [
2224
"io-std",
2325
"io-util",
2426
"macros",
27+
"net",
2528
"process",
2629
"rt-multi-thread",
2730
"sync",
2831
"time",
2932
] }
33+
tokio-tungstenite = { workspace = true }
3034
tracing = { workspace = true }
3135

3236
[dev-dependencies]

codex-rs/exec-server/DESIGN.md

Lines changed: 242 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,242 @@
1+
# exec-server design notes
2+
3+
This document sketches a likely direction for integrating `codex-exec-server`
4+
with unified exec without baking the full tool-call policy stack into the
5+
server.
6+
7+
The goal is:
8+
9+
- keep exec-server generic and reusable
10+
- keep approval, sandbox, and retry policy in `core`
11+
- preserve the unified-exec event flow the model already depends on
12+
- support retained output caps so polling and snapshot-style APIs do not grow
13+
memory without bound
14+
15+
## Unified exec today
16+
17+
Today the flow for LLM-visible interactive execution is:
18+
19+
1. The model sees the `exec_command` and `write_stdin` tools.
20+
2. `UnifiedExecHandler` parses the tool arguments and allocates a process id.
21+
3. `UnifiedExecProcessManager::exec_command(...)` calls
22+
`open_session_with_sandbox(...)`.
23+
4. `ToolOrchestrator` drives approval, sandbox selection, managed network
24+
approval, and sandbox-denial retry behavior.
25+
5. `UnifiedExecRuntime` builds a `CommandSpec`, asks the current
26+
`SandboxAttempt` to transform it into an `ExecRequest`, and passes that
27+
resolved request back to the process manager.
28+
6. `open_session_with_exec_env(...)` spawns the process from that resolved
29+
`ExecRequest`.
30+
7. Unified exec emits an `ExecCommandBegin` event.
31+
8. Unified exec starts a background output watcher that emits
32+
`ExecCommandOutputDelta` events.
33+
9. The initial tool call collects output until the requested yield deadline and
34+
returns an `ExecCommandToolOutput` snapshot to the model.
35+
10. If the process is still running, unified exec stores it and later emits
36+
`ExecCommandEnd` when the exit watcher fires.
37+
11. A later `write_stdin` tool call writes to the stored process, emits a
38+
`TerminalInteraction` event, collects another bounded snapshot, and returns
39+
that tool response to the model.
40+
41+
Important observation: the 250ms / 10s yield-window behavior is not really a
42+
process-server concern. It is a client-side convenience layer for the LLM tool
43+
API. The server should focus on raw process lifecycle and streaming events.
44+
45+
## Proposed boundary
46+
47+
The clean split is:
48+
49+
- exec-server server: process lifecycle, output streaming, retained output caps
50+
- exec-server client: `wait`, `communicate`, yield-window helpers, session
51+
bookkeeping
52+
- unified exec in `core`: tool parsing, event emission, approvals, sandboxing,
53+
managed networking, retry semantics
54+
55+
If exec-server is used by unified exec later, the boundary should sit between
56+
step 5 and step 6 above: after policy has produced a resolved spawn request, but
57+
before the actual PTY or pipe spawn.
58+
59+
## Suggested process API
60+
61+
Start simple and explicit:
62+
63+
- `process/start`
64+
- `process/write`
65+
- `process/closeStdin`
66+
- `process/resize`
67+
- `process/terminate`
68+
- `process/wait`
69+
- `process/snapshot`
70+
71+
Server notifications:
72+
73+
- `process/output`
74+
- `process/exited`
75+
- optionally `process/started`
76+
- optionally `process/failed`
77+
78+
Suggested request shapes:
79+
80+
```rust
81+
enum ProcessStartRequest {
82+
Direct(DirectExecSpec),
83+
Prepared(PreparedExecSpec),
84+
}
85+
86+
struct DirectExecSpec {
87+
process_id: String,
88+
argv: Vec<String>,
89+
cwd: PathBuf,
90+
env: HashMap<String, String>,
91+
arg0: Option<String>,
92+
io: ProcessIo,
93+
}
94+
95+
struct PreparedExecSpec {
96+
process_id: String,
97+
request: PreparedExecRequest,
98+
io: ProcessIo,
99+
}
100+
101+
enum ProcessIo {
102+
Pty { rows: u16, cols: u16 },
103+
Pipe { stdin: StdinMode },
104+
}
105+
106+
enum StdinMode {
107+
Open,
108+
Closed,
109+
}
110+
111+
enum TerminateMode {
112+
Graceful { timeout_ms: u64 },
113+
Force,
114+
}
115+
```
116+
117+
Notes:
118+
119+
- `processId` remains a protocol handle, not an OS pid.
120+
- `wait` is a good generic API because many callers want process completion
121+
without manually wiring notifications.
122+
- `communicate` is also a reasonable API, but it should probably start as a
123+
client helper built on top of `write + closeStdin + wait + snapshot`.
124+
- If an RPC form of `communicate` is added later, it should be a convenience
125+
wrapper rather than the primitive execution model.
126+
127+
## Output capping
128+
129+
Even with event streaming, the server should retain a bounded amount of output
130+
per process so callers can poll, wait, or reconnect without unbounded memory
131+
growth.
132+
133+
Suggested behavior:
134+
135+
- stream every output chunk live via `process/output`
136+
- retain capped output per process in memory
137+
- keep stdout and stderr separately for pipe-backed processes
138+
- for PTY-backed processes, treat retained output as a single terminal stream
139+
- expose truncation metadata on snapshots
140+
141+
Suggested snapshot response:
142+
143+
```rust
144+
struct ProcessSnapshot {
145+
stdout: Vec<u8>,
146+
stderr: Vec<u8>,
147+
terminal: Vec<u8>,
148+
truncated: bool,
149+
exit_code: Option<i32>,
150+
running: bool,
151+
}
152+
```
153+
154+
Implementation-wise, the current `HeadTailBuffer` pattern used by unified exec
155+
is a good fit. The cap should be server config, not request config, so memory
156+
use stays predictable.
157+
158+
## Sandboxing and networking
159+
160+
### How unified exec does it today
161+
162+
Unified exec does not hand raw command args directly to the PTY layer for tool
163+
calls. Instead, it:
164+
165+
1. computes approval requirements
166+
2. chooses a sandbox attempt
167+
3. applies managed-network policy if needed
168+
4. transforms `CommandSpec` into `ExecRequest`
169+
5. spawns from that resolved `ExecRequest`
170+
171+
That split is already valuable and should be preserved.
172+
173+
### Recommended exec-server design
174+
175+
Do not put approval policy into exec-server.
176+
177+
Instead, support two execution modes:
178+
179+
- `Direct`: raw command, intended for orchestrator-side or already-trusted use
180+
- `Prepared`: already-resolved spawn request, intended for tool-call execution
181+
182+
For tool calls from the LLM side:
183+
184+
1. `core` runs the existing approval + sandbox + managed-network flow
185+
2. `core` produces a resolved `ExecRequest`
186+
3. the exec-server client sends `PreparedExecSpec`
187+
4. exec-server spawns exactly that request and streams process events
188+
189+
For orchestrator-side execution:
190+
191+
1. caller sends `DirectExecSpec`
192+
2. exec-server spawns directly without running approval or sandbox policy
193+
194+
This gives one generic process API while keeping the policy-sensitive logic in
195+
the place that already owns it.
196+
197+
### Why not make exec-server own sandbox selection?
198+
199+
That would force exec-server to understand:
200+
201+
- approval policy
202+
- exec policy / prefix rules
203+
- managed-network approval flow
204+
- sandbox retry semantics
205+
- guardian routing
206+
- feature-flag-driven sandbox selection
207+
- platform-specific sandbox helper configuration
208+
209+
That is too opinionated for a reusable process service.
210+
211+
## Optional future server config
212+
213+
If exec-server grows beyond the current prototype, a config object like this
214+
would be enough:
215+
216+
```rust
217+
struct ExecServerConfig {
218+
shutdown_grace_period_ms: u64,
219+
max_processes_per_connection: usize,
220+
retained_output_bytes_per_process: usize,
221+
allow_direct_exec: bool,
222+
allow_prepared_exec: bool,
223+
}
224+
```
225+
226+
That keeps policy surface small:
227+
228+
- lifecycle limits live in the server
229+
- trust and sandbox policy stay with the caller
230+
231+
## Mapping back to LLM-visible events
232+
233+
If unified exec is later backed by exec-server, the `core` client wrapper should
234+
keep owning the translation into the existing event model:
235+
236+
- `process/start` success -> `ExecCommandBegin`
237+
- `process/output` -> `ExecCommandOutputDelta`
238+
- local `process/write` call -> `TerminalInteraction`
239+
- `process/exited` plus retained transcript -> `ExecCommandEnd`
240+
241+
That preserves the current LLM-facing contract while making the process backend
242+
swappable.

0 commit comments

Comments
 (0)