|
| 1 | +# exec-server design notes |
| 2 | + |
| 3 | +This document sketches a likely direction for integrating `codex-exec-server` |
| 4 | +with unified exec without baking the full tool-call policy stack into the |
| 5 | +server. |
| 6 | + |
| 7 | +The goal is: |
| 8 | + |
| 9 | +- keep exec-server generic and reusable |
| 10 | +- keep approval, sandbox, and retry policy in `core` |
| 11 | +- preserve the unified-exec event flow the model already depends on |
| 12 | +- support retained output caps so polling and snapshot-style APIs do not grow |
| 13 | + memory without bound |
| 14 | + |
| 15 | +## Unified exec today |
| 16 | + |
| 17 | +Today the flow for LLM-visible interactive execution is: |
| 18 | + |
| 19 | +1. The model sees the `exec_command` and `write_stdin` tools. |
| 20 | +2. `UnifiedExecHandler` parses the tool arguments and allocates a process id. |
| 21 | +3. `UnifiedExecProcessManager::exec_command(...)` calls |
| 22 | + `open_session_with_sandbox(...)`. |
| 23 | +4. `ToolOrchestrator` drives approval, sandbox selection, managed network |
| 24 | + approval, and sandbox-denial retry behavior. |
| 25 | +5. `UnifiedExecRuntime` builds a `CommandSpec`, asks the current |
| 26 | + `SandboxAttempt` to transform it into an `ExecRequest`, and passes that |
| 27 | + resolved request back to the process manager. |
| 28 | +6. `open_session_with_exec_env(...)` spawns the process from that resolved |
| 29 | + `ExecRequest`. |
| 30 | +7. Unified exec emits an `ExecCommandBegin` event. |
| 31 | +8. Unified exec starts a background output watcher that emits |
| 32 | + `ExecCommandOutputDelta` events. |
| 33 | +9. The initial tool call collects output until the requested yield deadline and |
| 34 | + returns an `ExecCommandToolOutput` snapshot to the model. |
| 35 | +10. If the process is still running, unified exec stores it and later emits |
| 36 | + `ExecCommandEnd` when the exit watcher fires. |
| 37 | +11. A later `write_stdin` tool call writes to the stored process, emits a |
| 38 | + `TerminalInteraction` event, collects another bounded snapshot, and returns |
| 39 | + that tool response to the model. |
| 40 | + |
| 41 | +Important observation: the 250ms / 10s yield-window behavior is not really a |
| 42 | +process-server concern. It is a client-side convenience layer for the LLM tool |
| 43 | +API. The server should focus on raw process lifecycle and streaming events. |
| 44 | + |
| 45 | +## Proposed boundary |
| 46 | + |
| 47 | +The clean split is: |
| 48 | + |
| 49 | +- exec-server server: process lifecycle, output streaming, retained output caps |
| 50 | +- exec-server client: `wait`, `communicate`, yield-window helpers, session |
| 51 | + bookkeeping |
| 52 | +- unified exec in `core`: tool parsing, event emission, approvals, sandboxing, |
| 53 | + managed networking, retry semantics |
| 54 | + |
| 55 | +If exec-server is used by unified exec later, the boundary should sit between |
| 56 | +step 5 and step 6 above: after policy has produced a resolved spawn request, but |
| 57 | +before the actual PTY or pipe spawn. |
| 58 | + |
| 59 | +## Suggested process API |
| 60 | + |
| 61 | +Start simple and explicit: |
| 62 | + |
| 63 | +- `process/start` |
| 64 | +- `process/write` |
| 65 | +- `process/closeStdin` |
| 66 | +- `process/resize` |
| 67 | +- `process/terminate` |
| 68 | +- `process/wait` |
| 69 | +- `process/snapshot` |
| 70 | + |
| 71 | +Server notifications: |
| 72 | + |
| 73 | +- `process/output` |
| 74 | +- `process/exited` |
| 75 | +- optionally `process/started` |
| 76 | +- optionally `process/failed` |
| 77 | + |
| 78 | +Suggested request shapes: |
| 79 | + |
| 80 | +```rust |
| 81 | +enum ProcessStartRequest { |
| 82 | + Direct(DirectExecSpec), |
| 83 | + Prepared(PreparedExecSpec), |
| 84 | +} |
| 85 | + |
| 86 | +struct DirectExecSpec { |
| 87 | + process_id: String, |
| 88 | + argv: Vec<String>, |
| 89 | + cwd: PathBuf, |
| 90 | + env: HashMap<String, String>, |
| 91 | + arg0: Option<String>, |
| 92 | + io: ProcessIo, |
| 93 | +} |
| 94 | + |
| 95 | +struct PreparedExecSpec { |
| 96 | + process_id: String, |
| 97 | + request: PreparedExecRequest, |
| 98 | + io: ProcessIo, |
| 99 | +} |
| 100 | + |
| 101 | +enum ProcessIo { |
| 102 | + Pty { rows: u16, cols: u16 }, |
| 103 | + Pipe { stdin: StdinMode }, |
| 104 | +} |
| 105 | + |
| 106 | +enum StdinMode { |
| 107 | + Open, |
| 108 | + Closed, |
| 109 | +} |
| 110 | + |
| 111 | +enum TerminateMode { |
| 112 | + Graceful { timeout_ms: u64 }, |
| 113 | + Force, |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +Notes: |
| 118 | + |
| 119 | +- `processId` remains a protocol handle, not an OS pid. |
| 120 | +- `wait` is a good generic API because many callers want process completion |
| 121 | + without manually wiring notifications. |
| 122 | +- `communicate` is also a reasonable API, but it should probably start as a |
| 123 | + client helper built on top of `write + closeStdin + wait + snapshot`. |
| 124 | +- If an RPC form of `communicate` is added later, it should be a convenience |
| 125 | + wrapper rather than the primitive execution model. |
| 126 | + |
| 127 | +## Output capping |
| 128 | + |
| 129 | +Even with event streaming, the server should retain a bounded amount of output |
| 130 | +per process so callers can poll, wait, or reconnect without unbounded memory |
| 131 | +growth. |
| 132 | + |
| 133 | +Suggested behavior: |
| 134 | + |
| 135 | +- stream every output chunk live via `process/output` |
| 136 | +- retain capped output per process in memory |
| 137 | +- keep stdout and stderr separately for pipe-backed processes |
| 138 | +- for PTY-backed processes, treat retained output as a single terminal stream |
| 139 | +- expose truncation metadata on snapshots |
| 140 | + |
| 141 | +Suggested snapshot response: |
| 142 | + |
| 143 | +```rust |
| 144 | +struct ProcessSnapshot { |
| 145 | + stdout: Vec<u8>, |
| 146 | + stderr: Vec<u8>, |
| 147 | + terminal: Vec<u8>, |
| 148 | + truncated: bool, |
| 149 | + exit_code: Option<i32>, |
| 150 | + running: bool, |
| 151 | +} |
| 152 | +``` |
| 153 | + |
| 154 | +Implementation-wise, the current `HeadTailBuffer` pattern used by unified exec |
| 155 | +is a good fit. The cap should be server config, not request config, so memory |
| 156 | +use stays predictable. |
| 157 | + |
| 158 | +## Sandboxing and networking |
| 159 | + |
| 160 | +### How unified exec does it today |
| 161 | + |
| 162 | +Unified exec does not hand raw command args directly to the PTY layer for tool |
| 163 | +calls. Instead, it: |
| 164 | + |
| 165 | +1. computes approval requirements |
| 166 | +2. chooses a sandbox attempt |
| 167 | +3. applies managed-network policy if needed |
| 168 | +4. transforms `CommandSpec` into `ExecRequest` |
| 169 | +5. spawns from that resolved `ExecRequest` |
| 170 | + |
| 171 | +That split is already valuable and should be preserved. |
| 172 | + |
| 173 | +### Recommended exec-server design |
| 174 | + |
| 175 | +Do not put approval policy into exec-server. |
| 176 | + |
| 177 | +Instead, support two execution modes: |
| 178 | + |
| 179 | +- `Direct`: raw command, intended for orchestrator-side or already-trusted use |
| 180 | +- `Prepared`: already-resolved spawn request, intended for tool-call execution |
| 181 | + |
| 182 | +For tool calls from the LLM side: |
| 183 | + |
| 184 | +1. `core` runs the existing approval + sandbox + managed-network flow |
| 185 | +2. `core` produces a resolved `ExecRequest` |
| 186 | +3. the exec-server client sends `PreparedExecSpec` |
| 187 | +4. exec-server spawns exactly that request and streams process events |
| 188 | + |
| 189 | +For orchestrator-side execution: |
| 190 | + |
| 191 | +1. caller sends `DirectExecSpec` |
| 192 | +2. exec-server spawns directly without running approval or sandbox policy |
| 193 | + |
| 194 | +This gives one generic process API while keeping the policy-sensitive logic in |
| 195 | +the place that already owns it. |
| 196 | + |
| 197 | +### Why not make exec-server own sandbox selection? |
| 198 | + |
| 199 | +That would force exec-server to understand: |
| 200 | + |
| 201 | +- approval policy |
| 202 | +- exec policy / prefix rules |
| 203 | +- managed-network approval flow |
| 204 | +- sandbox retry semantics |
| 205 | +- guardian routing |
| 206 | +- feature-flag-driven sandbox selection |
| 207 | +- platform-specific sandbox helper configuration |
| 208 | + |
| 209 | +That is too opinionated for a reusable process service. |
| 210 | + |
| 211 | +## Optional future server config |
| 212 | + |
| 213 | +If exec-server grows beyond the current prototype, a config object like this |
| 214 | +would be enough: |
| 215 | + |
| 216 | +```rust |
| 217 | +struct ExecServerConfig { |
| 218 | + shutdown_grace_period_ms: u64, |
| 219 | + max_processes_per_connection: usize, |
| 220 | + retained_output_bytes_per_process: usize, |
| 221 | + allow_direct_exec: bool, |
| 222 | + allow_prepared_exec: bool, |
| 223 | +} |
| 224 | +``` |
| 225 | + |
| 226 | +That keeps policy surface small: |
| 227 | + |
| 228 | +- lifecycle limits live in the server |
| 229 | +- trust and sandbox policy stay with the caller |
| 230 | + |
| 231 | +## Mapping back to LLM-visible events |
| 232 | + |
| 233 | +If unified exec is later backed by exec-server, the `core` client wrapper should |
| 234 | +keep owning the translation into the existing event model: |
| 235 | + |
| 236 | +- `process/start` success -> `ExecCommandBegin` |
| 237 | +- `process/output` -> `ExecCommandOutputDelta` |
| 238 | +- local `process/write` call -> `TerminalInteraction` |
| 239 | +- `process/exited` plus retained transcript -> `ExecCommandEnd` |
| 240 | + |
| 241 | +That preserves the current LLM-facing contract while making the process backend |
| 242 | +swappable. |
0 commit comments