Skip to content

Commit 96cb34f

Browse files
bolinfestOreoxp
authored andcommitted
core: cut codex-core compile time 63% with native async ToolHandler (openai#16630)
## Why `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. ## Headline result On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. ## Why this PR over openai#16627 openai#16627 already proved that the `ToolHandler` stack was the right hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | ## Profile data ### Crate-level timings `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | ### Pass-level timings `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | ### Self-profile query summary `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | ### Artifact sizes From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | ### Baseline hotspots before this change These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | ## What changed Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) ## Tradeoff This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. ## Verification Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
1 parent f8ab150 commit 96cb34f

33 files changed

+28
-77
lines changed

codex-rs/core/src/tools/code_mode/execute_handler.rs

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,3 @@
1-
use async_trait::async_trait;
2-
31
use crate::function_tool::FunctionCallError;
42
use crate::tools::context::FunctionToolOutput;
53
use crate::tools::context::ToolInvocation;
@@ -53,7 +51,6 @@ impl CodeModeExecuteHandler {
5351
}
5452
}
5553

56-
#[async_trait]
5754
impl ToolHandler for CodeModeExecuteHandler {
5855
type Output = FunctionToolOutput;
5956

codex-rs/core/src/tools/code_mode/wait_handler.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,3 @@
1-
use async_trait::async_trait;
21
use serde::Deserialize;
32

43
use crate::function_tool::FunctionCallError;
@@ -39,7 +38,6 @@ where
3938
})
4039
}
4140

42-
#[async_trait]
4341
impl ToolHandler for CodeModeWaitHandler {
4442
type Output = FunctionToolOutput;
4543

codex-rs/core/src/tools/handlers/agent_jobs.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,6 @@ use crate::tools::handlers::multi_agents::build_agent_spawn_config;
1313
use crate::tools::handlers::parse_arguments;
1414
use crate::tools::registry::ToolHandler;
1515
use crate::tools::registry::ToolKind;
16-
use async_trait::async_trait;
1716
use codex_protocol::ThreadId;
1817
use codex_protocol::protocol::AgentStatus;
1918
use codex_protocol::protocol::SessionSource;
@@ -178,7 +177,6 @@ impl JobProgressEmitter {
178177
}
179178
}
180179

181-
#[async_trait]
182180
impl ToolHandler for BatchJobHandler {
183181
type Output = FunctionToolOutput;
184182

codex-rs/core/src/tools/handlers/apply_patch.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,6 @@ use crate::tools::registry::ToolKind;
2121
use crate::tools::runtimes::apply_patch::ApplyPatchRequest;
2222
use crate::tools::runtimes::apply_patch::ApplyPatchRuntime;
2323
use crate::tools::sandboxing::ToolCtx;
24-
use async_trait::async_trait;
2524
use codex_apply_patch::ApplyPatchAction;
2625
use codex_apply_patch::ApplyPatchFileChange;
2726
use codex_protocol::models::FileSystemPermissions;
@@ -122,7 +121,6 @@ async fn effective_patch_permissions(
122121
)
123122
}
124123

125-
#[async_trait]
126124
impl ToolHandler for ApplyPatchHandler {
127125
type Output = ApplyPatchToolOutput;
128126

codex-rs/core/src/tools/handlers/dynamic.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,6 @@ use crate::tools::context::ToolPayload;
77
use crate::tools::handlers::parse_arguments;
88
use crate::tools::registry::ToolHandler;
99
use crate::tools::registry::ToolKind;
10-
use async_trait::async_trait;
1110
use codex_protocol::dynamic_tools::DynamicToolCallRequest;
1211
use codex_protocol::dynamic_tools::DynamicToolResponse;
1312
use codex_protocol::models::FunctionCallOutputContentItem;
@@ -20,7 +19,6 @@ use tracing::warn;
2019

2120
pub struct DynamicToolHandler;
2221

23-
#[async_trait]
2422
impl ToolHandler for DynamicToolHandler {
2523
type Output = FunctionToolOutput;
2624

codex-rs/core/src/tools/handlers/js_repl.rs

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,3 @@
1-
use async_trait::async_trait;
21
use serde_json::Value as JsonValue;
32
use std::sync::Arc;
43
use std::time::Duration;
@@ -92,7 +91,6 @@ async fn emit_js_repl_exec_end(
9291
};
9392
emitter.emit(ctx, stage).await;
9493
}
95-
#[async_trait]
9694
impl ToolHandler for JsReplHandler {
9795
type Output = FunctionToolOutput;
9896

@@ -182,7 +180,6 @@ impl ToolHandler for JsReplHandler {
182180
}
183181
}
184182

185-
#[async_trait]
186183
impl ToolHandler for JsReplResetHandler {
187184
type Output = FunctionToolOutput;
188185

codex-rs/core/src/tools/handlers/list_dir.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,6 @@ use std::fs::FileType;
44
use std::path::Path;
55
use std::path::PathBuf;
66

7-
use async_trait::async_trait;
87
use codex_utils_string::take_bytes_at_char_boundary;
98
use serde::Deserialize;
109
use tokio::fs;
@@ -45,7 +44,6 @@ struct ListDirArgs {
4544
depth: usize,
4645
}
4746

48-
#[async_trait]
4947
impl ToolHandler for ListDirHandler {
5048
type Output = FunctionToolOutput;
5149

codex-rs/core/src/tools/handlers/mcp.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,3 @@
1-
use async_trait::async_trait;
21
use std::sync::Arc;
32

43
use crate::function_tool::FunctionCallError;
@@ -10,7 +9,6 @@ use crate::tools::registry::ToolKind;
109
use codex_protocol::mcp::CallToolResult;
1110

1211
pub struct McpHandler;
13-
#[async_trait]
1412
impl ToolHandler for McpHandler {
1513
type Output = CallToolResult;
1614

codex-rs/core/src/tools/handlers/mcp_resource.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@ use std::sync::Arc;
33
use std::time::Duration;
44
use std::time::Instant;
55

6-
use async_trait::async_trait;
76
use codex_protocol::mcp::CallToolResult;
87
use codex_protocol::models::function_call_output_content_items_to_text;
98
use rmcp::model::ListResourceTemplatesResult;
@@ -178,7 +177,6 @@ struct ReadResourcePayload {
178177
result: ReadResourceResult,
179178
}
180179

181-
#[async_trait]
182180
impl ToolHandler for McpResourceHandler {
183181
type Output = FunctionToolOutput;
184182

codex-rs/core/src/tools/handlers/multi_agents.rs

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,6 @@ pub(crate) use crate::tools::handlers::multi_agents_common::*;
1717
use crate::tools::handlers::parse_arguments;
1818
use crate::tools::registry::ToolHandler;
1919
use crate::tools::registry::ToolKind;
20-
use async_trait::async_trait;
2120
use codex_protocol::ThreadId;
2221
use codex_protocol::models::ResponseInputItem;
2322
use codex_protocol::openai_models::ReasoningEffort;

0 commit comments

Comments
 (0)