Conversation
bolinfest
added a commit
that referenced
this pull request
Apr 2, 2026
…16630) ## Why `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in #16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. ## Headline result On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. ## Why this PR over #16627 #16627 already proved that the `ToolHandler` stack was the right hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than #16627 on the pass-level total: | Metric | #16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | ## Profile data ### Crate-level timings `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and #16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | ### Pass-level timings `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | ### Self-profile query summary `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | ### Artifact sizes From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | ### Baseline hotspots before this change These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | ## What changed Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) ## Tradeoff This is intentionally less invasive than #16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. ## Verification Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
Collaborator
Author
|
Closing in favor of #16630. |
Oreoxp
pushed a commit
to Oreoxp/codex-cli
that referenced
this pull request
Apr 6, 2026
…penai#16630) ## Why `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. ## Headline result On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. ## Why this PR over openai#16627 openai#16627 already proved that the `ToolHandler` stack was the right hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | ## Profile data ### Crate-level timings `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | ### Pass-level timings `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | ### Self-profile query summary `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | ### Artifact sizes From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | ### Baseline hotspots before this change These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | ## What changed Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) ## Tradeoff This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. ## Verification Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 9, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 9, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 9, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 9, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 10, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 10, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
rebroad
pushed a commit
to rebroad/codex
that referenced
this pull request
Apr 16, 2026
…penai#16630) `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in openai#16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than openai#16627 on the pass-level total: | Metric | openai#16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and openai#16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs) This is intentionally less invasive than openai#16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
codex-corecompile time was dominated by repeated trait solving and monomorphization work from the oldToolHandler<Output = T>shape plus the blanket async adapter intoAnyToolHandler.That pattern instantiated one async wrapper per concrete tool handler, and the compiler then had to prove
Send/outlives obligations across a large set of generated future types. The profile showed this was not an LLVM/codegen problem; it was a trait-solving + monomorphization problem.The win from this PR is large enough that I want the numbers in the PR body, because this is a broad API change and reviewers are right to ask whether the tradeoff is worth it.
Headline result
Cold
codex-corelib compile time dropped from 187.15s to 79.90s on my machine after this refactor: -57.3% wall-clock in rustc'stotalpass.The biggest hot passes dropped by roughly 70%:
totalgenerate_crate_metadataMIR_borrow_checkingmonomorphization_collector_graph_walkevaluate_obligationself-timeImportant caveat:
-Z time-passestimings are nested, sogenerate_crate_metadataandmonomorphization_collector_graph_walkare mostly overlapping, not additive.Profile data
Baseline before this change
cargo +nightly build -p codex-core --lib -Z unstable-options --timings=jsonaftercargo clean -p codex-core:durationrmeta_timecodex-corestarlarkSo
codex-corewas far and away the dominant crate in the build.cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=jsonaftercargo clean -p codex-core:totalgenerate_crate_metadataMIR_borrow_checkingmonomorphization_collector_graph_walkcodegen_cratetype_check_cratecoherence_checkingreal/user/sys-Z self-profile+measureme summarize -p 0.5before this change:evaluate_obligationmir_borrowcktypeckLLVM_module_codegen_emit_objLLVM_passescodegen_modulereal/user/sysTop normalized obligation buckets before this change:
outlives:tasks::review::ReviewTaskoutlives:tools::handlers::unified_exec::UnifiedExecHandlertrait:T as tools::registry::ToolHandleroutlives:tools::handlers::shell::ShellHandleroutlives:tools::handlers::shell::ShellCommandHandleroutlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvideroutlives:tools::handlers::mcp::McpHandleroutlives:tasks::regular::RegularTaskTop
items_of_instanceentries before this change were mostly concrete async handler/task impls:tasks::regular::{impl#2}::runtools::handlers::mcp::{impl#0}::handletools::runtimes::shell::unix_escalation::{impl#2}::determine_actiontools::handlers::agent_jobs::{impl#11}::handletools::handlers::multi_agents::spawn::{impl#1}::handletasks::review::{impl#4}::runtools::handlers::multi_agents_v2::spawn::{impl#2}::handletools::handlers::multi_agents::resume_agent::{impl#1}::handletools::handlers::unified_exec::{impl#2}::handletasks::compact::{impl#4}::runAfter this change
cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=jsonaftercargo clean -p codex-core:totalgenerate_crate_metadataMIR_borrow_checkingmonomorphization_collector_graph_walkcodegen_cratereal/user/sys-Z self-profile+measureme summarize -p 0.5after this change:evaluate_obligationmir_borrowcktypecktype_op_prove_predicateLLVM_module_codegen_emit_objLLVM_passescodegen_modulereal/user/sysArtifact size deltas from the self-profile summaries:
crate_metadatadep_graphlinked_artifactobject_filequery_cachecgu_instructionscodegen_unit_size_estimatework_product_indexWhat changed
The main refactor is in
codex-rs/core/src/tools/registry.rs:ToolHandleris now object-safe and returnsBoxFuture<'_, Result<AnyToolResult, FunctionCallError>>directly.ToolRegistrystoresArc<dyn ToolHandler>directly.AnyToolResultnow wrapsBox<dyn ToolOutput>plus the invocation metadata.impl<T: ToolHandler> AnyToolHandler for Tadapter is gone, which is the point: stop generating one generic async adapter per concrete handler type.Representative handler updates:
codex-rs/core/src/tools/handlers/shell.rscodex-rs/core/src/tools/handlers/unified_exec.rscodex-rs/core/src/tools/handlers/mcp.rscodex-rs/core/src/tools/handlers/multi_agents.rscodex-rs/core/src/tools/handlers/multi_agents_v2.rsTradeoff
This intentionally moves a bit of work to runtime: each tool call now boxes the returned future and boxes the concrete
ToolOutputbehindAnyToolResult.That is a reasonable tradeoff here because these handlers are typically dominated by process spawning, MCP I/O, agent orchestration, or filesystem work, while the compile-time savings are very large and directly address the crate's current scaling problem.
Validation
cargo test -p codex-core --lib --no-runpassedcargo test -p codex-core --libran 1,541 tests: 1,533 passed, 3 ignored, 5 failedcodex-rs/core/src/config/config_tests.rs, so I do not believe they are caused by this refactor:config::tests::approvals_reviewer_defaults_to_manual_only_without_guardian_featureconfig::tests::approvals_reviewer_stays_manual_only_when_guardian_feature_is_enabledconfig::tests::approvals_reviewer_can_be_set_in_config_without_guardian_approvalconfig::tests::smart_approvals_alias_is_ignoredconfig::tests::smart_approvals_alias_is_ignored_in_profilesjust fix -p codex-corejust fmtI intentionally did not rerun tests after
just fix/just fmt, per the repo instructions in this checkout.