feat(ptc): programmatic tool calling -- executor, SDK, and E2E tests #625
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the system's capabilities by introducing programmatic tool calling. This allows tools to directly interact with and invoke other tools, enabling more complex, multi-step workflows without requiring an LLM to mediate each step. The changes span the core Rust backend, including a new tool execution engine and API endpoint, and extend to a new Python SDK for easier integration within worker containers, ultimately streamlining tool orchestration and improving efficiency.
Code Review
This pull request introduces Programmatic Tool Calling (PTC), a significant new feature that adds a ToolExecutor, a new orchestrator endpoint, a WASM host function, and a Python SDK. While the architecture is generally well-designed, the current implementation contains critical security flaws. Most notably, a sandbox escape allows a compromised worker container to achieve Remote Code Execution (RCE) on the orchestrator host. Other issues include a bypassable nesting depth limit by malicious workers and the leakage of sensitive information via SSE events to the web UI. Additionally, consider improving the Python SDK's timeout handling for robustness.
pub async fn execute(
    &self,
    tool_name: &str,
    params: serde_json::Value,
    ctx: &JobContext,
    timeout_override: Option<Duration>,
) -> Result<PtcToolResult, PtcError> {
    // Enforce global nesting depth limit
    if ctx.tool_nesting_depth >= MAX_NESTING_DEPTH {
        return Err(PtcError::NestingDepthExceeded {
            max: MAX_NESTING_DEPTH,
        });
    }

    let start = Instant::now();

    // Look up the tool
    let tool = self
        .tools
        .get(tool_name)
        .await
        .ok_or_else(|| PtcError::NotFound {
            name: tool_name.to_string(),
        })?;

    // Determine timeout: caller override -> tool's own timeout -> default,
    // capped at MAX_TIMEOUT_SECS.
    let timeout = timeout_override
        .unwrap_or_else(|| tool.execution_timeout())
        .min(Duration::from_secs(MAX_TIMEOUT_SECS));

    // Execute with timeout
    let tool_result = tokio::time::timeout(timeout, tool.execute(params, ctx))
        .await
        .map_err(|_| PtcError::Timeout {
            name: tool_name.to_string(),
            timeout,
        })?
        .map_err(|e| match e {
            crate::tools::ToolError::InvalidParameters(reason) => PtcError::InvalidParameters {
                name: tool_name.to_string(),
                reason,
            },
            crate::tools::ToolError::RateLimited(_) => PtcError::RateLimited {
                name: tool_name.to_string(),
            },
            other => PtcError::ExecutionFailed {
                name: tool_name.to_string(),
                reason: other.to_string(),
            },
        })?;
CRITICAL: Sandbox Escape / Remote Code Execution (RCE) on Orchestrator Host.
The ToolExecutor executes tools directly on the host where the orchestrator is running. It does not check the domain() of the tool being executed. If a tool with ToolDomain::Container (like ptc_script or shell) is registered in the orchestrator's tool registry, a compromised worker container can use the POST /worker/{job_id}/tools/call endpoint to execute arbitrary code on the orchestrator host.
In a typical deployment, the orchestrator and agent share a registry that includes development tools. This allows any worker to escape its sandbox and compromise the host system.
Fixed in d8ff994. Added a tool.domain() check in ToolExecutor::execute() that rejects ToolDomain::Container tools with PtcError::DomainBlocked. Container-domain tools (shell, file ops) can no longer be invoked through the PTC endpoint on the orchestrator host. Regression test: test_container_domain_blocked.
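A minimal sketch of the kind of domain gate described in this fix, not the PR's actual code. ToolDomain::Container and PtcError::DomainBlocked come from the comment above; the ToolDomain::Orchestrator variant, the name field, and the check_domain helper are assumptions made for illustration.

```rust
/// Where a tool is allowed to run. `Orchestrator` is an assumed name for
/// the non-container domain; only `Container` appears in the review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolDomain {
    Orchestrator,
    Container,
}

/// Reduced error type; the `name` field is an assumption.
#[derive(Debug, PartialEq, Eq)]
enum PtcError {
    DomainBlocked { name: String },
}

/// Hypothetical helper: reject Container-domain tools before they can be
/// executed on the orchestrator host.
fn check_domain(name: &str, domain: ToolDomain) -> Result<(), PtcError> {
    if domain == ToolDomain::Container {
        return Err(PtcError::DomainBlocked {
            name: name.to_string(),
        });
    }
    Ok(())
}
```

Running the gate before the tool lookup result is executed means a blocked call never reaches tool.execute(), which is the property the regression test is meant to pin down.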
        format!("Programmatic tool call from job {}", job_id),
    );
    // Propagate nesting depth so the executor enforces the global limit
    ctx.tool_nesting_depth = req.nesting_depth;
MEDIUM: Nesting Depth Limit Bypass.
The nesting_depth is provided by the untrusted worker and is trusted by the orchestrator without validation or increment. A malicious worker can always send nesting_depth: 0 in every request to bypass the MAX_NESTING_DEPTH limit (currently 5). This can lead to infinite recursion and Denial of Service (DoS) on the orchestrator due to resource exhaustion (e.g., spawning infinite processes or runtimes).
Fixed in d8ff994. The orchestrator now floors client-provided nesting_depth at 1 (req.nesting_depth.max(1)) since PTC calls from workers are inherently at depth >= 1. A malicious worker sending nesting_depth: 0 will have it corrected to 1.
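The flooring described here can be sketched as follows. MAX_NESTING_DEPTH = 5 and the .max(1) floor are taken from the thread; the function names effective_depth and depth_allowed are hypothetical and exist only for this illustration.

```rust
/// Limit quoted in the review ("currently 5").
const MAX_NESTING_DEPTH: u32 = 5;

/// Hypothetical helper: a PTC call arriving from a worker is at depth >= 1,
/// so a client-supplied 0 is corrected to 1.
fn effective_depth(client_depth: u32) -> u32 {
    client_depth.max(1)
}

/// Hypothetical helper mirroring the executor's check, which rejects a call
/// when the depth is >= MAX_NESTING_DEPTH.
fn depth_allowed(client_depth: u32) -> bool {
    effective_depth(client_depth) < MAX_NESTING_DEPTH
}
```

Note that the floor only corrects the value 0; any other value is still taken from the client as sent.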
if let Some(ref tx) = state.job_event_tx {
    let _ = tx.send((
        job_id,
        SseEvent::JobToolUse {
            job_id: job_id.to_string(),
            tool_name: req.tool_name.clone(),
            input: req.parameters.clone(),
        },
    ));
}
MEDIUM: Sensitive Information Leak via SSE.
The tool_call_handler broadcasts JobToolUse events containing the raw, unsanitized parameters provided by the worker. If a tool is called with sensitive data (e.g., API keys, passwords, or PII), this information will be leaked to any user observing the SSE stream in the web UI. While tool outputs are sanitized, tool inputs are currently broadcast in the clear.
Fixed in d8ff994. Tool parameters are now redacted before broadcasting via SSE. The JobToolUse event sends {"_note": "parameters redacted for security"} instead of raw parameters.
def call_tool(name, params=None, timeout_secs=None):
    """Call a tool on the orchestrator by name.

    Args:
        name: Tool name (e.g., "echo", "shell", "read_file").
        params: Dictionary of parameters to pass to the tool.
        timeout_secs: Optional timeout in seconds (max 300).

    Returns:
        Tool output as a string.

    Raises:
        RuntimeError: If the tool call fails.
    """
    url = f"{_base_url()}/tools/call"
    body = {
        "tool_name": name,
        "parameters": params or {},
    }
    if timeout_secs is not None:
        body["timeout_secs"] = min(int(timeout_secs), 300)

    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {_token()}",
        },
        method="POST",
    )

    try:
        with urllib.request.urlopen(req, timeout=(timeout_secs if timeout_secs is not None else 60) + 5) as resp:
            result = json.loads(resp.read().decode("utf-8"))
The current timeout logic can lead to premature client-side timeouts. When timeout_secs is not provided, the client-side HTTP request timeout defaults to 65 seconds. However, the server-side tool execution may have a longer default timeout (e.g., ptc_script defaults to 120s). This mismatch will cause the client to time out while the server is still correctly processing the tool.
To fix this, I suggest making the timeout explicit by providing a default value in call_tool and consistently using it for both the server request and the client-side timeout calculation. This ensures consistent behavior and prevents unexpected timeouts.
def call_tool(name, params=None, timeout_secs=60):
    """Call a tool on the orchestrator by name.

    Args:
        name: Tool name (e.g., "echo", "shell", "read_file").
        params: Dictionary of parameters to pass to the tool.
        timeout_secs: Timeout in seconds (default 60, max 300).

    Returns:
        Tool output as a string.

    Raises:
        RuntimeError: If the tool call fails.
    """
    url = f"{_base_url()}/tools/call"
    server_timeout = min(int(timeout_secs), 300)
    body = {
        "tool_name": name,
        "parameters": params or {},
        "timeout_secs": server_timeout,
    }
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {_token()}",
        },
        method="POST",
    )
    try:
        # The client-side timeout for the HTTP request should be slightly longer
        # than the server-side tool execution timeout to account for network latency.
        client_timeout = server_timeout + 5
        with urllib.request.urlopen(req, timeout=client_timeout) as resp:
            result = json.loads(resp.read().decode("utf-8"))
Fixed in d8ff994. call_tool now defaults to timeout_secs=60, always sends timeout_secs to the server, and uses server_timeout + 5 for the client-side HTTP timeout to account for network latency.
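The timeout arithmetic from this fix, written out in Rust purely as an illustration (the SDK itself is Python). The values 60, 300, and 5 come from the thread; the constant names and the resolve_timeouts function are assumptions.

```rust
/// Values quoted in the review thread; the constant names are assumed.
const DEFAULT_TIMEOUT_SECS: u64 = 60;
const MAX_TIMEOUT_SECS: u64 = 300;
const NETWORK_BUFFER_SECS: u64 = 5;

/// Hypothetical helper returning (server_timeout, client_timeout) in seconds.
/// The server value is what would be sent as `timeout_secs`; the client value
/// is the HTTP timeout, kept slightly longer so the client does not give up
/// while the server is still inside its own limit.
fn resolve_timeouts(requested: Option<u64>) -> (u64, u64) {
    let server = requested
        .unwrap_or(DEFAULT_TIMEOUT_SECS)
        .min(MAX_TIMEOUT_SECS);
    (server, server + NETWORK_BUFFER_SECS)
}
```

Deriving the client timeout from the same value that is sent to the server is what removes the mismatch: the two can no longer drift apart when a default changes on one side.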
Proposal: Evaluate PTC Effectiveness Using Trajectory Benchmarks Before Merge
Before merging this PR, I'd like to use it as a proof-of-concept for the trajectory benchmark system (#467) -- demonstrating that we can objectively evaluate whether a new agent capability actually helps.
The Question
Does PTC reduce LLM round-trips and cost for multi-step tool chains, or does the overhead of generating Python scripts negate the savings?
Methodology
Use the existing
Candidate Tasks
These are tasks with sequential tool chains where PTC should theoretically reduce round-trips:
Metrics to Compare (per task)
All already captured by
Recording Protocol
Comparison Report
Use the existing
Why This Matters Beyond PTC
This establishes a repeatable methodology for evaluating any future agent capability change:
Statistical Caveat
Single recordings aren't statistically rigorous. Plan is to record each prompt 3-5x per branch and report mean/stddev for key metrics. This gives directional evidence, not a p-value -- but it's far more than "trust me, it helps."
Next Steps
cc @ilblackdragon -- this ties directly to #467 and the nearai/benchmarks work. The trajectory infrastructure can answer "does feature X make the agent better?" questions with data.
40d147e to 69b9241
zmanian left a comment
Review: feat(ptc): programmatic tool calling -- executor, SDK, and E2E tests
Reviewed the full diff across all 8 commits, with focus on the security fixes from the previous Gemini review (d8ff994) and the new code.
Previous review findings -- status
All four items from the Gemini review have been addressed:
- Container domain blocking (CRITICAL) -- Fixed. ToolExecutor::execute() now checks tool.domain() == ToolDomain::Container and returns PtcError::DomainBlocked. Regression test added (test_container_domain_blocked).
- Nesting depth bypass (MEDIUM) -- Fixed. tool_call_handler floors nesting_depth at 1 via .max(1).
- SSE parameter leak (MEDIUM) -- Fixed. SSE JobToolUse events now emit {"_note": "parameters redacted for security"} instead of raw params.
- Python SDK timeout mismatch (MEDIUM) -- Fixed. call_tool now defaults to timeout_secs=60, always sends it to the server, and uses server_timeout + 5 for client HTTP timeout.
New findings
BUG: UTF-8 byte-index slicing in truncate_output (ptc_script.rs:123)
&output[..MAX_OUTPUT_SIZE]
This will panic if MAX_OUTPUT_SIZE falls in the middle of a multi-byte UTF-8 character. Python scripts can easily produce non-ASCII output (CJK, emoji, etc.). Per CLAUDE.md: "Never use byte-index slicing (&s[..n]) on user-supplied or external strings -- it panics on multi-byte characters." Fix: walk backwards with is_char_boundary() or use char_indices().
MINOR: NestingGuard could underflow on misuse
The NestingGuard::drop does *self.depth -= 1 which would underflow/panic if depth were 0. Currently safe because it's always preceded by += 1, but saturating_sub(1) would be more defensive and has zero cost.
MINOR: tool_nesting_depth not incremented for orchestrator-side execution
When the orchestrator's tool_call_handler calls executor.execute(), the tool_nesting_depth in the JobContext is set to req.nesting_depth.max(1), but the executor doesn't increment it before executing the tool. This means if the tool itself triggers a PTC call back to the orchestrator (recursive tool call), the nesting counter stays at 1 rather than incrementing. The WASM path handles this correctly via NestingGuard, but the HTTP RPC path relies solely on the client sending an honest nesting depth. Since the domain blocking prevents Container-domain tools from running on the orchestrator (and those are the ones that could recurse via the Python SDK), this is low-risk in practice, but worth noting for future extensibility.
OBSERVATION: ptc_script correctly declares ToolDomain::Container
This is important -- ptc_script is Container domain, so ToolExecutor will block it from running via the orchestrator PTC endpoint. This means it can only run inside worker containers (where the Python SDK env vars are set), which is the intended design. Good.
OBSERVATION: extra_env forwarding in ptc_script
The tool forwards ctx.extra_env to the Python subprocess. This is the correct mechanism for worker-injected credentials, but note that any key-value pair in extra_env will be visible to the Python script. This is acceptable since ptc_script already requires ApprovalRequirement::Always, but worth documenting.
Code quality
- Error handling is clean -- thiserror for PtcError, proper error mapping throughout
- No .unwrap() in production code (the ones in executor.rs are in test functions or .unwrap_or() safe variants)
- Good test coverage: 16 tests across 4 modules covering happy paths, errors, security boundaries
- std::sync::RwLock usage for the executor slot is correctly motivated (WASM spawn_blocking context)
- Python SDK is stdlib-only as claimed, clean and minimal
- WASM test fixture (tools-src/test-ptc) is well-structured
Verdict
The UTF-8 byte-slicing bug in truncate_output should be fixed before merge -- it's a crash on non-ASCII output, which is a real scenario for a tool that runs arbitrary Python scripts. The rest is solid.
These steps make sense.
38a974d to 68b1f22
f1d7951 to 467904e
dbe11cb to 291d34a
ilblackdragon left a comment
Review: feat(ptc): programmatic tool calling -- executor, SDK, and E2E tests
This is a well-architected PR. The ToolExecutor abstraction, lazy slot pattern for deferred wiring, container domain blocking, SSE redaction, and the Python SDK (stdlib-only, zero deps) are all thoughtfully designed. The follow-up security commits (54e37590, bde17202) address the critical server-side nesting-depth trust issue and SSE parameter leakage -- good work closing those.
That said, there are a few issues remaining, one of which is a blocker.
1. BLOCKING: UTF-8 panic in truncate_output (src/tools/builtin/ptc_script.rs, line ~123)
&output[..MAX_OUTPUT_SIZE]
Indexing a &str at a byte offset that falls inside a multi-byte UTF-8 codepoint panics at runtime. Python scripts routinely produce non-ASCII output (emoji, CJK, accented characters, even repr() of bytes). When MAX_OUTPUT_SIZE (65536) lands mid-character, the process crashes.
Fix options (pick one):
- Walk backward with output.floor_char_boundary(MAX_OUTPUT_SIZE) (nightly, or hand-roll with is_char_boundary)
- Use output.char_indices().take_while(|(i, _)| *i < MAX_OUTPUT_SIZE).last() to find the safe cut point
- String::from_utf8_lossy(&output.as_bytes()[..MAX_OUTPUT_SIZE]) (replaces partial char with U+FFFD -- acceptable for truncation)
The existing test test_truncate_output only uses ASCII "x" characters and does not catch this. Please add a test with multi-byte characters (e.g., a string of "\u{1F600}" repeated past the limit).
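One way the hand-rolled is_char_boundary option could look, as a sketch under stated assumptions rather than the fix that was actually committed. MAX_OUTPUT_SIZE = 65536 is the value quoted above; truncate_at_char_boundary is a hypothetical helper, and this truncate_output is a simplified stand-in that returns only the prefix (the real function may also append a truncation notice).

```rust
/// Limit quoted in the review (64 KiB).
const MAX_OUTPUT_SIZE: usize = 65536;

/// Hypothetical helper: return the longest prefix of `s` that is at most
/// `max` bytes long and ends on a UTF-8 character boundary.
fn truncate_at_char_boundary(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut cut = max;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    &s[..cut]
}

/// Simplified stand-in for the function under review.
fn truncate_output(output: &str) -> &str {
    truncate_at_char_boundary(output, MAX_OUTPUT_SIZE)
}
```

Because a UTF-8 character is at most 4 bytes, the loop steps back at most 3 positions, so the cost is constant regardless of output size. A test in the spirit of the request above can prefix a single ASCII byte to a long run of 4-byte emoji so that the limit is guaranteed to land mid-character.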
2. Should fix: NestingGuard::drop uses raw subtraction (src/tools/wasm/wrapper.rs, line 49)
impl Drop for NestingGuard<'_> {
    fn drop(&mut self) {
        *self.depth -= 1;
    }
}
If *self.depth is somehow 0 when the guard drops (e.g., a logic error sets it to 0 before the guard runs, or a future refactor introduces a double-drop path), this panics in debug builds and wraps to u32::MAX in release -- silently disabling the nesting limit.
Use *self.depth = self.depth.saturating_sub(1); -- it compiles to the same machine code on the happy path and eliminates the edge case. The orchestrator side already uses saturating_add(1) (line 483 of api.rs), so the defensive style is consistent.
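A self-contained sketch of the guard with the saturating decrement applied. The depth field and the Drop body follow the snippet quoted in this review; the enter constructor and its saturating increment are assumptions about how the guard is created.

```rust
/// RAII guard that tracks nesting depth for the lifetime of a nested call.
struct NestingGuard<'a> {
    depth: &'a mut u32,
}

impl<'a> NestingGuard<'a> {
    /// Hypothetical constructor: bump the counter on entry.
    fn enter(depth: &'a mut u32) -> Self {
        *depth = depth.saturating_add(1);
        NestingGuard { depth }
    }
}

impl Drop for NestingGuard<'_> {
    fn drop(&mut self) {
        // Saturating: a counter that is already 0 stays at 0 instead of
        // panicking (debug) or wrapping to u32::MAX (release).
        *self.depth = self.depth.saturating_sub(1);
    }
}
```

Since the decrement lives in Drop, it also runs during unwinding, so the counter is restored even when the nested call panics.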
3. Should fix: set_tool_executor silently swallows lock poisoning (src/tools/registry.rs, line ~147)
pub fn set_tool_executor(&self, executor: Arc<ToolExecutor>) {
    if let Ok(mut guard) = self.tool_executor_slot.write() {
        *guard = Some(executor);
    }
}
A poisoned RwLock means a previous holder panicked. If this silently fails, every subsequent WASM PTC call gets a cryptic "no tool executor configured" error with zero diagnostic trail. At minimum, add:
} else {
    tracing::error!("tool_executor_slot RwLock is poisoned; PTC will be unavailable");
}
This gives operators a single log line pointing at the root cause instead of a stream of mysterious PTC failures.
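A standard-library-only sketch of the poisoned-lock branch. The Registry and ToolExecutor types here are reduced stand-ins, eprintln! substitutes for the tracing::error! call suggested above, and the bool return value is an assumption added so the outcome can be observed; the method in the PR returns nothing.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the real executor type (assumption for illustration).
struct ToolExecutor;

/// Reduced stand-in for the tool registry; only the slot is modelled.
struct Registry {
    tool_executor_slot: RwLock<Option<Arc<ToolExecutor>>>,
}

impl Registry {
    /// Store the executor. Returns false when the lock is poisoned, after
    /// emitting one diagnostic line instead of failing silently.
    fn set_tool_executor(&self, executor: Arc<ToolExecutor>) -> bool {
        match self.tool_executor_slot.write() {
            Ok(mut guard) => {
                *guard = Some(executor);
                true
            }
            Err(_) => {
                // The review suggests tracing::error!; eprintln! stands in here.
                eprintln!("tool_executor_slot RwLock is poisoned; PTC will be unavailable");
                false
            }
        }
    }
}
```

An RwLock becomes poisoned when a thread panics while holding the write guard, so the error branch can be exercised in a test by spawning a thread that takes the write lock and then panics.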
4. Should fix: stray // ci fix comment (src/tools/builtin/memory.rs, line 639)
The file ends with a bare // ci fix after the closing brace of the test module. This looks like a leftover from a CI troubleshooting session. Please remove it.
5. Architecture notes (positive)
- ToolExecutor + lazy slot: Clean separation of concerns. WASM tools registered before the executor exists still get PTC access at execution time -- avoids circular dependency.
- Container domain blocking: ToolDomain::Container check in executor.rs prevents sandbox-escape via PTC. The regression test (test_container_domain_blocked) is good.
- SSE redaction (54e37590): Closing the parameter-leak path on worker-reported events is important -- glad this was caught and fixed.
- Python SDK: stdlib-only, clear docstrings, structured output via ptc_output(). The _env() helper with descriptive error messages is user-friendly.
6. Nice to have (non-blocking)
- Add a test for truncate_output with multi-byte characters (4-byte emoji strings that cross the boundary).
- Add a test that creates a NestingGuard with depth = 0 and drops it, verifying it does not panic (once saturating_sub is in place).
- Consider whether MAX_OUTPUT_SIZE should be configurable (env var or tool param) -- 64KB is reasonable as a default but some scripts may legitimately produce more.
Summary: One blocker (UTF-8 panic), two defensive-coding fixes (NestingGuard, lock poisoning), and one cleanup (stray comment). The overall design is solid.
Thanks for the thorough review @ilblackdragon. Addressing each point:
The inline findings from @gemini (container domain blocking, nesting depth bypass, SSE leak, timeout mismatch) have already been addressed in previous commits. Will push the fixes shortly.
Addressing review feedback from @ilblackdragon
All 4 items from your review have been addressed in commit 8ffc721:
1. BLOCKING: UTF-8 panic in
f65fcf9 to 77a50b2
All review feedback has been addressed in the latest commits:
Ready for re-review when you get a chance. Thanks!
Add ToolExecutor for standalone tool dispatch used by both the
orchestrator HTTP RPC endpoint and the WASM tool_invoke host function.
Includes Python SDK for container scripts, WASM test fixture, and
comprehensive E2E test coverage across all PTC paths.
Implementation:
- ToolExecutor with timeout, nesting depth limit, safety sanitization
- Orchestrator POST /worker/{job_id}/tools/call endpoint with SSE events
- WASM tool_invoke host function with alias resolution
- Python SDK (stdlib-only) with call_tool + convenience wrappers
Tests (16 new):
- 6 orchestrator HTTP RPC tests (auth, echo, not-found, timeout, SSE, no-executor)
- 3 executor integration tests (sanitization, invalid params, sequential)
- 4 Python SDK tests (env vars, request format, HTTP error, wrappers)
- 3 WASM E2E tests (echo via alias, alias not granted, no capability)
Refs #407
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r SDK
- Fix WASM tool_invoke production wiring: change tool_executor to a shared Arc<std::sync::RwLock> slot with lazy resolution so WASM tools registered during build_all() can access the executor set afterward
- Add nesting_depth field to ToolCallRequest and propagate it through the orchestrator's tool_call_handler into JobContext
- Add ptc_script built-in tool: runs Python scripts with ironclaw_tools SDK pre-imported, env-scrubbed subprocess, structured output support
- Copy Python SDK into Docker worker image at dist-packages path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Python SDK: remove 60s minimum timeout enforcement, respect requested timeout with 5s network buffer
- Rust executor: cap timeout at MAX_TIMEOUT_SECS instead of falling back to default when exceeded
- WASM wrapper: use RAII guard for nesting depth to prevent leak on panic
Addresses Gemini review feedback on PR #408.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Python SDK client timeout to use actual timeout_secs + 5s buffer instead of enforcing 60s minimum
- Cap tool execution timeout at MAX_TIMEOUT_SECS instead of falling back to default when exceeded
- Use RAII guard for tool_nesting_depth to ensure decrement on panic
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Alphabetize PromptQueue before PtcScriptTool to pass cargo fmt check. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… redaction
Three security fixes from code review:
1. CRITICAL: Block Container-domain tools from executing on the orchestrator host. The ToolExecutor now checks tool.domain() and rejects Container tools with PtcError::DomainBlocked, preventing sandbox escape / RCE.
2. MEDIUM: Floor client-provided nesting_depth at 1 instead of trusting the worker's value. A malicious worker can no longer send nesting_depth=0 to bypass MAX_NESTING_DEPTH.
3. MEDIUM: Redact tool parameters in SSE JobToolUse events to prevent leaking sensitive data (API keys, passwords) to web UI observers.
4. Python SDK: Always send timeout_secs to server and use server_timeout+5 for client-side HTTP timeout to prevent premature client timeouts.
Regression test: test_container_domain_blocked verifies Container-domain tools are rejected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
[skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The import was dropped during rebase onto staging, causing compilation failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Staging moved schema hint logic to display-time in schema() method, so the construction-time call from PTC is no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Nesting depth: change from .max(1) to .saturating_add(1) so the orchestrator always increments the depth server-side rather than trusting the client-supplied value. This prevents a malicious worker from bypassing the nesting limit by always sending 0.
- SSE redaction: redact raw input parameters from worker-reported tool_use events in job_event_handler before broadcasting via SSE. Previously only the PTC path redacted; worker-reported events leaked raw parameters (potentially containing API keys, passwords, PII) to the web UI.
- Domain check and Python SDK timeout were already addressed in the current branch.
- Add regression tests for both fixes.
[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lock poison logging, cleanup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…for_schema Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove redundant #![cfg(test)] inner attribute from codex_test_helpers.rs (already gated by #[cfg(test)] in mod.rs), fixing the clippy duplicated_attributes warning. Also apply cargo fmt import reordering in tools/mod.rs. https://claude.ai/code/session_017ckCCurNiBL8uzE4dJg59K
f689ec0 to 71d5c49
All four review items from @ilblackdragon have been addressed (in commit
Ready for re-review.
Thanks for the thorough review @ilblackdragon. All four items addressed in db1e83c: 1. UTF-8 panic in 2. 3. 4. Stray
Closing in favor of #1557 (v2 architecture), which introduces a unified execution model with CodeAct that supersedes the PTC approach. The v2 engine's Capability primitive and Monty-based tool composition provide a more general solution for tool-to-tool dispatch.
Pull request was closed
Summary
Adds Programmatic Tool Calling (PTC) infrastructure from #407 (item 1), enabling tools to invoke other tools without LLM round-trips.
- tool_invoke host function.
- POST /worker/{job_id}/tools/call with auth, SSE event emission (JobToolUse + JobToolResult), and 503 when executor not configured.
- tool_invoke: Host function with alias resolution -- WASM tools call by alias (e.g. echo_alias), capabilities map aliases to real tool names.
- ironclaw_tools.py for container scripts. Reads connection details from env vars, provides call_tool() + convenience wrappers (shell, read_file, write_file, http_get).
Test plan
16 new tests across 4 suites:
- cargo test orchestrator::api::tests --all-features
- cargo test tools::executor::tests --all-features
- cd sdk/python && python3 -m unittest test_ironclaw_tools
- cargo test tools::wasm::wrapper::tests --all-features -- --ignored
Verification:
Refs #407
Generated with Claude Code