
feat(ptc): programmatic tool calling -- executor, SDK, and E2E tests#408

Closed
zmanian wants to merge 5 commits into nearai:main from zmanian:feat/ptc-programmatic-tool-calling

@zmanian (Collaborator) commented Feb 28, 2026

Summary

Adds Programmatic Tool Calling (PTC) infrastructure from #407 (item 1), enabling tools to invoke other tools without LLM round-trips.

  • ToolExecutor: Standalone dispatch engine with timeout enforcement, nesting depth limit (max 5), and safety layer sanitization. Used by both the orchestrator HTTP RPC path and WASM tool_invoke host function.
  • Orchestrator endpoint: POST /worker/{job_id}/tools/call with auth, SSE event emission (JobToolUse + JobToolResult), and a 503 response when no executor is configured.
  • WASM tool_invoke: Host function with alias resolution -- WASM tools call by alias (e.g. echo_alias), capabilities map aliases to real tool names.
  • Python SDK: Stdlib-only ironclaw_tools.py for container scripts. Reads connection details from env vars, provides call_tool() + convenience wrappers (shell, read_file, write_file, http_get).
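A minimal, synchronous sketch of the dispatch path described above: look the tool up by name, reject calls nested deeper than the stated limit of 5, then run it. Everything here (`Executor`, `ToolFn`, `DispatchError`, and the convention that depth 1 is the first nested call) is a hypothetical stand-in; the PR's actual `ToolExecutor` is async and also enforces timeouts and safety sanitization, none of which is modeled here.

```rust
use std::collections::HashMap;

// Assumed from the PR description: calls may nest at most 5 levels deep.
const MAX_NESTING_DEPTH: u32 = 5;

/// Simplified error type (hypothetical; the real `PtcError` variants are not shown).
#[derive(Debug, PartialEq)]
enum DispatchError {
    NotFound(String),
    NestingDepthExceeded { depth: u32, max: u32 },
    Failed(String),
}

/// A tool modeled as a plain function from serialized params to output.
type ToolFn = fn(&str) -> Result<String, String>;

/// Hypothetical stand-in for `ToolExecutor`: a name -> tool table plus a depth check.
struct Executor {
    tools: HashMap<String, ToolFn>,
}

impl Executor {
    fn new() -> Self {
        Executor { tools: HashMap::new() }
    }

    fn register(&mut self, name: &str, tool: ToolFn) {
        self.tools.insert(name.to_string(), tool);
    }

    /// `depth` is the nesting level of this call; depths above the limit are rejected.
    fn execute(&self, name: &str, params: &str, depth: u32) -> Result<String, DispatchError> {
        if depth > MAX_NESTING_DEPTH {
            return Err(DispatchError::NestingDepthExceeded { depth, max: MAX_NESTING_DEPTH });
        }
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| DispatchError::NotFound(name.to_string()))?;
        // Tool-level failures are wrapped rather than propagated as panics.
        tool(params).map_err(DispatchError::Failed)
    }
}

/// Example tool: returns its input unchanged.
fn echo(params: &str) -> Result<String, String> {
    Ok(params.to_string())
}
```

Checking depth before the lookup means a runaway chain is refused even if it targets a tool that does not exist, which keeps the recursion guard independent of registry contents.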

Test plan

16 new tests across 4 suites:

  • 6 orchestrator HTTP RPC tests (cargo test orchestrator::api::tests --all-features)
    • Auth required (401), echo success, not found, no executor (503), timeout, SSE events
  • 3 executor integration tests (cargo test tools::executor::tests --all-features)
    • Safety sanitization (Bearer token redaction), invalid params error mapping, sequential call isolation
  • 4 Python SDK tests (cd sdk/python && python3 -m unittest test_ironclaw_tools)
    • Missing env vars, request format, HTTP error handling, convenience wrappers
  • 3 WASM E2E tests (cargo test tools::wasm::wrapper::tests --all-features -- --ignored)
    • Echo via alias, alias not granted, no capability (requires pre-built WASM binary)
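The executor test for Bearer-token redaction implies that tool output is scrubbed before it is returned to the caller. The PR page does not show how the safety layer detects secrets, so the following is only an illustrative sketch, assuming a plain prefix scan: each token following `Bearer ` (up to whitespace or a quote) is replaced with a placeholder. `redact_bearer_tokens` and the `[REDACTED]` marker are hypothetical names.

```rust
/// Replace the credential that follows each "Bearer " prefix with a placeholder.
/// Hypothetical helper; the real safety layer may use different patterns.
fn redact_bearer_tokens(input: &str) -> String {
    const PREFIX: &str = "Bearer ";
    const PLACEHOLDER: &str = "[REDACTED]";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find(PREFIX) {
        // Copy everything up to and including "Bearer ".
        let after_prefix = pos + PREFIX.len();
        out.push_str(&rest[..after_prefix]);
        // The token runs until the next whitespace, quote, or end of input.
        let token_len = rest[after_prefix..]
            .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
            .unwrap_or(rest.len() - after_prefix);
        if token_len > 0 {
            out.push_str(PLACEHOLDER);
        }
        // Skip the token itself and keep scanning the remainder.
        rest = &rest[after_prefix + token_len..];
    }
    out.push_str(rest);
    out
}
```

For example, `"Authorization: Bearer abc123 ok"` becomes `"Authorization: Bearer [REDACTED] ok"`, which is the kind of property a sanitization test can assert on.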

Verification:

cargo test --all-features --lib           # 1825 passed, 0 failed
cargo clippy --all --all-features -- -D warnings  # zero warnings

Refs #407

Generated with Claude Code

@github-actions github-actions bot added size: XL 500+ changed lines scope: tool Tool infrastructure scope: tool/wasm WASM tool sandbox scope: orchestrator Container orchestrator scope: worker Container worker risk: medium Business logic, config, or moderate-risk modules and removed size: XL 500+ changed lines labels Feb 28, 2026
@github-actions github-actions bot added the contributor: experienced 6-19 merged PRs label Feb 28, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust system for Programmatic Tool Calling (PTC), significantly enhancing the flexibility and power of the IronClaw platform. By enabling tools to directly invoke other tools, it streamlines complex workflows and reduces reliance on LLM orchestration for sequential or nested operations. The changes span the orchestrator's API, the WASM runtime, and a new Python SDK, providing a comprehensive solution for direct tool interaction with built-in safety and control mechanisms.

Highlights

  • Programmatic Tool Calling (PTC) Infrastructure: Introduced core infrastructure for tools to invoke other tools directly, bypassing LLM round-trips. This includes a new ToolExecutor for dispatch, timeout enforcement, nesting depth limits (max 5), and safety layer sanitization.
  • Orchestrator API Endpoint for PTC: Added a new HTTP RPC endpoint (POST /worker/{job_id}/tools/call) to the orchestrator, allowing external clients (like the Python SDK) to programmatically call tools. This endpoint includes authentication, emits SSE events (JobToolUse, JobToolResult), and handles cases where the executor is not configured.
  • WASM tool_invoke Host Function: Implemented a tool_invoke host function for WASM tools, enabling them to call other tools synchronously. This feature supports alias resolution (WASM tools call by alias, capabilities map aliases to real tool names) and integrates with the ToolExecutor for consistent execution and safety.
  • Python SDK for Container Scripts: Developed a new Python SDK (ironclaw_tools.py) for container scripts, providing a call_tool() function and convenience wrappers (shell, read_file, write_file, http_get). This SDK simplifies programmatic tool interaction by reading connection details from environment variables.
  • Nesting Depth Management: Integrated tool_nesting_depth into the JobContext and ToolExecutor to enforce a global limit (max 5) on how deeply tools can invoke other tools, preventing infinite recursion in tool-calling chains.
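The alias model described above can be sketched as a lookup in a per-tool grant map: a WASM tool only knows aliases, and the host translates an alias to a real tool name only if the capability grants it. The three outcomes mirror the three WASM E2E tests in the test plan (echo via alias, alias not granted, no capability). `ToolInvokeCapability`, `InvokeError`, and `resolve_alias` are hypothetical names, not the wrapper's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical capability grant: alias used by the WASM module -> real tool name.
struct ToolInvokeCapability {
    aliases: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum InvokeError {
    /// The tool was granted no tool-invocation capability at all.
    NoCapability,
    /// The capability exists but does not map this alias.
    AliasNotGranted(String),
}

/// Resolve an alias to a real tool name, refusing anything not explicitly granted.
fn resolve_alias(cap: Option<&ToolInvokeCapability>, alias: &str) -> Result<String, InvokeError> {
    let cap = cap.ok_or(InvokeError::NoCapability)?;
    cap.aliases
        .get(alias)
        .cloned()
        .ok_or_else(|| InvokeError::AliasNotGranted(alias.to_string()))
}
```

Resolving through an explicit map (rather than passing names through) means a WASM module cannot reach a tool such as `shell` unless a capability maps an alias to it.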
Changelog
  • sdk/python/ironclaw_tools.py
    • Added a new Python SDK for programmatic tool calling, including call_tool and convenience wrappers like shell, read_file, write_file, and http_get.
  • sdk/python/test_ironclaw_tools.py
    • Added unit tests for the new Python SDK, covering environment variable handling, request formatting, HTTP error handling, and convenience wrapper functionality.
  • src/context/state.rs
    • Added tool_nesting_depth field to JobContext to track the current depth of programmatic tool calls.
  • src/db/libsql/jobs.rs
    • Initialized tool_nesting_depth to 0 when creating a new JobContext from the database.
  • src/history/store.rs
    • Initialized tool_nesting_depth to 0 when creating a new JobContext from the history store.
  • src/main.rs
    • Initialized ToolExecutor and passed it to the OrchestratorState for use in programmatic tool calling.
  • src/orchestrator/api.rs
    • Imported JobContext and ToolExecutor.
    • Added tool_executor field to OrchestratorState.
    • Added a new POST /worker/{job_id}/tools/call endpoint for programmatic tool invocation.
    • Implemented tool_call_handler to process programmatic tool call requests, execute tools via ToolExecutor, and emit SSE events.
    • Added several new tests for the programmatic tool calling endpoint, covering success, not found, no executor, SSE events, timeout, and authentication.
  • src/tools/executor.rs
    • Added a new module defining ToolExecutor, PtcToolResult, and PtcError.
    • Implemented ToolExecutor with logic for tool lookup, timeout enforcement, safety layer sanitization, and nesting depth checks.
    • Included unit tests for ToolExecutor covering various scenarios like tool not found, successful execution, timeout, nesting depth limits, and safety sanitization.
  • src/tools/mod.rs
    • Exported PtcError, PtcToolResult, and ToolExecutor from the executor module.
  • src/tools/registry.rs
    • Added tool_executor field to ToolRegistry to hold an optional ToolExecutor instance.
    • Added set_tool_executor method to allow injecting the ToolExecutor into the registry.
    • Modified register_wasm_tool to inject the ToolExecutor into WASM tool wrappers if available, enabling tool_invoke.
  • src/tools/wasm/wrapper.rs
    • Defined ToolResolver type for synchronous tool invocation from WASM.
    • Added tool_resolver and tool_nesting_depth fields to StoreData for WASM execution context.
    • Modified the tool_invoke host function to use the ToolResolver, enforce nesting depth, and handle parameter parsing.
    • Added tool_executor field to WasmToolWrapper and a with_tool_executor builder method.
    • Updated the execute method of WasmToolWrapper to construct and pass a ToolResolver closure to the WASM store, bridging synchronous WASM calls to asynchronous Rust tool execution.
    • Added integration tests for WASM programmatic tool calling, including successful echo invocation, alias not granted, and no capability scenarios.
  • src/worker/api.rs
    • Added ToolCallRequest and ToolCallResponse structs for programmatic tool calling.
    • Added call_tool asynchronous method to WorkerHttpClient for making programmatic tool calls to the orchestrator.
  • tools-src/test-ptc/Cargo.toml
    • Added Cargo.toml for a new WASM test tool (test-ptc-tool) to demonstrate programmatic tool calling.
  • tools-src/test-ptc/src/lib.rs
    • Added source code for test-ptc-tool, a WASM tool that uses the tool_invoke host function to call the 'echo' tool via an alias.
Activity
  • The pull request introduces a significant new feature, Programmatic Tool Calling (PTC), with comprehensive changes across multiple components.
  • New Python SDK and Rust ToolExecutor were added, along with modifications to the orchestrator API and WASM host functions.
  • Extensive test coverage was added for the new functionality, including unit tests for the Python SDK, ToolExecutor, and E2E tests for WASM tool invocation.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a significant new feature: Programmatic Tool Calling (PTC). The implementation is comprehensive, adding a ToolExecutor, a new orchestrator endpoint, a Python SDK, and integration with WASM host functions. The code is well-structured and includes extensive tests. I've identified a few areas for improvement, primarily concerning timeout logic and panic safety, which I've detailed in the review comments.

Comment thread sdk/python/ironclaw_tools.py Outdated
)

try:
    with urllib.request.urlopen(req, timeout=max(timeout_secs or 60, 60) + 5) as resp:
@gemini-code-assist (Contributor)

Priority: high

The client-side timeout calculation for urlopen appears to be incorrect. The use of max(timeout_secs or 60, 60) enforces a minimum timeout of 60 seconds on the client, even if a shorter timeout_secs is provided (e.g., for http_get which defaults to 30s). This could lead to the client waiting longer than expected.

The client timeout should be based on the requested timeout_secs with a small buffer, without enforcing a large minimum. For example, with timeout_secs=30, the client timeout should be around 35 seconds, not 65.

Suggested change:
-    with urllib.request.urlopen(req, timeout=max(timeout_secs or 60, 60) + 5) as resp:
+    with urllib.request.urlopen(req, timeout=(timeout_secs if timeout_secs is not None else 60) + 5) as resp:

@zmanian (Collaborator, Author)

Fixed -- timeout now uses the actual timeout_secs (or 60s default) plus a 5s buffer.

Comment thread src/tools/executor.rs
Comment on lines +111 to +120
        let timeout = timeout_override
            .unwrap_or_else(|| {
                let tool_timeout = tool.execution_timeout();
                if tool_timeout > Duration::from_secs(MAX_TIMEOUT_SECS) {
                    self.default_timeout
                } else {
                    tool_timeout
                }
            })
            .min(Duration::from_secs(MAX_TIMEOUT_SECS));
@gemini-code-assist (Contributor)

Priority: medium

The logic for determining the execution timeout is a bit unusual. If a tool specifies a timeout greater than MAX_TIMEOUT_SECS, it falls back to self.default_timeout instead of being capped at MAX_TIMEOUT_SECS. This might be unexpected for a tool author who wants a long-running tool. For example, if a tool specifies a 10-minute timeout, it will get the default 60-second timeout, not the maximum 5-minute timeout.

It would be more intuitive to cap the tool's requested timeout at the maximum allowed value.

        let timeout = timeout_override
            .unwrap_or_else(|| tool.execution_timeout())
            .min(Duration::from_secs(MAX_TIMEOUT_SECS));
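To make the difference concrete, here is a small comparison of the original and suggested expressions, assuming the values from the comment (a 60-second default and `MAX_TIMEOUT_SECS` of 300, i.e. 5 minutes). The tool's own timeout is passed in as a plain `Duration`; `timeout_before` and `timeout_after` are illustrative names.

```rust
use std::time::Duration;

// Assumed values taken from the review comment's example.
const MAX_TIMEOUT_SECS: u64 = 300;
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(60);

/// Original logic: an over-limit tool timeout falls back to the default.
fn timeout_before(timeout_override: Option<Duration>, tool_timeout: Duration) -> Duration {
    timeout_override
        .unwrap_or_else(|| {
            if tool_timeout > Duration::from_secs(MAX_TIMEOUT_SECS) {
                DEFAULT_TIMEOUT
            } else {
                tool_timeout
            }
        })
        .min(Duration::from_secs(MAX_TIMEOUT_SECS))
}

/// Suggested logic: the requested timeout is simply capped at the maximum.
fn timeout_after(timeout_override: Option<Duration>, tool_timeout: Duration) -> Duration {
    timeout_override
        .unwrap_or(tool_timeout)
        .min(Duration::from_secs(MAX_TIMEOUT_SECS))
}
```

With a 10-minute tool timeout, `timeout_before` yields 60 s while `timeout_after` yields 300 s; in both versions an explicit override is still capped at the maximum.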

@zmanian (Collaborator, Author)

Fixed -- now caps at MAX_TIMEOUT_SECS via .min() instead of falling back to default.

Comment thread src/tools/wasm/wrapper.rs Outdated
Comment on lines +473 to +475
self.tool_nesting_depth += 1;
let result = resolver(&real_name, params, self.tool_nesting_depth);
self.tool_nesting_depth -= 1;
@gemini-code-assist (Contributor)

Priority: medium

The manual increment and decrement of self.tool_nesting_depth is not panic-safe. If the resolver call panics, self.tool_nesting_depth will not be decremented. This could cause subsequent tool_invoke calls within the same WASM execution to fail with a false "Nesting depth exceeded" error.

To make this robust, you could use an RAII guard to ensure the decrement happens even in case of a panic.

Example of a simple guard:

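struct DepthGuard<'a>(&'a mut u32);

impl<'a> DepthGuard<'a> {
    // Increment on entry; the guard then holds the only mutable borrow of the counter.
    fn enter(depth: &'a mut u32) -> Self {
        *depth += 1;
        DepthGuard(depth)
    }
}

impl Drop for DepthGuard<'_> {
    // Runs on normal return and during panic unwinding alike.
    fn drop(&mut self) {
        *self.0 -= 1;
    }
}

// In tool_invoke: read the depth through the guard, because
// self.tool_nesting_depth stays mutably borrowed while the guard lives.
let guard = DepthGuard::enter(&mut self.tool_nesting_depth);
let result = resolver(&real_name, params, *guard.0);
drop(guard); // decrements the counter
result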
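The panic-safety claim can be checked in isolation. This self-contained sketch wraps the counter in a guard (the name `NestingGuard` follows the author's reply, but its fields and the `invoke` helper are assumptions) and uses `std::panic::catch_unwind` to show that the depth returns to 0 even when the resolver panics. It relies on the default `panic = "unwind"` strategy; with `panic = "abort"` no destructor would run.

```rust
use std::panic::{self, AssertUnwindSafe};

/// RAII guard: increments the depth on creation and decrements it on drop,
/// including when the stack unwinds because of a panic.
struct NestingGuard<'a> {
    depth: &'a mut u32,
}

impl<'a> NestingGuard<'a> {
    fn enter(depth: &'a mut u32) -> Self {
        *depth += 1;
        NestingGuard { depth }
    }

    /// Current depth, read through the guard that owns the borrow.
    fn current(&self) -> u32 {
        *self.depth
    }
}

impl Drop for NestingGuard<'_> {
    fn drop(&mut self) {
        *self.depth -= 1;
    }
}

/// Simulates one hypothetical `tool_invoke` call whose resolver may panic.
fn invoke(depth: &mut u32, resolver: impl FnOnce(u32) -> String) -> String {
    let guard = NestingGuard::enter(depth);
    // If `resolver` panics, `guard` is still dropped while the stack unwinds.
    resolver(guard.current())
}
```

Usage: `invoke(&mut depth, |d| d.to_string())` returns `"1"` and leaves `depth` at 0; wrapping a panicking resolver in `panic::catch_unwind(AssertUnwindSafe(...))` also leaves `depth` at 0.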

@zmanian (Collaborator, Author)

Fixed -- added a NestingGuard RAII struct that decrements on drop.

@github-actions github-actions bot added scope: tool/builtin Built-in tools scope: sandbox Docker sandbox size: XL 500+ changed lines labels Feb 28, 2026
zmanian added a commit to zmanian/ironclaw that referenced this pull request Mar 1, 2026
- Python SDK: remove 60s minimum timeout enforcement, respect
  requested timeout with 5s network buffer
- Rust executor: cap timeout at MAX_TIMEOUT_SECS instead of
  falling back to default when exceeded
- WASM wrapper: use RAII guard for nesting depth to prevent
  leak on panic

Addresses Gemini review feedback on PR nearai#408.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zmanian and others added 4 commits February 28, 2026 21:32
Add ToolExecutor for standalone tool dispatch used by both the
orchestrator HTTP RPC endpoint and the WASM tool_invoke host function.
Includes Python SDK for container scripts, WASM test fixture, and
comprehensive E2E test coverage across all PTC paths.

Implementation:
- ToolExecutor with timeout, nesting depth limit, safety sanitization
- Orchestrator POST /worker/{job_id}/tools/call endpoint with SSE events
- WASM tool_invoke host function with alias resolution
- Python SDK (stdlib-only) with call_tool + convenience wrappers

Tests (16 new):
- 6 orchestrator HTTP RPC tests (auth, echo, not-found, timeout, SSE, no-executor)
- 3 executor integration tests (sanitization, invalid params, sequential)
- 4 Python SDK tests (env vars, request format, HTTP error, wrappers)
- 3 WASM E2E tests (echo via alias, alias not granted, no capability)

Refs nearai#407

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r SDK

- Fix WASM tool_invoke production wiring: change tool_executor to a
  shared Arc<std::sync::RwLock> slot with lazy resolution so WASM tools
  registered during build_all() can access the executor set afterward
- Add nesting_depth field to ToolCallRequest and propagate it through
  the orchestrator's tool_call_handler into JobContext
- Add ptc_script built-in tool: runs Python scripts with ironclaw_tools
  SDK pre-imported, env-scrubbed subprocess, structured output support
- Copy Python SDK into Docker worker image at dist-packages path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
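The first bullet of this commit describes a shared `Arc<std::sync::RwLock>` slot resolved lazily, so tools registered during `build_all()` can see an executor installed afterward. A minimal sketch of that pattern, assuming the slot holds an `Option<Arc<...>>` (the `Executor`, `WasmWrapper`, and `ExecutorSlot` names are hypothetical stand-ins):

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real executor.
struct Executor {
    name: &'static str,
}

/// Shared slot created before tools are built and filled in afterward.
type ExecutorSlot = Arc<RwLock<Option<Arc<Executor>>>>;

/// Hypothetical stand-in for a WASM tool wrapper: it keeps the slot, not the executor.
struct WasmWrapper {
    executor: ExecutorSlot,
}

impl WasmWrapper {
    /// Resolve the executor at call time, not at registration time.
    fn invoke(&self) -> Result<&'static str, &'static str> {
        let slot = self.executor.read().map_err(|_| "executor lock poisoned")?;
        match &*slot {
            Some(exec) => Ok(exec.name),
            None => Err("tool executor not configured"),
        }
    }
}
```

Because every wrapper clones the same `Arc`, writing `Some(...)` into the slot once after registration makes the executor visible to all previously built tools; an `std::sync` lock (rather than an async one) fits here since the WASM host call is synchronous.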
- Python SDK: remove 60s minimum timeout enforcement, respect
  requested timeout with 5s network buffer
- Rust executor: cap timeout at MAX_TIMEOUT_SECS instead of
  falling back to default when exceeded
- WASM wrapper: use RAII guard for nesting depth to prevent
  leak on panic

Addresses Gemini review feedback on PR nearai#408.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zmanian zmanian force-pushed the feat/ptc-programmatic-tool-calling branch from 8302787 to 4c6b9ce Compare March 1, 2026 05:37
- Fix Python SDK client timeout to use actual timeout_secs + 5s buffer
  instead of enforcing 60s minimum
- Cap tool execution timeout at MAX_TIMEOUT_SECS instead of falling
  back to default when exceeded
- Use RAII guard for tool_nesting_depth to ensure decrement on panic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zmanian (Collaborator, Author) commented Mar 6, 2026

Reopening from upstream branch

@zmanian zmanian closed this Mar 6, 2026
zmanian added a commit that referenced this pull request Mar 7, 2026
- Python SDK: remove 60s minimum timeout enforcement, respect
  requested timeout with 5s network buffer
- Rust executor: cap timeout at MAX_TIMEOUT_SECS instead of
  falling back to default when exceeded
- WASM wrapper: use RAII guard for nesting depth to prevent
  leak on panic

Addresses Gemini review feedback on PR #408.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zmanian added further commits with the identical message that referenced this pull request on Mar 12 (three times), Mar 13, Mar 16, Mar 17, and Mar 21, 2026.

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: orchestrator Container orchestrator scope: sandbox Docker sandbox scope: tool/builtin Built-in tools scope: tool/wasm WASM tool sandbox scope: tool Tool infrastructure scope: worker Container worker size: XL 500+ changed lines
