
feat: merge http/web_fetch tools, add tool output stash for large responses #578

Merged: ilblackdragon merged 7 commits into main from feat/merge-http-tools-stash on Mar 6, 2026

Conversation

@ilblackdragon (Member)

Summary

  • Merge web_fetch into http tool with smart approval: plain GETs (no auth headers, no body) need no approval and follow redirects with SSRF re-validation per hop; everything else requires approval as before
  • Add tool_output_stash on JobContext to preserve full tool outputs before safety-layer truncation. The json tool gains source_tool_call_id to reference stashed outputs, enabling reliable parsing of large API responses (>100KB)
  • Fix json tool query/stringify operations to handle pre-parsed JSON data from the stash (not just string inputs)
  • Add descriptive User-Agent header using CARGO_PKG_VERSION so public APIs don't reject requests
  • Truncation now keeps partial data + hint about source_tool_call_id instead of replacing everything with an error message
  • System prompt reinforces actual tool_calls over narrating intent
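
The smart-approval rule in the first bullet can be sketched as a small predicate. This is an illustrative sketch only; the struct and function names below are hypothetical, not the actual http tool's API:

```rust
// Hypothetical request shape for illustration; the real tool's fields may differ.
#[derive(Default)]
struct HttpRequest {
    method: String,
    headers: Vec<(String, String)>,
    body: Option<String>,
}

/// A plain GET (no headers, no body) is pre-approved; anything else
/// still goes through the normal approval flow.
fn requires_approval(req: &HttpRequest) -> bool {
    !(req.method.eq_ignore_ascii_case("GET") && req.headers.is_empty() && req.body.is_none())
}

fn main() {
    let plain_get = HttpRequest { method: "GET".into(), ..Default::default() };
    assert!(!requires_approval(&plain_get));

    let post = HttpRequest { method: "POST".into(), body: Some("{}".into()), ..Default::default() };
    assert!(requires_approval(&post));

    let get_with_auth = HttpRequest {
        method: "GET".into(),
        headers: vec![("Authorization".into(), "Bearer x".into())],
        ..Default::default()
    };
    assert!(requires_approval(&get_with_auth));
}
```

The three asserts mirror the three approval tests in the test plan (plain GET, POST, GET with headers).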

Test plan

  • recorded_baseball_stats — E2E trace test: large ESPN API response → truncated in context → json tool queries full output via source_tool_call_id
  • recorded_weather_sf — E2E trace test: simple GET to weather API
  • test_query_with_object_data_from_stash — regression test for json query with pre-parsed data
  • test_stringify_with_object_data_from_stash — regression test for json stringify with pre-parsed data
  • test_plain_get_returns_never — approval requirement for simple GETs
  • test_post_returns_unless_auto_approved — approval requirement for POST
  • test_get_with_headers_returns_unless_auto_approved — approval requirement for GET with headers
  • All existing json, http, and recorded trace tests pass
  • Zero clippy warnings

🤖 Generated with Claude Code

…ponses

Merge `web_fetch` into `http` tool with smart approval: plain GETs (no
headers, no body) run without approval and follow redirects with SSRF
re-validation per hop; all other requests require approval as before.

Add `tool_output_stash` on JobContext so full tool outputs are preserved
before safety-layer truncation. The `json` tool gains a
`source_tool_call_id` parameter to reference stashed outputs, enabling
reliable parsing of large API responses that exceed the 100KB context
limit.

Other improvements:
- Descriptive User-Agent header using CARGO_PKG_VERSION
- Truncation now keeps partial data + hint about source_tool_call_id
- System prompt reinforces tool_calls over narration
- json tool query/stringify handle pre-parsed (non-string) data

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 19:30
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool Tool infrastructure scope: tool/builtin Built-in tools scope: safety Prompt injection defense scope: llm LLM integration size: L 200-499 changed lines risk: high Safety, secrets, auth, or critical infrastructure contributor: core 20+ merged PRs labels Mar 5, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's ability to interact with external HTTP services and process large data payloads. By unifying HTTP functionality and introducing a mechanism to stash full tool outputs, the system can now handle complex API interactions more robustly and efficiently. The improvements to the JSON tool and output truncation ensure that critical information is not lost, leading to more reliable and accurate agent responses, particularly when dealing with extensive data from web services.

Highlights

  • Unified HTTP Tool with Smart Approval: The web_fetch tool has been merged into the http tool, which now features smart approval logic. Simple GET requests without authentication headers or body will automatically follow redirects and require no approval, while all other HTTP requests will continue to require explicit approval.
  • Tool Output Stash for Large Responses: A tool_output_stash has been added to JobContext to store full, untruncated tool outputs. This allows subsequent tools, like the json tool, to access complete responses even if they were truncated in the LLM context window due to size limits.
  • Enhanced JSON Tool: The json tool now supports a source_tool_call_id parameter, enabling it to reference and process full outputs from the tool_output_stash. It also correctly handles pre-parsed JSON data, not just string inputs, for query and stringify operations.
  • Improved Tool Output Truncation: Truncation of large tool outputs now preserves partial data at the beginning of the output and includes a hint instructing how to retrieve the full content using the json tool with source_tool_call_id, rather than replacing the entire output with an error message.
  • Descriptive User-Agent Header: HTTP requests now include a descriptive User-Agent header, using the application's package version, to improve compatibility and avoid rejection by public APIs.
  • System Prompt Reinforcement: The system prompt has been updated to reinforce the importance of using actual tool_calls over narrating intent, guiding the LLM towards more direct tool usage.
Changelog
  • src/agent/dispatcher.rs
    • Added logic to stash the full output of successful tool calls into the job_ctx.tool_output_stash.
  • src/context/state.rs
    • Introduced tool_output_stash (an Arc<RwLock<HashMap<String, String>>>) to JobContext for storing complete tool outputs.
    • Initialized tool_output_stash in the JobContext::new constructor.
  • src/db/libsql/jobs.rs
    • Initialized tool_output_stash when creating a JobContext from the database.
  • src/history/store.rs
    • Initialized tool_output_stash when restoring a JobContext from history.
  • src/llm/reasoning.rs
    • Updated the system prompt to emphasize using tool_calls directly and including tool calls in responses when fetching information.
  • src/safety/mod.rs
    • Modified sanitize_tool_output to truncate large outputs by keeping the beginning and adding a notice about using source_tool_call_id to retrieve the full content, instead of replacing it with a generic truncation message.
  • src/tools/builtin/http.rs
    • Merged web_fetch functionality into the http tool, making it a unified HTTP client.
    • Added a descriptive User-Agent header using CARGO_PKG_VERSION to HTTP requests.
    • Implemented smart approval logic: simple GET requests (no headers, no body) now require no approval and follow redirects with SSRF re-validation per hop.
    • Defined MAX_REDIRECTS constant for simple GET requests.
    • Updated the tool's description to reflect the new approval and redirect behavior.
    • Refactored HTTP request execution to conditionally follow redirects for simple GETs.
    • Added new unit tests for requires_approval covering plain GETs, POSTs, GETs with headers, and GETs with empty headers/no credentials.
  • src/tools/builtin/json.rs
    • Updated the json tool's description to mention source_tool_call_id for handling truncated responses.
    • Modified the parameters_schema to include source_tool_call_id as an optional parameter and made data optional when source_tool_call_id is present.
    • Updated execute method to retrieve data from tool_output_stash if source_tool_call_id is provided, parsing it as JSON or a string.
    • Adjusted stringify and query operations to handle both string and pre-parsed JSON input data.
    • Added new Tokio tests (test_query_with_object_data_from_stash, test_stringify_with_object_data_from_stash) to validate functionality with stashed data.
  • src/tools/builtin/mod.rs
    • Removed the web_fetch module and its public export, as its functionality was integrated into http.
  • src/tools/registry.rs
    • Removed WebFetchTool from the list of built-in tools registered.
    • Removed web_fetch from the PROTECTED_TOOL_NAMES list.
  • tests/e2e_recorded_trace.rs
    • Added recorded_weather_sf E2E test for simple GET requests.
    • Added recorded_baseball_stats E2E test to verify tool_output_stash and source_tool_call_id functionality with large HTTP responses.
  • tests/fixtures/llm_traces/recorded/baseball_stats.json
    • Added a new recorded trace file for testing baseball stats retrieval, demonstrating the use of http and json tools with source_tool_call_id.
  • tests/fixtures/llm_traces/recorded/weather_sf.json
    • Added a new recorded trace file for testing weather queries, demonstrating a simple http GET request.
  • tests/tool_schema_validation.rs
    • Removed web_fetch from the list of expected core tools for schema validation.
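
The stash shape noted in src/context/state.rs above (an Arc<RwLock<HashMap<String, String>>>) is a keyed write-then-read-later map. A minimal sketch of that flow, assuming hypothetical helper names; the real code uses an async (tokio) RwLock, shown here with the std synchronous one for a self-contained example:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Sketch of the stash type from the changelog; real code uses tokio::sync::RwLock.
type ToolOutputStash = Arc<RwLock<HashMap<String, String>>>;

/// Store a tool call's full output before any truncation (hypothetical helper).
fn stash_output(stash: &ToolOutputStash, call_id: &str, full_output: &str) {
    stash.write().unwrap().insert(call_id.to_string(), full_output.to_string());
}

/// Later, a tool like `json` looks the full output up by call ID (hypothetical helper).
fn lookup(stash: &ToolOutputStash, call_id: &str) -> Option<String> {
    stash.read().unwrap().get(call_id).cloned()
}

fn main() {
    let stash: ToolOutputStash = Arc::new(RwLock::new(HashMap::new()));
    stash_output(&stash, "call_1", r#"{"temp": 61}"#);
    assert_eq!(lookup(&stash, "call_1").as_deref(), Some(r#"{"temp": 61}"#));
    assert!(lookup(&stash, "missing").is_none());
}
```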
Activity
  • The author has implemented the changes and provided a comprehensive test plan, including new E2E trace tests (recorded_baseball_stats, recorded_weather_sf) and regression tests for the json tool and HTTP approval logic.
  • All existing json, http, and recorded trace tests are reported to pass.
  • Zero clippy warnings were reported, indicating adherence to Rust linting standards.

@gemini-code-assist bot left a comment (Contributor)

Code Review

This pull request significantly improves tool handling by merging web_fetch and http tools, and introducing a tool_output_stash for large responses. The new http tool features intelligent approval handling and secure redirect following with per-hop SSRF validation, while the json tool is enhanced to process large outputs using the stash. However, a critical security flaw exists where truncated tool outputs bypass leak detection and sanitization, risking sensitive information leakage or prompt injection attacks, which violates the rule that sanitization should be applied to data paths sent to external services like an LLM. Additionally, there are suggestions to improve the json tool's schema definition and code clarity.

Comment thread src/tools/builtin/json.rs Outdated
Comment on lines +58 to +74
let data = if let Some(ref_id) = params.get("source_tool_call_id").and_then(|v| v.as_str()) {
    let stash = ctx.tool_output_stash.read().await;
    let full_output = stash.get(ref_id).ok_or_else(|| {
        ToolError::InvalidParameters(format!(
            "no tool output found for call ID '{}'. Available IDs: {:?}",
            ref_id,
            stash.keys().collect::<Vec<_>>()
        ))
    })?;
    // Parse the stashed output as JSON, or wrap as string
    serde_json::from_str::<serde_json::Value>(full_output)
        .unwrap_or_else(|_| serde_json::Value::String(full_output.clone()))
} else {
    require_param(&params, "data")?.clone()
};
let data = &data;

medium

The variable data is assigned an owned serde_json::Value and then immediately re-shadowed as a reference (let data = &data;). This pattern can be slightly confusing to read. Renaming the owned value can improve readability by making the ownership transfer explicit and avoiding shadowing.

Suggested change
let data_value = if let Some(ref_id) = params.get("source_tool_call_id").and_then(|v| v.as_str()) {
    let stash = ctx.tool_output_stash.read().await;
    let full_output = stash.get(ref_id).ok_or_else(|| {
        ToolError::InvalidParameters(format!(
            "no tool output found for call ID '{}'. Available IDs: {:?}",
            ref_id,
            stash.keys().collect::<Vec<_>>()
        ))
    })?;
    // Parse the stashed output as JSON, or wrap as string
    serde_json::from_str::<serde_json::Value>(full_output)
        .unwrap_or_else(|_| serde_json::Value::String(full_output.clone()))
} else {
    require_param(&params, "data")?.clone()
};
let data = &data_value;

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added size: XL 500+ changed lines and removed size: L 200-499 changed lines labels Mar 5, 2026
ilblackdragon and others added 2 commits March 5, 2026 11:46
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR review: rename owned `data` to `data_value` before
re-binding as `let data = &data_value` to make ownership explicit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ilblackdragon
Member Author

Addressed review feedback:

  • Style nit (data shadowing): Fixed in c485de5 — renamed owned binding to data_value to avoid shadowing confusion.

  • Security concern (truncated outputs bypassing leak detection): This is a false positive. The stash stores full output for internal tool-to-tool data flow only — the LLM never sees stashed data directly. The LLM receives the sanitized/truncated version (which passes through leak detection). When the json tool reads from the stash and produces a result, that result goes through normal sanitization before reaching the LLM.
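
The truncation behavior described in the summary (keep partial data plus a hint instead of replacing everything) can be sketched roughly as follows. The function name, limit, and hint wording are illustrative only, not the actual sanitize_tool_output implementation:

```rust
// Illustrative sketch, not the real safety-layer code: keep a prefix of the
// output and append a hint pointing at the stash, instead of dropping it all.
fn truncate_with_hint(output: &str, limit: usize, call_id: &str) -> String {
    if output.len() <= limit {
        return output.to_string();
    }
    // Back up to a char boundary so we never split a multi-byte character.
    let mut cut = limit;
    while !output.is_char_boundary(cut) {
        cut -= 1;
    }
    format!(
        "{}\n[truncated: pass source_tool_call_id=\"{}\" to the json tool for the full output]",
        &output[..cut],
        call_id
    )
}

fn main() {
    let big = "x".repeat(50);
    let out = truncate_with_hint(&big, 10, "call_42");
    assert!(out.starts_with("xxxxxxxxxx"));
    assert!(out.contains("source_tool_call_id"));
    // Small outputs pass through unchanged.
    assert_eq!(truncate_with_hint("short", 100, "call_42"), "short");
}
```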

ilblackdragon and others added 3 commits March 5, 2026 12:18
The weather_sf and baseball_stats tests hit live external APIs (wttr.in,
ESPN) which are unreliable in CI. Mark them #[ignore] so they don't
block the pipeline. Run locally with `--ignored` to include them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… live APIs

Wire ReplayingHttpInterceptor into TestRig when the trace fixture
contains http_exchanges. This replays recorded responses instead of
making live network calls, making tests deterministic and CI-stable.

Add captured HTTP responses to weather_sf.json (wttr.in) and
baseball_stats.json (ESPN API) fixtures.

Revert #[ignore] on both tests — they now run offline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When flatten_tool_messages converts tool calls to text like
`[Called tool `http` with arguments: {...}]` for NEAR AI compatibility,
the LLM sometimes echoes this format back in its text responses instead
of using proper tool_calls. Add recovery for this bracket format in
recover_tool_calls_from_content and strip it in clean_response so
users don't see raw tool call syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI left a comment (Contributor)

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ilblackdragon ilblackdragon requested a review from Copilot March 6, 2026 00:45
@ilblackdragon ilblackdragon merged commit 470de5b into main Mar 6, 2026
16 checks passed
@ilblackdragon ilblackdragon deleted the feat/merge-http-tools-stash branch March 6, 2026 00:49
Copilot AI left a comment (Contributor)

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.



Comment thread src/context/state.rs
/// Tool outputs may be truncated before reaching the LLM context window,
/// but subsequent tools (e.g., `json`) may need the full output. This
/// stash stores the complete, unsanitized output so tools can reference
/// previous results by ID via `$tool_call_id` parameter syntax.
Copilot AI Mar 6, 2026

The tool_output_stash HashMap grows unboundedly — every tool call's full output is stored for the entire duration of the agent session with no eviction or size cap. In long-running agentic sessions that make many HTTP requests returning large responses (the max is 5 MB per response), this could accumulate significant memory. Consider capping stash size (e.g., only store the most recent N outputs or entries whose serialized output exceeds the truncation threshold), or documenting this as a known trade-off.

Suggested change
/// previous results by ID via `$tool_call_id` parameter syntax.
///
/// NOTE: This map is intentionally unbounded and can grow for the lifetime
/// of a job. In long-running sessions that make many tool calls returning
/// large responses, this may consume significant memory. Callers and job
/// runtimes should ensure that the stash is pruned or cleared periodically
/// in such scenarios, or avoid using unbounded jobs with very large tool
/// outputs if memory usage is a concern.

Comment thread src/llm/reasoning.rs
Comment on lines +1236 to +1245
let mut result = String::with_capacity(text.len());
let mut remaining = text;
while let Some(start) = remaining.find("[Called tool `") {
    result.push_str(&remaining[..start]);
    let after = &remaining[start..];
    // Find the closing "]" for this bracket expression
    if let Some(end) = after.find("]\n").map(|i| i + 2).or_else(|| {
        // If it's at the end of the string, just find "]"
        after.rfind(']').map(|i| i + 1)
    }) {
Copilot AI Mar 6, 2026

The strip_bracket_tool_calls function falls back to after.rfind(']') when no "]\n" is found. after is sliced from the start of the current [Called tool pattern to the end of the entire remaining string, so rfind finds the very last ] in the text rather than the closing bracket of the current expression. When multiple [Called tool ...] patterns appear on the same line (e.g., without a trailing newline separating them), the first match will greedily consume everything through the end of the last expression, causing intermediate text to be dropped or subsequent patterns to be missed.

Consider a case like: "[Called tool `http` with arguments: {}] some text [Called tool `json` with arguments: {}]"

  • after.find("]\n") — no match (no trailing \n)
  • after.rfind(']') — returns the index of the last ] at the very end
  • Result: both tool calls and the text between them are stripped as a single match

A safer approach would be to track brace depth to find the JSON-closing } and then the immediately following ].

Suggested change
/// Find the end of a single `[Called tool ...]` expression starting at `after`.
///
/// We locate the first `{`, track brace depth until the matching `}`, then
/// return the index just after the following `]` (and optional trailing '\n').
fn find_tool_call_end(after: &str) -> Option<usize> {
    // Locate the start of the JSON arguments block.
    let brace_start = after.find('{')?;
    let mut depth: i32 = 0;
    let mut json_end: Option<usize> = None;
    // Walk from the first '{' to find the matching '}'.
    for (rel_idx, ch) in after[brace_start..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // Position just after this closing '}'.
                    json_end = Some(brace_start + rel_idx + ch.len_utf8());
                    break;
                }
            }
            _ => {}
        }
    }
    let json_end = json_end?;
    // From the end of the JSON block, find the next closing ']'.
    let rel_bracket = after[json_end..].find(']')?;
    let mut end = json_end + rel_bracket + 1; // position just after ']'
    // Optionally consume a single trailing newline.
    if after.as_bytes().get(end) == Some(&b'\n') {
        end += 1;
    }
    Some(end)
}

let mut result = String::with_capacity(text.len());
let mut remaining = text;
while let Some(start) = remaining.find("[Called tool `") {
    result.push_str(&remaining[..start]);
    let after = &remaining[start..];
    // Find the closing "]" for this bracket expression in a brace-aware way.
    if let Some(end) = find_tool_call_end(after) {

Comment thread src/context/state.rs
/// Tool outputs may be truncated before reaching the LLM context window,
/// but subsequent tools (e.g., `json`) may need the full output. This
/// stash stores the complete, unsanitized output so tools can reference
/// previous results by ID via `$tool_call_id` parameter syntax.
Copilot AI Mar 6, 2026

The doc-comment for tool_output_stash says "previous results by ID via $tool_call_id parameter syntax", but the actual parameter name used to reference stashed outputs is source_tool_call_id (not $tool_call_id). The comment is misleading and refers to a syntax that doesn't exist.

Suggested change
/// previous results by ID via the `source_tool_call_id` parameter.

Comment thread src/llm/reasoning.rs
Comment on lines +1157 to +1168
if let Some(bracket_end) = args_start.rfind(']') {
    let args_str = &args_start[..bracket_end];
    let arguments = serde_json::from_str::<serde_json::Value>(args_str)
        .unwrap_or(serde_json::Value::Object(Default::default()));
    calls.push(ToolCall {
        id: format!("recovered_{}", calls.len()),
        name: name.to_string(),
        arguments,
    });
    remaining = &args_start[bracket_end + 1..];
    continue;
}
Copilot AI Mar 6, 2026

In recover_tool_calls_from_content, args_start.rfind(']') searches for the last ] in args_start, which extends to the end of remaining (the entire rest of the content). When multiple bracket-format tool calls appear in the same content string, the rfind on the first match will find the last ] from a subsequent match, causing incorrect JSON parsing for the first tool call's arguments.

For example: [Called tool `http` with arguments: {"a":1}] and [Called tool `json` with arguments: {"b":2}]. The first rfind(']') returns the position of the last ], so the first call's argument string becomes {"a":1}] and [Called tool `json` with arguments: {"b":2}, which is invalid JSON and silently parses to empty args. remaining then advances past that final ], skipping the second tool call entirely.
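
The greedy rfind behavior Copilot describes is easy to confirm in isolation (the string below is a made-up example, not from the codebase):

```rust
fn main() {
    // Two bracket-format tool calls on one line, no trailing newline.
    let s = r#"[Called tool `http` with arguments: {"a":1}] and [Called tool `json` with arguments: {"b":2}]"#;

    // rfind scans from the end of the slice, so it returns the final ']'
    // of the whole string, not the one closing the first expression.
    let last = s.rfind(']').unwrap();
    assert_eq!(last, s.len() - 1);

    // The first expression's real closing ']' is much earlier.
    let first_close = s.find(']').unwrap();
    assert!(first_close < last);
}
```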

logicminds pushed a commit to logicminds/ironclaw that referenced this pull request Mar 6, 2026
…ponses (nearai#578)

* feat: merge http/web_fetch tools, add tool output stash for large responses

Merge `web_fetch` into `http` tool with smart approval: plain GETs (no
headers, no body) run without approval and follow redirects with SSRF
re-validation per hop; all other requests require approval as before.

Add `tool_output_stash` on JobContext so full tool outputs are preserved
before safety-layer truncation. The `json` tool gains a
`source_tool_call_id` parameter to reference stashed outputs, enabling
reliable parsing of large API responses that exceed the 100KB context
limit.

Other improvements:
- Descriptive User-Agent header using CARGO_PKG_VERSION
- Truncation now keeps partial data + hint about source_tool_call_id
- System prompt reinforces tool_calls over narration
- json tool query/stringify handle pre-parsed (non-string) data

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: delete dead web_fetch.rs (merged into http tool)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix rustfmt formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: rename shadowed data binding for clarity in json tool

Address PR review: rename owned `data` to `data_value` before
re-binding as `let data = &data_value` to make ownership explicit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): mark network-dependent trace tests as #[ignore]

The weather_sf and baseball_stats tests hit live external APIs (wttr.in,
ESPN) which are unreliable in CI. Mark them #[ignore] so they don't
block the pipeline. Run locally with `--ignored` to include them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replay recorded HTTP exchanges in trace tests instead of hitting live APIs

Wire ReplayingHttpInterceptor into TestRig when the trace fixture
contains http_exchanges. This replays recorded responses instead of
making live network calls, making tests deterministic and CI-stable.

Add captured HTTP responses to weather_sf.json (wttr.in) and
baseball_stats.json (ESPN API) fixtures.

Revert #[ignore] on both tests — they now run offline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: recover inline bracket-format tool calls from LLM text responses

When flatten_tool_messages converts tool calls to text like
`[Called tool `http` with arguments: {...}]` for NEAR AI compatibility,
the LLM sometimes echoes this format back in its text responses instead
of using proper tool_calls. Add recovery for this bracket format in
recover_tool_calls_from_content and strip it in clean_response so
users don't see raw tool call syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot mentioned this pull request Mar 6, 2026
@github-actions github-actions bot mentioned this pull request Mar 6, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
Labels

contributor: core 20+ merged PRs risk: high Safety, secrets, auth, or critical infrastructure scope: agent Agent core (agent loop, router, scheduler) scope: llm LLM integration scope: safety Prompt injection defense scope: tool/builtin Built-in tools scope: tool Tool infrastructure size: XL 500+ changed lines
