feat: merge http/web_fetch tools, add tool output stash for large responses #578

ilblackdragon merged 7 commits into main
Conversation
…ponses

Merge `web_fetch` into `http` tool with smart approval: plain GETs (no headers, no body) run without approval and follow redirects with SSRF re-validation per hop; all other requests require approval as before.

Add `tool_output_stash` on JobContext so full tool outputs are preserved before safety-layer truncation. The `json` tool gains a `source_tool_call_id` parameter to reference stashed outputs, enabling reliable parsing of large API responses that exceed the 100KB context limit.

Other improvements:

- Descriptive User-Agent header using CARGO_PKG_VERSION
- Truncation now keeps partial data + hint about source_tool_call_id
- System prompt reinforces tool_calls over narration
- json tool query/stringify handle pre-parsed (non-string) data

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
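The "smart approval" split described above can be sketched as a small classifier. This is an illustrative sketch, not the PR's actual code; `Approval`, `approval_for`, and the parameter names are assumptions:

```rust
// Illustrative sketch of the approval split described above; the PR's real
// types and function names may differ.
#[derive(Debug, PartialEq)]
enum Approval {
    /// Plain GET: run without asking the user.
    Never,
    /// Anything else: ask, unless the job is auto-approved.
    UnlessAutoApproved,
}

/// A request skips approval only when it is a GET with no custom headers
/// and no body; everything else keeps the old approval flow.
fn approval_for(method: &str, has_headers: bool, has_body: bool) -> Approval {
    if method.eq_ignore_ascii_case("GET") && !has_headers && !has_body {
        Approval::Never
    } else {
        Approval::UnlessAutoApproved
    }
}
```

This mirrors the three approval cases exercised in the test plan (`test_plain_get_returns_never`, `test_post_returns_unless_auto_approved`, `test_get_with_headers_returns_unless_auto_approved`).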
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's ability to interact with external HTTP services and process large data payloads. By unifying HTTP functionality and introducing a mechanism to stash full tool outputs, the system can now handle complex API interactions more robustly and efficiently. The improvements to the JSON tool and output truncation ensure that critical information is not lost, leading to more reliable and accurate agent responses, particularly when dealing with extensive data from web services.

Highlights
Code Review
This pull request significantly improves tool handling by merging web_fetch and http tools, and introducing a tool_output_stash for large responses. The new http tool features intelligent approval handling and secure redirect following with per-hop SSRF validation, while the json tool is enhanced to process large outputs using the stash. However, a critical security flaw exists where truncated tool outputs bypass leak detection and sanitization, risking sensitive information leakage or prompt injection attacks, which violates the rule that sanitization should be applied to data paths sent to external services like an LLM. Additionally, there are suggestions to improve the json tool's schema definition and code clarity.
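The per-hop SSRF re-validation called out above boils down to re-checking every redirect target before following it. A hedged sketch of such a check using only the standard library; the PR's actual rules and names are assumptions here:

```rust
use std::net::IpAddr;

// Hedged sketch of a per-hop SSRF check; the PR's actual validation logic
// is not shown in this page, so the exact rules here are assumptions.
// Each redirect target must resolve to a public address before the next
// hop is followed.
fn is_public(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            !(v4.is_loopback()
                || v4.is_private()
                || v4.is_link_local() // also covers 169.254.169.254 metadata
                || v4.is_broadcast()
                || v4.is_unspecified())
        }
        IpAddr::V6(v6) => !(v6.is_loopback() || v6.is_unspecified()),
    }
}
```

Re-validating on every hop matters because an attacker-controlled server can answer an initially public URL with a 302 redirect into loopback, private ranges, or a cloud metadata address.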
```rust
let data = if let Some(ref_id) = params.get("source_tool_call_id").and_then(|v| v.as_str())
{
    let stash = ctx.tool_output_stash.read().await;
    let full_output = stash.get(ref_id).ok_or_else(|| {
        ToolError::InvalidParameters(format!(
            "no tool output found for call ID '{}'. Available IDs: {:?}",
            ref_id,
            stash.keys().collect::<Vec<_>>()
        ))
    })?;
    // Parse the stashed output as JSON, or wrap as string
    serde_json::from_str::<serde_json::Value>(full_output)
        .unwrap_or_else(|_| serde_json::Value::String(full_output.clone()))
} else {
    require_param(&params, "data")?.clone()
};
let data = &data;
```
The variable data is assigned an owned serde_json::Value and then immediately re-shadowed as a reference (let data = &data;). This pattern can be slightly confusing to read. Renaming the owned value can improve readability by making the ownership transfer explicit and avoiding shadowing.
Suggested change:

```rust
let data_value = if let Some(ref_id) = params.get("source_tool_call_id").and_then(|v| v.as_str())
{
    let stash = ctx.tool_output_stash.read().await;
    let full_output = stash.get(ref_id).ok_or_else(|| {
        ToolError::InvalidParameters(format!(
            "no tool output found for call ID '{}'. Available IDs: {:?}",
            ref_id,
            stash.keys().collect::<Vec<_>>()
        ))
    })?;
    // Parse the stashed output as JSON, or wrap as string
    serde_json::from_str::<serde_json::Value>(full_output)
        .unwrap_or_else(|_| serde_json::Value::String(full_output.clone()))
} else {
    require_param(&params, "data")?.clone()
};
let data = &data_value;
```
Address PR review: rename owned `data` to `data_value` before re-binding as `let data = &data_value` to make ownership explicit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressed review feedback:
The weather_sf and baseball_stats tests hit live external APIs (wttr.in, ESPN) which are unreliable in CI. Mark them #[ignore] so they don't block the pipeline. Run locally with `--ignored` to include them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… live APIs

Wire ReplayingHttpInterceptor into TestRig when the trace fixture contains http_exchanges. This replays recorded responses instead of making live network calls, making tests deterministic and CI-stable.

Add captured HTTP responses to weather_sf.json (wttr.in) and baseball_stats.json (ESPN API) fixtures. Revert #[ignore] on both tests — they now run offline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
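The replay idea above, reduced to its essence: the interceptor looks up recorded responses by request URL instead of touching the network. A minimal sketch; `ReplayingHttpInterceptor`'s real interface in the PR is not shown on this page, so this shape is an assumption:

```rust
use std::collections::HashMap;

// Hedged sketch of a replaying interceptor; the PR's actual type has a
// different, richer interface. Maps request URLs to recorded bodies
// loaded from the fixture's http_exchanges.
struct ReplayingHttpInterceptor {
    exchanges: HashMap<String, String>, // url -> recorded response body
}

impl ReplayingHttpInterceptor {
    fn from_fixture(pairs: &[(&str, &str)]) -> Self {
        Self {
            exchanges: pairs
                .iter()
                .map(|(url, body)| (url.to_string(), body.to_string()))
                .collect(),
        }
    }

    /// Returns the recorded body for a URL instead of hitting the network;
    /// `None` means the fixture has no recording for this request.
    fn intercept(&self, url: &str) -> Option<&str> {
        self.exchanges.get(url).map(String::as_str)
    }
}
```

Because every response comes from the fixture, the weather_sf and baseball_stats traces run byte-identically in CI regardless of what wttr.in or ESPN return today.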
When flatten_tool_messages converts tool calls to text like
`[Called tool `http` with arguments: {...}]` for NEAR AI compatibility,
the LLM sometimes echoes this format back in its text responses instead
of using proper tool_calls. Add recovery for this bracket format in
recover_tool_calls_from_content and strip it in clean_response so
users don't see raw tool call syntax.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
```rust
/// Tool outputs may be truncated before reaching the LLM context window,
/// but subsequent tools (e.g., `json`) may need the full output. This
/// stash stores the complete, unsanitized output so tools can reference
/// previous results by ID via `$tool_call_id` parameter syntax.
```
The tool_output_stash HashMap grows unboundedly — every tool call's full output is stored for the entire duration of the agent session with no eviction or size cap. In long-running agentic sessions that make many HTTP requests returning large responses (the max is 5 MB per response), this could accumulate significant memory. Consider capping stash size (e.g., only store the most recent N outputs or entries whose serialized output exceeds the truncation threshold), or documenting this as a known trade-off.
Suggested change:

```rust
/// previous results by ID via `$tool_call_id` parameter syntax.
///
/// NOTE: This map is intentionally unbounded and can grow for the lifetime
/// of a job. In long-running sessions that make many tool calls returning
/// large responses, this may consume significant memory. Callers and job
/// runtimes should ensure that the stash is pruned or cleared periodically
/// in such scenarios, or avoid using unbounded jobs with very large tool
/// outputs if memory usage is a concern.
```
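The alternative the reviewer mentions, capping the stash at the most recent N outputs, can be sketched with oldest-first eviction. `BoundedStash` and its shape are illustrative, not the PR's actual types:

```rust
use std::collections::{HashMap, VecDeque};

// Minimal oldest-first eviction sketch for the cap suggested above;
// `BoundedStash` is illustrative, not the PR's actual JobContext field.
struct BoundedStash {
    max_entries: usize,
    order: VecDeque<String>, // insertion order, oldest at the front
    map: HashMap<String, String>,
}

impl BoundedStash {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, order: VecDeque::new(), map: HashMap::new() }
    }

    fn insert(&mut self, id: String, output: String) {
        // Only track insertion order for ids we haven't seen before.
        if self.map.insert(id.clone(), output).is_none() {
            self.order.push_back(id);
        }
        // Evict oldest entries once the cap is exceeded.
        while self.order.len() > self.max_entries {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
    }

    fn get(&self, id: &str) -> Option<&str> {
        self.map.get(id).map(String::as_str)
    }
}
```

With a cap of N, memory is bounded at N times the 5 MB per-response maximum, at the cost that a tool can no longer reference an output evicted many calls ago.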
```rust
let mut result = String::with_capacity(text.len());
let mut remaining = text;
while let Some(start) = remaining.find("[Called tool `") {
    result.push_str(&remaining[..start]);
    let after = &remaining[start..];
    // Find the closing "]" for this bracket expression
    if let Some(end) = after.find("]\n").map(|i| i + 2).or_else(|| {
        // If it's at the end of the string, just find "]"
        after.rfind(']').map(|i| i + 1)
    }) {
```
The strip_bracket_tool_calls function falls back to after.rfind(']') when no "]\n" is found. after is sliced from the start of the current [Called tool pattern to the end of the entire remaining string, so rfind finds the very last ] in the text rather than the closing bracket of the current expression. When multiple [Called tool ...] patterns appear on the same line (e.g., without a trailing newline separating them), the first match will greedily consume everything through the end of the last expression, causing intermediate text to be dropped or subsequent patterns to be missed.
Consider a case like: "[Called tool `http` with arguments: {}] some text [Called tool `json` with arguments: {}]"

- `after.find("]\n")` — no match (no trailing `\n`)
- `after.rfind(']')` — returns the index of the last `]` at the very end
- Result: both tool calls and the text between them are stripped as a single match

A safer approach would be to track brace depth to find the JSON-closing `}` and then the immediately following `]`.
Suggested change:

```rust
/// Find the end of a single `[Called tool ...]` expression starting at `after`.
///
/// We locate the first `{`, track brace depth until the matching `}`, then
/// return the index just after the following `]` (and optional trailing '\n').
fn find_tool_call_end(after: &str) -> Option<usize> {
    // Locate the start of the JSON arguments block.
    let brace_start = after.find('{')?;
    let mut depth: i32 = 0;
    let mut json_end: Option<usize> = None;
    // Walk from the first '{' to find the matching '}'.
    for (rel_idx, ch) in after[brace_start..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // Position just after this closing '}'.
                    json_end = Some(brace_start + rel_idx + ch.len_utf8());
                    break;
                }
            }
            _ => {}
        }
    }
    let json_end = json_end?;
    // From the end of the JSON block, find the next closing ']'.
    let rel_bracket = after[json_end..].find(']')?;
    let mut end = json_end + rel_bracket + 1; // position just after ']'
    // Optionally consume a single trailing newline.
    if after.as_bytes().get(end) == Some(&b'\n') {
        end += 1;
    }
    Some(end)
}

let mut result = String::with_capacity(text.len());
let mut remaining = text;
while let Some(start) = remaining.find("[Called tool `") {
    result.push_str(&remaining[..start]);
    let after = &remaining[start..];
    // Find the closing "]" for this bracket expression in a brace-aware way.
    if let Some(end) = find_tool_call_end(after) {
```
```rust
/// Tool outputs may be truncated before reaching the LLM context window,
/// but subsequent tools (e.g., `json`) may need the full output. This
/// stash stores the complete, unsanitized output so tools can reference
/// previous results by ID via `$tool_call_id` parameter syntax.
```
The doc-comment for tool_output_stash says "previous results by ID via $tool_call_id parameter syntax", but the actual parameter name used to reference stashed outputs is source_tool_call_id (not $tool_call_id). The comment is misleading and refers to a syntax that doesn't exist.
Suggested change:

```diff
-/// previous results by ID via `$tool_call_id` parameter syntax.
+/// previous results by ID via the `source_tool_call_id` parameter.
```
```rust
if let Some(bracket_end) = args_start.rfind(']') {
    let args_str = &args_start[..bracket_end];
    let arguments = serde_json::from_str::<serde_json::Value>(args_str)
        .unwrap_or(serde_json::Value::Object(Default::default()));
    calls.push(ToolCall {
        id: format!("recovered_{}", calls.len()),
        name: name.to_string(),
        arguments,
    });
    remaining = &args_start[bracket_end + 1..];
    continue;
}
```
In recover_tool_calls_from_content, args_start.rfind(']') searches for the last ] in args_start, which extends to the end of remaining (the entire rest of the content). When multiple bracket-format tool calls appear in the same content string, the rfind on the first match will find the last ] from a subsequent match, causing incorrect JSON parsing for the first tool call's arguments.
For example: "[Called tool `http` with arguments: {"a":1}] and [Called tool `json` with arguments: {"b":2}]" — the first `rfind(']')` returns the position of the last `]`, so the first call's argument slice spans from `{"a":1}` all the way through `{"b":2}`, including the intervening text. That slice is invalid JSON and would silently become empty args. `remaining` would then advance past that last `]`, skipping the second tool call entirely.
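The failure mode above can be reproduced in a few lines. This is a standalone illustrative sketch, not the PR's code; the function names are made up for the demonstration:

```rust
// Reproduces the greedy-rfind failure mode described above, next to a
// brace-aware alternative. Standalone sketch with illustrative names.

/// Buggy strategy: slice from the first '{' to the LAST ']' in the
/// remaining text, which can span multiple bracket expressions.
fn greedy_first_args(content: &str) -> Option<&str> {
    let brace = content.find('{')?;
    let tail = &content[brace..];
    let end = tail.rfind(']')?; // last ']' anywhere, not this expression's
    Some(&tail[..end])
}

/// Brace-aware strategy: stop at the '}' matching the first '{'.
/// (Production code would also need to skip braces inside JSON strings.)
fn brace_aware_first_args(content: &str) -> Option<&str> {
    let brace = content.find('{')?;
    let mut depth = 0usize;
    for (i, ch) in content[brace..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&content[brace..brace + i + 1]);
                }
            }
            _ => {}
        }
    }
    None
}
```

On the two-call example, the greedy version swallows the first `]`, the intervening text, and the second call's arguments in one slice, while the brace-aware version returns exactly the first call's JSON object.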
…ponses (nearai#578)

* feat: merge http/web_fetch tools, add tool output stash for large responses

  Merge `web_fetch` into `http` tool with smart approval: plain GETs (no headers, no body) run without approval and follow redirects with SSRF re-validation per hop; all other requests require approval as before.

  Add `tool_output_stash` on JobContext so full tool outputs are preserved before safety-layer truncation. The `json` tool gains a `source_tool_call_id` parameter to reference stashed outputs, enabling reliable parsing of large API responses that exceed the 100KB context limit.

  Other improvements:
  - Descriptive User-Agent header using CARGO_PKG_VERSION
  - Truncation now keeps partial data + hint about source_tool_call_id
  - System prompt reinforces tool_calls over narration
  - json tool query/stringify handle pre-parsed (non-string) data

  [skip-regression-check]

* chore: delete dead web_fetch.rs (merged into http tool)

* style: fix rustfmt formatting

* style: rename shadowed data binding for clarity in json tool

  Address PR review: rename owned `data` to `data_value` before re-binding as `let data = &data_value` to make ownership explicit.

* fix(ci): mark network-dependent trace tests as #[ignore]

  The weather_sf and baseball_stats tests hit live external APIs (wttr.in, ESPN) which are unreliable in CI. Mark them #[ignore] so they don't block the pipeline. Run locally with `--ignored` to include them.

* fix: replay recorded HTTP exchanges in trace tests instead of hitting live APIs

  Wire ReplayingHttpInterceptor into TestRig when the trace fixture contains http_exchanges. This replays recorded responses instead of making live network calls, making tests deterministic and CI-stable.

  Add captured HTTP responses to weather_sf.json (wttr.in) and baseball_stats.json (ESPN API) fixtures. Revert #[ignore] on both tests — they now run offline.

* fix: recover inline bracket-format tool calls from LLM text responses

  When flatten_tool_messages converts tool calls to text like `[Called tool `http` with arguments: {...}]` for NEAR AI compatibility, the LLM sometimes echoes this format back in its text responses instead of using proper tool_calls. Add recovery for this bracket format in recover_tool_calls_from_content and strip it in clean_response so users don't see raw tool call syntax.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Merge `web_fetch` into `http` tool with smart approval: plain GETs (no auth headers, no body) need no approval and follow redirects with SSRF re-validation per hop; everything else requires approval as before
- Add `tool_output_stash` on `JobContext` to preserve full tool outputs before safety-layer truncation. The `json` tool gains `source_tool_call_id` to reference stashed outputs, enabling reliable parsing of large API responses (>100KB)
- `json` tool `query`/`stringify` operations handle pre-parsed JSON data from the stash (not just string inputs)
- Descriptive `User-Agent` header using `CARGO_PKG_VERSION` so public APIs don't reject requests
- Truncation keeps partial data plus a hint about `source_tool_call_id` instead of replacing everything with an error message
- System prompt reinforces `tool_calls` over narrating intent

Test plan

- `recorded_baseball_stats` — E2E trace test: large ESPN API response → truncated in context → `json` tool queries full output via `source_tool_call_id`
- `recorded_weather_sf` — E2E trace test: simple GET to weather API
- `test_query_with_object_data_from_stash` — regression test for json query with pre-parsed data
- `test_stringify_with_object_data_from_stash` — regression test for json stringify with pre-parsed data
- `test_plain_get_returns_never` — approval requirement for simple GETs
- `test_post_returns_unless_auto_approved` — approval requirement for POSTs
- `test_get_with_headers_returns_unless_auto_approved` — approval requirement for GET with headers

🤖 Generated with Claude Code