enforces SSRF checks on the actual browser request chain #272
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds per-navigation request interception and hop-by-hop SSRF validation to the screenshot tool: seeds a per-navigation DNS allow-cache, validates each redirect target (scheme and private-host checks), enforces redirect hop limits and loop detection, adjusts timeouts, and cleans up interception state. Includes unit tests for new validators.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant ScreenshotTool
    participant Browser
    participant Interceptor as Network Interceptor
    participant Validator as URL Validator
    participant DNS as DNS Checker
    User->>ScreenshotTool: Request screenshot(target URL)
    ScreenshotTool->>DNS: Preflight DNS check for entry host (seed allow-cache)
    DNS-->>ScreenshotTool: Allowed / Blocked
    ScreenshotTool->>Browser: Create empty page
    ScreenshotTool->>Interceptor: Enable request interception
    ScreenshotTool->>Browser: Navigate to target URL
    loop for each paused request / redirect hop
        Browser->>Interceptor: EventRequestPaused
        Interceptor->>Validator: validate_browser_request_url(paused request)
        Validator->>DNS: Resolve/check host (per-navigation allow-cache)
        DNS-->>Validator: Safe / Blocked
        alt safe
            Validator-->>Interceptor: Resume/ContinueRequest
            Interceptor->>Browser: ContinueRequest/ResumeRequest
        else blocked or non-http(s)
            Validator-->>Interceptor: FailRequest
            Interceptor->>Browser: FailRequest (AccessDenied)
        end
    end
    Browser->>ScreenshotTool: Navigation finished / page ready
    ScreenshotTool->>Browser: Capture screenshot
    ScreenshotTool->>Interceptor: Disable interception (cleanup)
    ScreenshotTool-->>User: Return screenshot
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
🧹 Nitpick comments (1)
src/tools/screenshot.rs (1)
232-234: Consider pre-populating the DNS cache with the initial entry URL check.

The entry URL undergoes DNS resolution here (line 234), but the `dns_allow_cache` is created later (line 327) inside the timeout block. This means the first browser request will re-resolve DNS for the entry URL. This is minor since it's defense-in-depth and the early check provides fast-fail before browser launch. However, if you want to avoid the redundant lookup:
♻️ Optional: Pre-populate cache to skip redundant DNS lookup
```diff
 // Initial DNS SSRF check for the entry URL.
 resolve_and_check_host(&parsed).await?;
+
+// Pre-populate cache with the entry host to avoid redundant lookup.
+let mut dns_allow_cache: HashMap<String, ()> = HashMap::new();
+if let Some(host) = parsed.host_str() {
+    dns_allow_cache.insert(host.to_ascii_lowercase(), ());
+}

 // ---- Parse optional parameters ----
```

Then remove line 327 and pass the pre-populated cache into the timeout block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/screenshot.rs` around lines 232 - 234, The initial DNS resolution via resolve_and_check_host(&parsed).await? is performed before dns_allow_cache is created, causing a redundant DNS lookup later; to fix, perform the DNS cache population immediately after resolve_and_check_host by creating and inserting the entry URL's resolved result into dns_allow_cache (create a dns_allow_cache instance there), then remove the dns_allow_cache instantiation inside the timeout block and instead pass the pre-populated dns_allow_cache into the timeout-handling code so subsequent browser requests reuse the cached resolution.
🧹 Nitpick comments (2)
src/tools/screenshot.rs (2)
89-110: LGTM!

The function correctly parses, validates, and caches DNS lookups. The `Vacant` entry pattern is idiomatic for cache-miss handling. Minor style note: `HashSet<String>` would be slightly more expressive than `HashMap<String, ()>`, but this is purely cosmetic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/screenshot.rs` around lines 89 - 110, The parameter dns_allow_cache in validate_browser_request_url is using HashMap<String, ()> where a HashSet<String> is more expressive; change the function signature to accept a &mut HashSet<String>, replace the Entry::Vacant check with a contains/insert flow (e.g., if !dns_allow_set.contains(&host) { resolve_and_check_host(&parsed).await?; dns_allow_set.insert(host); }), and update all callers to pass a HashSet<String> instead of HashMap<String, ()> so caching behavior remains identical but the type better conveys intent.
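The suggested `HashSet` swap can be sketched as a small self-contained function. Note that `resolve_and_check_host` below is a synchronous stand-in for the tool's async DNS/SSRF check, and the blocked host is purely illustrative:

```rust
use std::collections::HashSet;

// Illustrative stand-in for the tool's async resolve_and_check_host,
// which resolves the host and rejects private/link-local addresses.
fn resolve_and_check_host(host: &str) -> Result<(), String> {
    if host == "169.254.169.254" {
        Err(format!("blocked host: {host}"))
    } else {
        Ok(())
    }
}

// Suggested contains/insert flow with HashSet<String> instead of
// HashMap<String, ()>: only hosts that passed the check are cached.
fn check_host_cached(host: &str, allow: &mut HashSet<String>) -> Result<(), String> {
    let key = host.to_ascii_lowercase();
    if !allow.contains(&key) {
        resolve_and_check_host(&key)?;
        allow.insert(key);
    }
    Ok(())
}

fn main() {
    let mut allow = HashSet::new();
    assert!(check_host_cached("Example.com", &mut allow).is_ok());
    assert!(allow.contains("example.com")); // cached after first check
    assert!(check_host_cached("169.254.169.254", &mut allow).is_err());
    println!("ok");
}
```

Caching behavior stays identical; the set type simply makes the "membership only" intent explicit.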
334-349: Consider adding a redirect hop limit to prevent excessive redirect chains.The PR objectives specifically mention: "Enforce redirect limits and detect redirect loops or excessive hops with clear errors." While each redirect hop is correctly validated for SSRF, there's no counter to prevent an attacker from creating a long chain of valid redirects (e.g., 100+ hops to public hosts) that could cause resource exhaustion or timing issues.
Consider tracking hop count and failing after a reasonable limit (e.g., 20 hops, consistent with typical HTTP client defaults):
♻️ Proposed enhancement
```diff
+const MAX_REDIRECT_HOPS: usize = 20;
+let mut redirect_count: usize = 0;
+
 let capture_result = loop {
     tokio::select! {
         captured = &mut screenshot_future => {
             break captured;
         }
         maybe_event = paused_events.next() => {
             let Some(event) = maybe_event else {
                 break Err(ZeptoError::Tool(
                     "Browser request interception stream ended unexpectedly".to_string()
                 ));
             };
+            redirect_count += 1;
+            if redirect_count > MAX_REDIRECT_HOPS {
+                let _ = page.execute(FailRequestParams::new(
+                    event.request_id.clone(),
+                    ErrorReason::AccessDenied,
+                )).await;
+                break Err(ZeptoError::SecurityViolation(
+                    format!("Exceeded maximum redirect hops ({})", MAX_REDIRECT_HOPS)
+                ));
+            }
             handle_paused_browser_request(&page, event.as_ref(), &mut dns_allow_cache).await?;
         }
     }
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/screenshot.rs` around lines 334 - 349, The redirect handling loop around capture_result/screenshot_future should enforce a hop counter to prevent excessive redirect chains: add an integer hop_count (start 0) and increment it each time you handle a redirect via handle_paused_browser_request (or wherever a redirect response is followed), and if hop_count exceeds a defined constant (e.g., MAX_REDIRECT_HOPS = 20) return Err(ZeptoError::Tool("Too many redirects"…)) from the loop; update the logic that breaks with captured to also check/propagate this error, and ensure dns_allow_cache, paused_events, and screenshot_future flow are unchanged except for the added count check so you detect redirect loops/excessive hops while preserving SSRF validation.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/tools/screenshot.rs (1)
353-385: ⚠️ Potential issue | 🟠 Major: Ensure interception/browser cleanup runs on all error paths.

Current `?`/`??` flow can return before `FetchDisable`, `drop(browser)`, and `handler_handle.abort()`, which risks leaving background handler/browser work running longer than intended.

🔧 Suggested control-flow fix
```diff
-    let capture_result = loop {
+    let capture_result = loop {
         tokio::select! {
@@
-                handle_paused_browser_request(&page, event.as_ref(), &mut dns_allow_cache).await?;
+                if let Err(err) = handle_paused_browser_request(&page, event.as_ref(), &mut dns_allow_cache).await {
+                    break Err(err);
+                }
             }
         }
     };
@@
-        .await
-        .map_err(|_| {
+        .await;
+
+    // Always cleanup browser resources, even on timeout/error.
+    drop(browser);
+    handler_handle.abort();
+
+    let screenshot_result = screenshot_result.map_err(|_| {
         ZeptoError::Tool(format!(
             "Screenshot timed out after {}s for '{}'",
             timeout_secs, url_str
         ))
-    })??;
-
-    // Clean up browser resources.
-    drop(browser);
-    handler_handle.abort();
+    })??;
```
Verify each finding against the current code and only fix it if needed. In `@src/tools/screenshot.rs` around lines 353 - 385, The current early-return paths can bypass the interception cleanup (page.execute(FetchDisableParams::default())), browser drop and handler_handle.abort(); factor an async cleanup function (e.g., async fn cleanup_interception(page: Page, browser: Browser, handler_handle: JoinHandle<...>)) that runs page.execute(FetchDisableParams::default()).await, aborts handler_handle, and drops browser, then call that cleanup in every return path: after the capture loop before returning capture_result, and in the error mapping/early-return branches (the map_err closure and any place using ?/??) so cleanup_interception always runs regardless of success or failure. Use the existing identifiers (screenshot_future, paused_events, handle_paused_browser_request, page.execute, drop(browser), handler_handle.abort) to locate where to insert the calls.
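The control-flow fix above boils down to one shape: capture the fallible result first, run cleanup unconditionally, then propagate the error. A minimal sketch with hypothetical `Browser`/`capture` stand-ins (the real code also disables Fetch interception and aborts the handler task):

```rust
// Hypothetical stand-in for the browser session held during a capture.
struct Browser;

impl Browser {
    fn shut_down(&mut self) {
        // In the real tool: disable interception, abort the handler task,
        // and drop the browser handle.
    }
}

// Stand-in for the screenshot future; `fail` simulates a timeout.
fn capture(fail: bool) -> Result<&'static str, String> {
    if fail { Err("timed out".into()) } else { Ok("png-bytes") }
}

fn screenshot(fail: bool) -> Result<&'static str, String> {
    let mut browser = Browser;
    let result = capture(fail);
    // Cleanup happens before `?`, so it runs on success and failure alike.
    browser.shut_down();
    result
}

fn main() {
    assert_eq!(screenshot(false), Ok("png-bytes"));
    assert!(screenshot(true).is_err()); // cleanup still ran on this path
    println!("ok");
}
```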
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/tools/screenshot.rs`:
- Around line 111-116: The dns_allow_cache lookup around parsed.host_str()
bypasses resolve_and_check_host(&parsed) for subsequent requests to the same
hostname, weakening rebinding protection; remove or bypass the dns_allow_cache
check in the paused-browser request path so resolve_and_check_host(&parsed) is
always awaited for each navigation (i.e., do not short-circuit when
dns_allow_cache.entry(host) is Vacant/Occupied), and ensure the same change is
applied to the other identical block referenced (the block around lines 255-260)
so every request revalidates DNS via resolve_and_check_host rather than relying
on a hostname-only cache.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/tools/screenshot.rs (1)
345-401: Add direct tests for hop-limit and loop-detection branches.

The security-critical branches at line 375 (`MAX_SCREENSHOT_REDIRECT_HOPS`) and line 389 (duplicate URL loop detection) are not directly exercised by the added tests. Extracting redirect-state updates into a small pure helper would make these regression cases unit-testable.

As per coding guidelines, "Prefer small, composable functions over large blocks in Rust code."
Also applies to: 695-727
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/screenshot.rs` around lines 345 - 401, The loop in screenshot capture updates redirect state via document_redirect_hops and seen_document_redirect_urls and short-circuits with MAX_SCREENSHOT_REDIRECT_HOPS or duplicate URL checks; extract that logic into a small pure helper function (e.g., fn advance_document_redirect_state(hops: &mut usize, seen: &mut HashSet<String>, new_url: &str, max_hops: usize) -> Result<(), RedirectError>) that performs the hop increment, max-hop check and duplicate detection and returns an error variant instead of directly calling page.execute/FailRequestParams/ZeptoError; call this helper from the capture loop (where document_redirect_hops, seen_document_redirect_urls, event_ref.request.url are used) and create direct unit tests for the helper to cover the MAX_SCREENSHOT_REDIRECT_HOPS and duplicate-URL loop branches; apply the same extraction for the analogous redirect logic in the other block (the second occurrence around the later capture code) so both sites can be unit-tested.
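The proposed pure helper might look like the sketch below; `RedirectError` and the exact signature are assumptions, not the tool's actual types, and the caller would map the error variants onto `FailRequestParams`/`ZeptoError`:

```rust
use std::collections::HashSet;

// Hypothetical error type standing in for the tool's ZeptoError variants.
#[derive(Debug, PartialEq)]
enum RedirectError {
    TooManyHops(usize),
    Loop(String),
}

// Pure redirect-state update: bump the hop count, enforce the cap,
// and detect a revisited document URL.
fn advance_document_redirect_state(
    hops: &mut usize,
    seen: &mut HashSet<String>,
    new_url: &str,
    max_hops: usize,
) -> Result<(), RedirectError> {
    *hops += 1;
    if *hops > max_hops {
        return Err(RedirectError::TooManyHops(max_hops));
    }
    // insert returns false when the URL was already seen: a loop.
    if !seen.insert(new_url.to_string()) {
        return Err(RedirectError::Loop(new_url.to_string()));
    }
    Ok(())
}

fn main() {
    let mut hops = 0usize;
    let mut seen = HashSet::new();
    seen.insert("https://a.example/".to_string()); // entry URL seeded
    // First hop to a new URL is fine.
    assert!(advance_document_redirect_state(&mut hops, &mut seen, "https://b.example/", 20).is_ok());
    // Redirecting back to a seen URL is detected as a loop.
    assert_eq!(
        advance_document_redirect_state(&mut hops, &mut seen, "https://a.example/", 20),
        Err(RedirectError::Loop("https://a.example/".to_string()))
    );
    println!("ok");
}
```

Because the helper is pure, both branches become one-line unit tests with no browser involved.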
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/tools/screenshot.rs`:
- Around line 345-347: The seed and comparisons for redirect loop detection use
raw url_str vs browser-emitted URLs which can differ in representation; update
the logic around document_redirect_hops and seen_document_redirect_urls to
canonicalize/normalize URLs before inserting or comparing (e.g., parse url_str
and browser-emitted URLs with url::Url or a shared normalize_url helper and use
the normalized string) so the initial insert of url_str and subsequent checks
use the same canonical form and reliably detect loops.
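A shared normalization step could be as simple as the sketch below. A real implementation would parse with `url::Url`; this pure-std version only canonicalizes scheme/host case and an empty path, just to illustrate why the seeded `url_str` and the browser-emitted URLs must both pass through the same function before comparison:

```rust
// Minimal, illustrative normalizer: lowercase the scheme and host,
// and give a bare authority an explicit "/" path. Ports, userinfo,
// and percent-encoding are deliberately out of scope for this sketch.
fn normalize_url(raw: &str) -> String {
    let (scheme, rest) = match raw.split_once("://") {
        Some((s, r)) => (s.to_ascii_lowercase(), r),
        None => return raw.to_string(), // not a hierarchical URL; leave as-is
    };
    let (host, path) = match rest.split_once('/') {
        Some((h, p)) => (h.to_ascii_lowercase(), format!("/{p}")),
        None => (rest.to_ascii_lowercase(), "/".to_string()),
    };
    format!("{scheme}://{host}{path}")
}

fn main() {
    // These three spellings of the entry URL now collapse to one key,
    // so loop detection compares like with like.
    assert_eq!(normalize_url("HTTPS://Example.com"), "https://example.com/");
    assert_eq!(normalize_url("https://example.com/"), "https://example.com/");
    assert_eq!(normalize_url("https://EXAMPLE.com/Path"), "https://example.com/Path");
    println!("ok");
}
```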
Summary

- Enforces SSRF checks on the actual browser request chain for `web_screenshot` via CDP Fetch interception.
- Blocks unsafe requests with `Fetch.failRequest` and keeps the initial entry-URL SSRF checks as defense in depth.

Related Issue
Closes #271
Scope

- `upstream/main`
- `cargo test` passes
- `cargo clippy -- -D warnings` passes
- `cargo fmt --check` passes

Test Plan

- `cargo fmt -- --check`
- `cargo clippy -- -D warnings`
- `cargo test`
- `cargo clippy --features screenshot -- -D warnings`
- `cargo test --features screenshot --lib screenshot::tests::` (24 passed)

Summary by CodeRabbit
Bug Fixes
Tests