feat(tool): add browser automation via Chrome DevTools Protocol (CDP)#2410
feat(tool): add browser automation via Chrome DevTools Protocol (CDP)#2410Yourdaylight wants to merge 1 commit intosipeed:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an optional (build-tagged) CDP-based browser automation tool to PicoClaw, enabling agents to navigate pages, extract structured DOM state, interact with elements, and capture screenshots, while integrating with existing tool registration and config.
Changes:
- Introduces
BrowserTool(CDP build tag) with actions like navigate/state/click/type/screenshot/evaluate plus stealth patches and browsing history. - Implements a lightweight CDP WebSocket client and Chrome executable discovery utilities.
- Wires the tool into configuration (
BrowserToolConfig,IsToolEnabled("browser")) and agent tool registration; adds unit + integration tests.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/tools/browser.go | Core BrowserTool implementation (actions, SSRF guard, history, screenshot/media integration). |
| pkg/tools/browser_cdp.go | CDP WebSocket client for sending commands and handling events. |
| pkg/tools/browser_cdp_util.go | Chrome/Chromium path discovery + launch arg helper. |
| pkg/tools/browser_stealth.go | Stealth JS patches injected on new documents. |
| pkg/tools/browser_stub.go | Non-CDP build stub returning “not compiled” errors. |
| pkg/tools/browser_test.go | Unit tests for URL validation, chrome detection, stealth JS, args helpers. |
| pkg/tools/browser_integration_test.go | CDP integration tests behind integration tag. |
| pkg/config/config.go | Adds BrowserToolConfig and enables tools.browser gating via IsToolEnabled. |
| pkg/agent/instance.go | Registers browser tool when enabled (warns if unavailable). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/tools/browser.go
Outdated
| // Store in media store if available; otherwise clean up temp file | ||
| var mediaRef string | ||
| if t.mediaStore != nil { | ||
| ref, err := t.mediaStore.Store(tmpFile, media.MediaMeta{ | ||
| Filename: "screenshot.png", | ||
| ContentType: "image/png", | ||
| Source: "tool:browser.screenshot", | ||
| }, scope) | ||
| if err != nil { | ||
| logger.WarnCF("tool", "Failed to store screenshot in media store", | ||
| map[string]any{"error": err.Error()}) | ||
| os.Remove(tmpFile) // clean up on store failure | ||
| } else { | ||
| mediaRef = ref | ||
| } | ||
| } else { | ||
| // No media store — remove temp file since nobody will clean it up | ||
| defer os.Remove(tmpFile) | ||
| } | ||
|
|
||
| // Return result with media delivery to user + vision support for LLM | ||
| result := &ToolResult{ | ||
| ForLLM: "Screenshot captured (PNG). LLM can analyze the visual content to understand current page state.", | ||
| ForUser: "Screenshot captured and sent", | ||
| ResponseHandled: true, // Don't wait for LLM response, send immediately to user | ||
| } | ||
|
|
||
| if mediaRef != "" { | ||
| result.Media = []string{mediaRef} | ||
| } |
There was a problem hiding this comment.
screenshot returns ResponseHandled=true and tells the user the screenshot was sent even when mediaStore is nil or when mediaStore.Store fails (in both cases mediaRef stays empty). This makes the tool report success without delivering any media, and the deferred temp-file cleanup also removes the only copy. Consider returning an error when no MediaStore is configured (similar to send_file), or returning an artifact path/ref that can actually be delivered; also treat Store failures as tool errors rather than silent warnings.
| func (t *BrowserTool) executeScroll(ctx context.Context, args map[string]any) *ToolResult { | ||
| direction, _ := args["direction"].(string) | ||
| if direction == "" { | ||
| // Also accept from "text" parameter for convenience | ||
| direction, _ = args["text"].(string) | ||
| } | ||
|
|
||
| amount := 500 // pixels | ||
| if direction == "up" { | ||
| amount = -500 | ||
| } | ||
|
|
||
| js := fmt.Sprintf(`window.scrollBy(0, %d); JSON.stringify({scrollY: window.scrollY});`, amount) | ||
| raw, err := t.cdp.Evaluate(ctx, js) | ||
| if err != nil { | ||
| return ErrorResult(fmt.Sprintf("scroll failed: %v", err)) | ||
| } | ||
|
|
||
| var resultStr string | ||
| json.Unmarshal(raw, &resultStr) | ||
|
|
||
| return SilentResult(fmt.Sprintf("Scrolled %s. Run 'state' to see updated elements.", direction)) |
There was a problem hiding this comment.
scroll accepts any direction value (and even an empty one) because argument validation currently doesn't enforce the schema enum. For invalid/empty directions the code scrolls down by default but returns a misleading message (e.g. "Scrolled "). Add explicit validation (only "up"/"down") and return an error for anything else.
| var resultStr string | ||
| json.Unmarshal(raw, &resultStr) | ||
| var result struct { | ||
| OK bool `json:"ok"` | ||
| Error string `json:"error"` | ||
| Selected string `json:"selected"` | ||
| } | ||
| json.Unmarshal([]byte(resultStr), &result) | ||
|
|
||
| if result.Error != "" { | ||
| return ErrorResult(result.Error) | ||
| } | ||
| return SilentResult(fmt.Sprintf("Selected %q in [%d].", result.Selected, index)) | ||
| } |
There was a problem hiding this comment.
Several actions ignore JSON unmarshal failures from CDP Evaluate results (e.g. select ignores both unmarshals). If parsing fails, the code can incorrectly report success with empty fields. Please handle json.Unmarshal errors and surface a tool error when the CDP/JS result isn't in the expected shape.
| var resultStr string | ||
| json.Unmarshal(raw, &resultStr) | ||
| var result struct { | ||
| Text string `json:"text"` | ||
| Error string `json:"error"` | ||
| } | ||
| json.Unmarshal([]byte(resultStr), &result) | ||
|
|
||
| if result.Error != "" { | ||
| return ErrorResult(result.Error) | ||
| } | ||
| return SilentResult(fmt.Sprintf("[%d] text: %q", index, result.Text)) | ||
| } |
There was a problem hiding this comment.
get_text ignores errors from both json.Unmarshal calls. If CDP returns a non-string/non-JSON value, this can silently produce an empty resultStr/Text and mislead the agent. Handle unmarshal errors and return an ErrorResult when parsing fails.
| // validateBrowserURL checks that a URL is safe to navigate to. | ||
| // It blocks private networks, loopback, metadata endpoints, and non-HTTP schemes. | ||
| func validateBrowserURL(urlStr string) error { | ||
| if urlStr == "" { | ||
| return fmt.Errorf("empty URL") | ||
| } | ||
|
|
||
| parsed, err := url.Parse(urlStr) | ||
| if err != nil { | ||
| return fmt.Errorf("invalid URL: %w", err) | ||
| } | ||
|
|
||
| if parsed.Scheme != "http" && parsed.Scheme != "https" { | ||
| return fmt.Errorf("only http/https URLs are allowed (got %s://)", parsed.Scheme) | ||
| } | ||
|
|
||
| if parsed.Host == "" { | ||
| return fmt.Errorf("missing host in URL") | ||
| } | ||
|
|
||
| hostname := parsed.Hostname() | ||
|
|
||
| // Block obvious private/local hosts (string check) | ||
| if hostname == "localhost" || hostname == "0.0.0.0" || | ||
| hostname == "metadata.google.internal" { | ||
| return fmt.Errorf("navigation to %s is not allowed", hostname) | ||
| } | ||
|
|
||
| // Parse as IP to catch all representations (hex, octal, IPv4-mapped IPv6, etc.) | ||
| ip := net.ParseIP(hostname) | ||
| if ip != nil { | ||
| if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || | ||
| ip.IsLinkLocalMulticast() || ip.IsUnspecified() { | ||
| return fmt.Errorf("navigation to private/local IP %s is not allowed", hostname) | ||
| } | ||
| // Block cloud metadata IPs | ||
| if ip.Equal(net.ParseIP("169.254.169.254")) { | ||
| return fmt.Errorf("navigation to cloud metadata endpoint is not allowed") | ||
| } | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
The SSRF check only blocks literal IP hosts and a few hard-coded hostnames; it does not resolve DNS for normal hostnames. An attacker can supply a hostname that resolves to a private/loopback/metadata IP and bypass this check. To make the claimed SSRF protection real, resolve A/AAAA records (and ideally re-check redirects) and block if any resolved IP is private/loopback/link-local/unspecified/metadata.
| // Close closes the WebSocket connection. | ||
| func (c *CDPClient) Close() error { | ||
| if c.closed.Swap(true) { | ||
| return nil // already closed | ||
| } | ||
| close(c.done) | ||
|
|
||
| // Cancel all pending requests | ||
| c.mu.Lock() | ||
| for id, p := range c.pending { | ||
| p.ch <- cdpResponse{Err: fmt.Errorf("CDP connection closed")} | ||
| delete(c.pending, id) | ||
| } | ||
| c.mu.Unlock() | ||
|
|
||
| return c.conn.Close() | ||
| } |
There was a problem hiding this comment.
Close() iterates pending requests and does a blocking send into each p.ch. If a request already has a response buffered (channel size 1) and Close() races with readLoop, this send can block indefinitely. Use a non-blocking send (select { case ch<-...: default: }), close the channel, or store cancellation state in the pending struct to avoid deadlock during shutdown.
pkg/tools/browser_cdp.go
Outdated
| // removeHandler removes a specific handler from the event listeners. | ||
| // It uses a linear scan which is fine for the small number of handlers we use. | ||
| func (c *CDPClient) removeHandler(event string, target func(json.RawMessage)) { | ||
| c.eventMu.Lock() | ||
| defer c.eventMu.Unlock() | ||
| handlers := c.events[event] | ||
| for i := range handlers { | ||
| // Direct pointer comparison for function values allocated in the same scope | ||
| if &handlers[i] == nil { | ||
| continue | ||
| } | ||
| // Use fmt.Sprintf as a last resort for identity check. | ||
| // This is acceptable since we register very few handlers. | ||
| if fmt.Sprintf("%p", handlers[i]) == fmt.Sprintf("%p", target) { | ||
| c.events[event] = append(handlers[:i], handlers[i+1:]...) | ||
| return | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
removeHandler's identity check is brittle: &handlers[i] == nil is unreachable, and comparing fmt.Sprintf("%p", fn) for func values relies on formatting behavior rather than a stable identifier. If removal fails, handlers will leak and keep firing. Consider changing On to return a handler ID/token (or store uintptr from reflect.ValueOf(fn).Pointer() alongside the handler) and remove by that, or maintain per-event slices of structs {id, fn}.
353b9f7 to
e66eccb
Compare
e66eccb to
ac969ec
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // NewBrowserTool creates a new BrowserTool. It verifies that Chrome is available | ||
| // and attempts to connect to the CDP endpoint. Returns an error if Chrome is | ||
| // not found or the connection fails. | ||
| func NewBrowserTool(cfg config.BrowserToolConfig) (*BrowserTool, error) { | ||
| chromePath, err := FindChromePath() | ||
| if err != nil { | ||
| return nil, fmt.Errorf( | ||
| "Chrome/Chromium required for browser tool but not found. "+ | ||
| "Install Chrome and restart, or set CHROME_PATH env var. "+ | ||
| "Error: %w", err) | ||
| } | ||
|
|
||
| logger.InfoCF("tool", "Chrome found for browser tool", | ||
| map[string]any{"path": chromePath}) | ||
|
|
||
| var stealthJS string | ||
| if cfg.Stealth { | ||
| stealthJS = generateStealthJS() | ||
| } | ||
|
|
||
| return &BrowserTool{ | ||
| cfg: cfg, | ||
| chromePath: chromePath, | ||
| stealthJS: stealthJS, | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
NewBrowserTool's doc comment says it "attempts to connect to the CDP endpoint" and will error if the connection fails, but the implementation only discovers Chrome and defers the CDP connection to connectIfNeeded() on first Execute(). Update the comment (or eagerly connect) so callers have correct expectations about when failures surface.
| js := fmt.Sprintf(`(function() { | ||
| var el = document.querySelector('[data-pcw-idx="%d"]'); | ||
| if (!el) return JSON.stringify({error: 'Element [%d] not found. Run state to refresh indices.'}); | ||
| el.scrollIntoView({block: 'center', behavior: 'instant'}); |
There was a problem hiding this comment.
scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. In Chrome this can throw a TypeError, breaking click/type/fill actions. Use a valid behavior value (e.g. 'auto') or omit the option.
| el.scrollIntoView({block: 'center', behavior: 'instant'}); | |
| el.scrollIntoView({block: 'center', behavior: 'auto'}); |
| if t.mediaStore != nil { | ||
| ref, err := t.mediaStore.Store(tmpFile, media.MediaMeta{ | ||
| Filename: "screenshot.png", | ||
| ContentType: "image/png", | ||
| Source: "tool:browser.screenshot", | ||
| }, scope) | ||
| if err != nil { | ||
| logger.WarnCF("tool", "Failed to store screenshot", | ||
| map[string]any{"error": err.Error()}) | ||
| // Fall through to inline path | ||
| } else { | ||
| os.Remove(tmpFile) | ||
| return &ToolResult{ | ||
| ForLLM: "Screenshot captured and sent to user (PNG).", | ||
| ForUser: "Screenshot captured", | ||
| Media: []string{ref}, | ||
| ResponseHandled: true, | ||
| } |
There was a problem hiding this comment.
This screenshot path deletes tmpFile after storing it in MediaStore, but MediaStore.Store explicitly does not copy/move the file (it just records a mapping). Removing the file here will make the returned media ref unresolvable/broken later. Keep the file on disk and rely on MediaStore cleanup/release policies instead.
| cfg config.BrowserToolConfig | ||
| cdp *CDPClient | ||
| cdpMu sync.Mutex // guards lazy cdp connection | ||
| chromePath string | ||
| stealthJS string | ||
| mediaStore media.MediaStore | ||
| history []pageVisit // browsing history, most recent last |
There was a problem hiding this comment.
BrowserTool stores mutable per-session state (history slice, cdp pointer) on the tool instance without synchronizing access. Tool execution can be concurrent, and the codebase explicitly relies on immutable request context to avoid shared mutable state on singleton tools. Add a mutex (or another concurrency-safe state holder) around history updates and close/reset operations (and any other mutable fields).
| cfg config.BrowserToolConfig | |
| cdp *CDPClient | |
| cdpMu sync.Mutex // guards lazy cdp connection | |
| chromePath string | |
| stealthJS string | |
| mediaStore media.MediaStore | |
| history []pageVisit // browsing history, most recent last | |
| cfg config.BrowserToolConfig | |
| stateMu sync.Mutex // guards mutable browser session state below, including cdp/history close/reset | |
| cdp *CDPClient // guarded by stateMu | |
| history []pageVisit // guarded by stateMu; browsing history, most recent last | |
| cdpMu sync.Mutex // guards lazy cdp connection orchestration | |
| chromePath string | |
| stealthJS string | |
| mediaStore media.MediaStore |
| js := `(function() { | ||
| var selectors = 'a, button, input, select, textarea, [role="button"], [role="link"], [role="tab"], [onclick], [tabindex]:not([tabindex="-1"])'; | ||
| var elements = document.querySelectorAll(selectors); | ||
| var result = []; | ||
| var idx = 0; | ||
| for (var i = 0; i < elements.length; i++) { | ||
| var el = elements[i]; | ||
| var rect = el.getBoundingClientRect(); | ||
| if (rect.width === 0 && rect.height === 0) continue; | ||
| if (getComputedStyle(el).visibility === 'hidden') continue; | ||
| if (getComputedStyle(el).display === 'none') continue; | ||
| var tag = el.tagName.toLowerCase(); | ||
| var info = { | ||
| i: idx, | ||
| tag: tag, | ||
| text: (el.textContent || '').trim().slice(0, 80).replace(/\s+/g, ' ') | ||
| }; | ||
| if (el.getAttribute('role')) info.role = el.getAttribute('role'); | ||
| if (el.getAttribute('type')) info.type = el.getAttribute('type'); | ||
| if (el.getAttribute('name')) info.name = el.getAttribute('name'); | ||
| if (el.getAttribute('placeholder')) info.placeholder = el.getAttribute('placeholder'); | ||
| if (el.value !== undefined && el.value !== '') info.value = String(el.value).slice(0, 80); | ||
| if (tag === 'a' && el.getAttribute('href')) info.href = el.getAttribute('href').slice(0, 120); | ||
| if (el.getAttribute('aria-label')) info.label = el.getAttribute('aria-label').slice(0, 80); | ||
| if (el.disabled) info.disabled = true; | ||
| // Store selector path for interaction | ||
| el.setAttribute('data-pcw-idx', String(idx)); | ||
| result.push(info); | ||
| idx++; | ||
| } |
There was a problem hiding this comment.
state() writes data-pcw-idx attributes but never clears previous indices from elements that are no longer considered interactive/visible. A later click/type can still find and act on a stale element by old index, even after telling the user to re-run state. Consider clearing existing [data-pcw-idx] attributes at the start of state(), and/or re-checking visibility in click/type/fill before interacting.
| // Send sends a CDP command and waits for the response. | ||
| func (c *CDPClient) Send(method string, params map[string]any) (json.RawMessage, error) { | ||
| return c.SendWithTimeout(method, params, cdpSendTimeout) | ||
| } | ||
|
|
||
| // SendWithTimeout sends a CDP command with a custom timeout. | ||
| func (c *CDPClient) SendWithTimeout(method string, params map[string]any, timeout time.Duration) (json.RawMessage, error) { | ||
| if c.closed.Load() { | ||
| return nil, fmt.Errorf("CDP connection closed") | ||
| } | ||
|
|
||
| id := atomic.AddInt64(&c.idCount, 1) | ||
|
|
||
| // Build request message | ||
| msg := map[string]any{ | ||
| "id": id, | ||
| "method": method, | ||
| } | ||
| if params != nil { | ||
| msg["params"] = params | ||
| } | ||
|
|
||
| data, err := json.Marshal(msg) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to marshal CDP message: %w", err) | ||
| } | ||
|
|
||
| // Register pending request | ||
| p := &cdpPending{ | ||
| ch: make(chan cdpResponse, 1), | ||
| timer: time.NewTimer(timeout), | ||
| } | ||
|
|
||
| c.mu.Lock() | ||
| c.pending[id] = p | ||
| c.mu.Unlock() | ||
|
|
||
| // Clean up on return | ||
| defer func() { | ||
| p.timer.Stop() | ||
| c.mu.Lock() | ||
| delete(c.pending, id) | ||
| c.mu.Unlock() | ||
| }() | ||
|
|
||
| // Send | ||
| c.mu.Lock() | ||
| err = c.conn.WriteMessage(websocket.TextMessage, data) | ||
| c.mu.Unlock() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to send CDP message: %w", err) | ||
| } | ||
|
|
||
| // Wait for response or timeout | ||
| select { | ||
| case resp := <-p.ch: | ||
| return resp.Result, resp.Err | ||
| case <-p.timer.C: | ||
| return nil, fmt.Errorf("CDP command %q timed out after %v", method, timeout) | ||
| case <-c.done: | ||
| return nil, fmt.Errorf("CDP connection closed while waiting for %q", method) | ||
| } | ||
| } |
There was a problem hiding this comment.
CDPClient.Send/SendWithTimeout ignores the passed-in context from tool actions; it uses a fixed timer and doesn't set per-call read/write deadlines. As a result, BrowserTool's ctx timeout/cancellation may not be respected while waiting on CDP commands. Consider threading ctx through Send (or adding SendCtx) and using ctx.Deadline/Done to bound both the WebSocket write and the response wait.
| type BrowserToolConfig struct { | ||
| ToolConfig `envPrefix:"PICOCLAW_TOOLS_BROWSER_"` | ||
| CDPEndpoint string `json:"cdp_endpoint" yaml:"cdp_endpoint,omitempty" env:"PICOCLAW_TOOLS_BROWSER_CDP_ENDPOINT"` | ||
| Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"` |
There was a problem hiding this comment.
BrowserToolConfig.Timeout is tagged as json/yaml "timeout_seconds" but its env var is PICOCLAW_TOOLS_BROWSER_TIMEOUT. This is inconsistent with other *_TIMEOUT_SECONDS env vars in this config and makes configuration harder to discover. Consider renaming the env tag to PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS (or aligning the field name/tag).
| Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"` | |
| Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS"` |
| func TestIntegration_Screenshot(t *testing.T) { | ||
| tool := setupBrowserTool(t) | ||
| ctx := context.Background() | ||
|
|
||
| tool.Execute(ctx, map[string]any{"action": "navigate", "url": "https://example.com"}) | ||
| result := tool.Execute(ctx, map[string]any{"action": "screenshot"}) | ||
| if result.IsError { | ||
| t.Fatalf("screenshot failed: %s", result.ForLLM) | ||
| } | ||
| t.Logf("Screenshot result: ForLLM=%s, ForUser=%s, Media=%v", | ||
| result.ForLLM, result.ForUser, result.Media) | ||
|
|
||
| // Verify temp file was created (even without MediaStore it should have existed briefly) | ||
| if !strings.Contains(result.ForLLM, "Screenshot captured") { | ||
| t.Error("unexpected screenshot result") | ||
| } | ||
|
|
||
| tool.Execute(ctx, map[string]any{"action": "close"}) | ||
| } |
There was a problem hiding this comment.
This integration test asserts result.ForLLM contains "Screenshot captured", but the screenshot implementation returns "Screenshot saved to ..." when no MediaStore is configured (which is the case in setupBrowserTool). This will fail under -tags integration unless a MediaStore is injected or the assertion is updated to match the actual behavior.
| // Focus the element | ||
| focusJS := fmt.Sprintf(`(function() { | ||
| var el = document.querySelector('[data-pcw-idx="%d"]'); | ||
| if (!el) return JSON.stringify({error: 'Element [%d] not found. Run state to refresh indices.'}); | ||
| el.scrollIntoView({block: 'center', behavior: 'instant'}); | ||
| el.focus(); | ||
| return JSON.stringify({ok: true}); | ||
| })();`, index, index) | ||
|
|
There was a problem hiding this comment.
scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. This can throw a TypeError and break typing/focus. Use a valid behavior value (e.g. 'auto') or omit the option.
| // Focus, select all existing text, then type new text | ||
| clearJS := fmt.Sprintf(`(function() { | ||
| var el = document.querySelector('[data-pcw-idx="%d"]'); | ||
| if (!el) return JSON.stringify({error: 'Element [%d] not found.'}); | ||
| el.scrollIntoView({block: 'center', behavior: 'instant'}); | ||
| el.focus(); | ||
| el.value = ''; | ||
| el.dispatchEvent(new Event('input', {bubbles: true})); | ||
| return JSON.stringify({ok: true}); | ||
| })();`, index, index) |
There was a problem hiding this comment.
scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. This can throw a TypeError and break fill/clear. Use a valid behavior value (e.g. 'auto') or omit the option.
Implement a BrowserTool that provides browser automation capabilities through direct CDP WebSocket connection to Chrome/Chromium. This addresses Issue sipeed#293 (Browser Automation) on the project roadmap. Key features: - Go-native CDP client using gorilla/websocket (zero new dependencies) - 12 actions: navigate, state, click, type, fill, select, screenshot, get_text, scroll, keys, evaluate, close - [N]-indexed DOM state extraction (borrowed from opencli) for LLM-friendly structured output at zero vision token cost - 13 stealth anti-detection patches (navigator.webdriver, chrome stub, plugin spoofing, CDP stack trace cleanup, etc.) - Cross-platform Chrome auto-detection (macOS/Linux/Windows + CHROME_PATH) - Screenshot delivery via MediaStore + channel system to users - Browsing history tracking with compact summary in state output - SSRF protection using net.ParseIP for full CIDR coverage (private, loopback, link-local, metadata endpoints) - Concurrent-safe lazy CDP connection with sync.Mutex - Optional via //go:build cdp tag — zero impact on default builds Build: - Default: go build — no browser code included (stub returns error) - Full: go build -tags cdp — complete browser support enabled New files: - pkg/tools/browser.go — BrowserTool core (Tool interface) - pkg/tools/browser_cdp.go — CDP WebSocket client - pkg/tools/browser_cdp_util.go — Chrome path detection - pkg/tools/browser_stealth.go — Anti-detection JS generator - pkg/tools/browser_stub.go — Stub for non-CDP builds - pkg/tools/browser_test.go — Unit tests (8 tests) - pkg/tools/browser_integration_test.go — Integration tests (9 tests) Modified files: - pkg/config/config.go — BrowserToolConfig + IsToolEnabled("browser") - pkg/agent/instance.go — Browser tool registration Ref: sipeed#293
ac969ec to
9d2a5bc
Compare
Summary
Implement a BrowserTool that provides browser automation capabilities through direct CDP (Chrome DevTools Protocol) WebSocket connection. This addresses Issue #293 (Browser Automation) from the project roadmap.
Key design choices:
gorilla/websocket(already in go.mod — zero new dependencies)//go:build cdp— default builds have zero browser code;go build -tags cdpenables full supportFeatures
navigatestateclick [N]type [N] "text"fill [N] "text"select [N] "option"screenshotget_text [N]scroll up/downkeys <key>evaluate <js>closeSecurity
net.ParseIP— blocks private/loopback/link-local/metadata IPsevaluatedisabled by default (allow_evaluate: false)sync.Mutex)json.Marshal(notfmt.Sprintf %q)Browsing History
Tracks visited pages and includes compact history summary in
stateoutput:Files
New (7 files, +2230 lines):
pkg/tools/browser.go— BrowserTool core (Tool interface, 12 actions)pkg/tools/browser_cdp.go— CDP WebSocket clientpkg/tools/browser_cdp_util.go— Chrome path auto-detection (macOS/Linux/Windows)pkg/tools/browser_stealth.go— 13 anti-detection JS patchespkg/tools/browser_stub.go— Stub for non-CDP buildspkg/tools/browser_test.go— Unit tests (8 tests)pkg/tools/browser_integration_test.go— Integration tests (9 tests)Modified (2 files):
pkg/config/config.go—BrowserToolConfigstruct +IsToolEnabled("browser")pkg/agent/instance.go— Browser tool registrationTest plan
go build(default) compiles without browser codego build -tags cdpcompiles with full browser supportgo vetpasses for both build tag variantsnavigator.webdriver === falseconfirmedRef: #293