Skip to content

feat(tool): add browser automation via Chrome DevTools Protocol (CDP)#2410

Open
Yourdaylight wants to merge 1 commit intosipeed:mainfrom
Yourdaylight:feat/browser-cdp-tool
Open

feat(tool): add browser automation via Chrome DevTools Protocol (CDP)#2410
Yourdaylight wants to merge 1 commit intosipeed:mainfrom
Yourdaylight:feat/browser-cdp-tool

Conversation

@Yourdaylight
Copy link
Copy Markdown

Summary

Implement a BrowserTool that provides browser automation capabilities through direct CDP (Chrome DevTools Protocol) WebSocket connection. This addresses Issue #293 (Browser Automation) from the project roadmap.

Key design choices:

  • Go-native CDP client using gorilla/websocket (already in go.mod — zero new dependencies)
  • Optional via //go:build cdp — default builds have zero browser code; go build -tags cdp enables full support
  • Borrowed [N]-indexed DOM state from opencli for LLM-friendly structured output at zero vision token cost
  • 13 stealth anti-detection patches to avoid website bot detection

Features

Action Description
navigate Open URL, wait for load
state Extract [N]-indexed interactive DOM elements (core action)
click [N] Click element by index
type [N] "text" Type into element
fill [N] "text" Clear + type into element
select [N] "option" Select dropdown option
screenshot Capture PNG, deliver via MediaStore to user
get_text [N] Extract element text/value
scroll up/down Scroll page
keys <key> Keyboard input (Enter, Tab, etc.)
evaluate <js> Run JavaScript (opt-in via config)
close Close browser session

Security

  • SSRF protection via net.ParseIP — blocks private/loopback/link-local/metadata IPs
  • JS evaluate disabled by default (allow_evaluate: false)
  • Concurrent-safe lazy CDP connection (sync.Mutex)
  • Screenshot temp files cleaned up properly
  • Safe JS string embedding via json.Marshal (not fmt.Sprintf %q)

Browsing History

Tracks visited pages and includes compact history summary in state output:

Browsing history (3 pages):
  1. Example Domain — https://example.com/
  2. Sign in to GitHub — https://github.com/login
> 3. Hacker News — https://news.ycombinator.com/

Current page: Hacker News
URL: https://news.ycombinator.com/

Interactive elements (25):
[0] a "Hacker News"
[1] a "new"
...

Files

New (7 files, +2230 lines):

  • pkg/tools/browser.go — BrowserTool core (Tool interface, 12 actions)
  • pkg/tools/browser_cdp.go — CDP WebSocket client
  • pkg/tools/browser_cdp_util.go — Chrome path auto-detection (macOS/Linux/Windows)
  • pkg/tools/browser_stealth.go — 13 anti-detection JS patches
  • pkg/tools/browser_stub.go — Stub for non-CDP builds
  • pkg/tools/browser_test.go — Unit tests (8 tests)
  • pkg/tools/browser_integration_test.go — Integration tests (9 tests)

Modified (2 files):

  • pkg/config/config.goBrowserToolConfig struct + IsToolEnabled("browser")
  • pkg/agent/instance.go — Browser tool registration

Test plan

  • go build (default) compiles without browser code
  • go build -tags cdp compiles with full browser support
  • go vet passes for both build tag variants
  • Unit tests pass (8/8): Chrome detection, URL validation, stealth JS, params
  • Integration tests pass (9/9): navigate, state, screenshot, click, type, evaluate, stealth, scroll, SSRF blocking, temp file cleanup
  • E2E verified against Chrome 146 headless: example.com, GitHub login page, Hacker News
  • GitHub login form correctly detected: username input, password input, submit button
  • Stealth: navigator.webdriver === false confirmed
  • SSRF: localhost, 127.0.0.1, 10.x, 192.168.x, 172.16.x, [::1], 169.254.169.254 all blocked

Ref: #293

Copilot AI review requested due to automatic review settings April 7, 2026 12:51
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 7, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional (build-tagged) CDP-based browser automation tool to PicoClaw, enabling agents to navigate pages, extract structured DOM state, interact with elements, and capture screenshots, while integrating with existing tool registration and config.

Changes:

  • Introduces BrowserTool (CDP build tag) with actions like navigate/state/click/type/screenshot/evaluate plus stealth patches and browsing history.
  • Implements a lightweight CDP WebSocket client and Chrome executable discovery utilities.
  • Wires the tool into configuration (BrowserToolConfig, IsToolEnabled("browser")) and agent tool registration; adds unit + integration tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pkg/tools/browser.go Core BrowserTool implementation (actions, SSRF guard, history, screenshot/media integration).
pkg/tools/browser_cdp.go CDP WebSocket client for sending commands and handling events.
pkg/tools/browser_cdp_util.go Chrome/Chromium path discovery + launch arg helper.
pkg/tools/browser_stealth.go Stealth JS patches injected on new documents.
pkg/tools/browser_stub.go Non-CDP build stub returning “not compiled” errors.
pkg/tools/browser_test.go Unit tests for URL validation, chrome detection, stealth JS, args helpers.
pkg/tools/browser_integration_test.go CDP integration tests behind integration tag.
pkg/config/config.go Adds BrowserToolConfig and enables tools.browser gating via IsToolEnabled.
pkg/agent/instance.go Registers browser tool when enabled (warns if unavailable).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +577 to +606
// Store in media store if available; otherwise clean up temp file
var mediaRef string
if t.mediaStore != nil {
ref, err := t.mediaStore.Store(tmpFile, media.MediaMeta{
Filename: "screenshot.png",
ContentType: "image/png",
Source: "tool:browser.screenshot",
}, scope)
if err != nil {
logger.WarnCF("tool", "Failed to store screenshot in media store",
map[string]any{"error": err.Error()})
os.Remove(tmpFile) // clean up on store failure
} else {
mediaRef = ref
}
} else {
// No media store — remove temp file since nobody will clean it up
defer os.Remove(tmpFile)
}

// Return result with media delivery to user + vision support for LLM
result := &ToolResult{
ForLLM: "Screenshot captured (PNG). LLM can analyze the visual content to understand current page state.",
ForUser: "Screenshot captured and sent",
ResponseHandled: true, // Don't wait for LLM response, send immediately to user
}

if mediaRef != "" {
result.Media = []string{mediaRef}
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

screenshot returns ResponseHandled=true and tells the user the screenshot was sent even when mediaStore is nil or when mediaStore.Store fails (in both cases mediaRef stays empty). This makes the tool report success without delivering any media, and the deferred temp-file cleanup also removes the only copy. Consider returning an error when no MediaStore is configured (similar to send_file), or returning an artifact path/ref that can actually be delivered; also treat Store failures as tool errors rather than silent warnings.

Copilot uses AI. Check for mistakes.
Comment on lines +648 to +669
func (t *BrowserTool) executeScroll(ctx context.Context, args map[string]any) *ToolResult {
direction, _ := args["direction"].(string)
if direction == "" {
// Also accept from "text" parameter for convenience
direction, _ = args["text"].(string)
}

amount := 500 // pixels
if direction == "up" {
amount = -500
}

js := fmt.Sprintf(`window.scrollBy(0, %d); JSON.stringify({scrollY: window.scrollY});`, amount)
raw, err := t.cdp.Evaluate(ctx, js)
if err != nil {
return ErrorResult(fmt.Sprintf("scroll failed: %v", err))
}

var resultStr string
json.Unmarshal(raw, &resultStr)

return SilentResult(fmt.Sprintf("Scrolled %s. Run 'state' to see updated elements.", direction))
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scroll accepts any direction value (and even an empty one) because argument validation currently doesn't enforce the schema enum. For invalid/empty directions the code scrolls down by default but returns a misleading message (e.g. "Scrolled "). Add explicit validation (only "up"/"down") and return an error for anything else.

Copilot uses AI. Check for mistakes.
Comment on lines +538 to +551
var resultStr string
json.Unmarshal(raw, &resultStr)
var result struct {
OK bool `json:"ok"`
Error string `json:"error"`
Selected string `json:"selected"`
}
json.Unmarshal([]byte(resultStr), &result)

if result.Error != "" {
return ErrorResult(result.Error)
}
return SilentResult(fmt.Sprintf("Selected %q in [%d].", result.Selected, index))
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several actions ignore JSON unmarshal failures from CDP Evaluate results (e.g. select ignores both unmarshals). If parsing fails, the code can incorrectly report success with empty fields. Please handle json.Unmarshal errors and surface a tool error when the CDP/JS result isn't in the expected shape.

Copilot uses AI. Check for mistakes.
Comment on lines +634 to +646
var resultStr string
json.Unmarshal(raw, &resultStr)
var result struct {
Text string `json:"text"`
Error string `json:"error"`
}
json.Unmarshal([]byte(resultStr), &result)

if result.Error != "" {
return ErrorResult(result.Error)
}
return SilentResult(fmt.Sprintf("[%d] text: %q", index, result.Text))
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_text ignores errors from both json.Unmarshal calls. If CDP returns a non-string/non-JSON value, this can silently produce an empty resultStr/Text and mislead the agent. Handle unmarshal errors and return an ErrorResult when parsing fails.

Copilot uses AI. Check for mistakes.
Comment on lines +802 to +843
// validateBrowserURL checks that a URL is safe to navigate to.
// It blocks private networks, loopback, metadata endpoints, and non-HTTP schemes.
func validateBrowserURL(urlStr string) error {
if urlStr == "" {
return fmt.Errorf("empty URL")
}

parsed, err := url.Parse(urlStr)
if err != nil {
return fmt.Errorf("invalid URL: %w", err)
}

if parsed.Scheme != "http" && parsed.Scheme != "https" {
return fmt.Errorf("only http/https URLs are allowed (got %s://)", parsed.Scheme)
}

if parsed.Host == "" {
return fmt.Errorf("missing host in URL")
}

hostname := parsed.Hostname()

// Block obvious private/local hosts (string check)
if hostname == "localhost" || hostname == "0.0.0.0" ||
hostname == "metadata.google.internal" {
return fmt.Errorf("navigation to %s is not allowed", hostname)
}

// Parse as IP to catch all representations (hex, octal, IPv4-mapped IPv6, etc.)
ip := net.ParseIP(hostname)
if ip != nil {
if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
return fmt.Errorf("navigation to private/local IP %s is not allowed", hostname)
}
// Block cloud metadata IPs
if ip.Equal(net.ParseIP("169.254.169.254")) {
return fmt.Errorf("navigation to cloud metadata endpoint is not allowed")
}
}

return nil
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SSRF check only blocks literal IP hosts and a few hard-coded hostnames; it does not resolve DNS for normal hostnames. An attacker can supply a hostname that resolves to a private/loopback/metadata IP and bypass this check. To make the claimed SSRF protection real, resolve A/AAAA records (and ideally re-check redirects) and block if any resolved IP is private/loopback/link-local/unspecified/metadata.

Copilot uses AI. Check for mistakes.
Comment on lines +198 to +214
// Close closes the WebSocket connection.
func (c *CDPClient) Close() error {
if c.closed.Swap(true) {
return nil // already closed
}
close(c.done)

// Cancel all pending requests
c.mu.Lock()
for id, p := range c.pending {
p.ch <- cdpResponse{Err: fmt.Errorf("CDP connection closed")}
delete(c.pending, id)
}
c.mu.Unlock()

return c.conn.Close()
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close() iterates pending requests and does a blocking send into each p.ch. If a request already has a response buffered (channel size 1) and Close() races with readLoop, this send can block indefinitely. Use a non-blocking send (select { case ch<-...: default: }), close the channel, or store cancellation state in the pending struct to avoid deadlock during shutdown.

Copilot uses AI. Check for mistakes.
Comment on lines +442 to +460
// removeHandler removes a specific handler from the event listeners.
// It uses a linear scan which is fine for the small number of handlers we use.
func (c *CDPClient) removeHandler(event string, target func(json.RawMessage)) {
c.eventMu.Lock()
defer c.eventMu.Unlock()
handlers := c.events[event]
for i := range handlers {
// Direct pointer comparison for function values allocated in the same scope
if &handlers[i] == nil {
continue
}
// Use fmt.Sprintf as a last resort for identity check.
// This is acceptable since we register very few handlers.
if fmt.Sprintf("%p", handlers[i]) == fmt.Sprintf("%p", target) {
c.events[event] = append(handlers[:i], handlers[i+1:]...)
return
}
}
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeHandler's identity check is brittle: &handlers[i] == nil is unreachable, and comparing fmt.Sprintf("%p", fn) for func values relies on formatting behavior rather than a stable identifier. If removal fails, handlers will leak and keep firing. Consider changing On to return a handler ID/token (or store uintptr from reflect.ValueOf(fn).Pointer() alongside the handler) and remove by that, or maintain per-event slices of structs {id, fn}.

Copilot uses AI. Check for mistakes.
@Yourdaylight Yourdaylight force-pushed the feat/browser-cdp-tool branch from 353b9f7 to e66eccb Compare April 7, 2026 13:12
@sipeed-bot sipeed-bot bot added type: enhancement New feature or request domain: tool domain: agent go Pull requests that update go code labels Apr 7, 2026
Copilot AI review requested due to automatic review settings April 8, 2026 07:39
@Yourdaylight Yourdaylight force-pushed the feat/browser-cdp-tool branch from e66eccb to ac969ec Compare April 8, 2026 07:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to +69
// NewBrowserTool creates a new BrowserTool. It verifies that Chrome is available
// and attempts to connect to the CDP endpoint. Returns an error if Chrome is
// not found or the connection fails.
func NewBrowserTool(cfg config.BrowserToolConfig) (*BrowserTool, error) {
chromePath, err := FindChromePath()
if err != nil {
return nil, fmt.Errorf(
"Chrome/Chromium required for browser tool but not found. "+
"Install Chrome and restart, or set CHROME_PATH env var. "+
"Error: %w", err)
}

logger.InfoCF("tool", "Chrome found for browser tool",
map[string]any{"path": chromePath})

var stealthJS string
if cfg.Stealth {
stealthJS = generateStealthJS()
}

return &BrowserTool{
cfg: cfg,
chromePath: chromePath,
stealthJS: stealthJS,
}, nil
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewBrowserTool's doc comment says it "attempts to connect to the CDP endpoint" and will error if the connection fails, but the implementation only discovers Chrome and defers the CDP connection to connectIfNeeded() on first Execute(). Update the comment (or eagerly connect) so callers have correct expectations about when failures surface.

Copilot uses AI. Check for mistakes.
js := fmt.Sprintf(`(function() {
var el = document.querySelector('[data-pcw-idx="%d"]');
if (!el) return JSON.stringify({error: 'Element [%d] not found. Run state to refresh indices.'});
el.scrollIntoView({block: 'center', behavior: 'instant'});
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. In Chrome this can throw a TypeError, breaking click/type/fill actions. Use a valid behavior value (e.g. 'auto') or omit the option.

Suggested change
el.scrollIntoView({block: 'center', behavior: 'instant'});
el.scrollIntoView({block: 'center', behavior: 'auto'});

Copilot uses AI. Check for mistakes.
Comment on lines +582 to +599
if t.mediaStore != nil {
ref, err := t.mediaStore.Store(tmpFile, media.MediaMeta{
Filename: "screenshot.png",
ContentType: "image/png",
Source: "tool:browser.screenshot",
}, scope)
if err != nil {
logger.WarnCF("tool", "Failed to store screenshot",
map[string]any{"error": err.Error()})
// Fall through to inline path
} else {
os.Remove(tmpFile)
return &ToolResult{
ForLLM: "Screenshot captured and sent to user (PNG).",
ForUser: "Screenshot captured",
Media: []string{ref},
ResponseHandled: true,
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This screenshot path deletes tmpFile after storing it in MediaStore, but MediaStore.Store explicitly does not copy/move the file (it just records a mapping). Removing the file here will make the returned media ref unresolvable/broken later. Keep the file on disk and rely on MediaStore cleanup/release policies instead.

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +41
cfg config.BrowserToolConfig
cdp *CDPClient
cdpMu sync.Mutex // guards lazy cdp connection
chromePath string
stealthJS string
mediaStore media.MediaStore
history []pageVisit // browsing history, most recent last
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BrowserTool stores mutable per-session state (history slice, cdp pointer) on the tool instance without synchronizing access. Tool execution can be concurrent, and the codebase explicitly relies on immutable request context to avoid shared mutable state on singleton tools. Add a mutex (or another concurrency-safe state holder) around history updates and close/reset operations (and any other mutable fields).

Suggested change
cfg config.BrowserToolConfig
cdp *CDPClient
cdpMu sync.Mutex // guards lazy cdp connection
chromePath string
stealthJS string
mediaStore media.MediaStore
history []pageVisit // browsing history, most recent last
cfg config.BrowserToolConfig
stateMu sync.Mutex // guards mutable browser session state below, including cdp/history close/reset
cdp *CDPClient // guarded by stateMu
history []pageVisit // guarded by stateMu; browsing history, most recent last
cdpMu sync.Mutex // guards lazy cdp connection orchestration
chromePath string
stealthJS string
mediaStore media.MediaStore

Copilot uses AI. Check for mistakes.
Comment on lines +255 to +284
js := `(function() {
var selectors = 'a, button, input, select, textarea, [role="button"], [role="link"], [role="tab"], [onclick], [tabindex]:not([tabindex="-1"])';
var elements = document.querySelectorAll(selectors);
var result = [];
var idx = 0;
for (var i = 0; i < elements.length; i++) {
var el = elements[i];
var rect = el.getBoundingClientRect();
if (rect.width === 0 && rect.height === 0) continue;
if (getComputedStyle(el).visibility === 'hidden') continue;
if (getComputedStyle(el).display === 'none') continue;
var tag = el.tagName.toLowerCase();
var info = {
i: idx,
tag: tag,
text: (el.textContent || '').trim().slice(0, 80).replace(/\s+/g, ' ')
};
if (el.getAttribute('role')) info.role = el.getAttribute('role');
if (el.getAttribute('type')) info.type = el.getAttribute('type');
if (el.getAttribute('name')) info.name = el.getAttribute('name');
if (el.getAttribute('placeholder')) info.placeholder = el.getAttribute('placeholder');
if (el.value !== undefined && el.value !== '') info.value = String(el.value).slice(0, 80);
if (tag === 'a' && el.getAttribute('href')) info.href = el.getAttribute('href').slice(0, 120);
if (el.getAttribute('aria-label')) info.label = el.getAttribute('aria-label').slice(0, 80);
if (el.disabled) info.disabled = true;
// Store selector path for interaction
el.setAttribute('data-pcw-idx', String(idx));
result.push(info);
idx++;
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state() writes data-pcw-idx attributes but never clears previous indices from elements that are no longer considered interactive/visible. A later click/type can still find and act on a stale element by old index, even after telling the user to re-run state. Consider clearing existing [data-pcw-idx] attributes at the start of state(), and/or re-checking visibility in click/type/fill before interacting.

Copilot uses AI. Check for mistakes.
Comment on lines +108 to +170
// Send sends a CDP command and waits for the response.
func (c *CDPClient) Send(method string, params map[string]any) (json.RawMessage, error) {
return c.SendWithTimeout(method, params, cdpSendTimeout)
}

// SendWithTimeout sends a CDP command with a custom timeout.
func (c *CDPClient) SendWithTimeout(method string, params map[string]any, timeout time.Duration) (json.RawMessage, error) {
if c.closed.Load() {
return nil, fmt.Errorf("CDP connection closed")
}

id := atomic.AddInt64(&c.idCount, 1)

// Build request message
msg := map[string]any{
"id": id,
"method": method,
}
if params != nil {
msg["params"] = params
}

data, err := json.Marshal(msg)
if err != nil {
return nil, fmt.Errorf("failed to marshal CDP message: %w", err)
}

// Register pending request
p := &cdpPending{
ch: make(chan cdpResponse, 1),
timer: time.NewTimer(timeout),
}

c.mu.Lock()
c.pending[id] = p
c.mu.Unlock()

// Clean up on return
defer func() {
p.timer.Stop()
c.mu.Lock()
delete(c.pending, id)
c.mu.Unlock()
}()

// Send
c.mu.Lock()
err = c.conn.WriteMessage(websocket.TextMessage, data)
c.mu.Unlock()
if err != nil {
return nil, fmt.Errorf("failed to send CDP message: %w", err)
}

// Wait for response or timeout
select {
case resp := <-p.ch:
return resp.Result, resp.Err
case <-p.timer.C:
return nil, fmt.Errorf("CDP command %q timed out after %v", method, timeout)
case <-c.done:
return nil, fmt.Errorf("CDP connection closed while waiting for %q", method)
}
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDPClient.Send/SendWithTimeout ignores the passed-in context from tool actions; it uses a fixed timer and doesn't set per-call read/write deadlines. As a result, BrowserTool's ctx timeout/cancellation may not be respected while waiting on CDP commands. Consider threading ctx through Send (or adding SendCtx) and using ctx.Deadline/Done to bound both the WebSocket write and the response wait.

Copilot uses AI. Check for mistakes.
type BrowserToolConfig struct {
ToolConfig `envPrefix:"PICOCLAW_TOOLS_BROWSER_"`
CDPEndpoint string `json:"cdp_endpoint" yaml:"cdp_endpoint,omitempty" env:"PICOCLAW_TOOLS_BROWSER_CDP_ENDPOINT"`
Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"`
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BrowserToolConfig.Timeout is tagged as json/yaml "timeout_seconds" but its env var is PICOCLAW_TOOLS_BROWSER_TIMEOUT. This is inconsistent with other *_TIMEOUT_SECONDS env vars in this config and makes configuration harder to discover. Consider renaming the env tag to PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS (or aligning the field name/tag).

Suggested change
Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"`
Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS"`

Copilot uses AI. Check for mistakes.
Comment on lines +89 to +107
func TestIntegration_Screenshot(t *testing.T) {
tool := setupBrowserTool(t)
ctx := context.Background()

tool.Execute(ctx, map[string]any{"action": "navigate", "url": "https://example.com"})
result := tool.Execute(ctx, map[string]any{"action": "screenshot"})
if result.IsError {
t.Fatalf("screenshot failed: %s", result.ForLLM)
}
t.Logf("Screenshot result: ForLLM=%s, ForUser=%s, Media=%v",
result.ForLLM, result.ForUser, result.Media)

// Verify temp file was created (even without MediaStore it should have existed briefly)
if !strings.Contains(result.ForLLM, "Screenshot captured") {
t.Error("unexpected screenshot result")
}

tool.Execute(ctx, map[string]any{"action": "close"})
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration test asserts result.ForLLM contains "Screenshot captured", but the screenshot implementation returns "Screenshot saved to ..." when no MediaStore is configured (which is the case in setupBrowserTool). This will fail under -tags integration unless a MediaStore is injected or the assertion is updated to match the actual behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +424 to +432
// Focus the element
focusJS := fmt.Sprintf(`(function() {
var el = document.querySelector('[data-pcw-idx="%d"]');
if (!el) return JSON.stringify({error: 'Element [%d] not found. Run state to refresh indices.'});
el.scrollIntoView({block: 'center', behavior: 'instant'});
el.focus();
return JSON.stringify({ok: true});
})();`, index, index)

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. This can throw a TypeError and break typing/focus. Use a valid behavior value (e.g. 'auto') or omit the option.

Copilot uses AI. Check for mistakes.
Comment on lines +468 to +477
// Focus, select all existing text, then type new text
clearJS := fmt.Sprintf(`(function() {
var el = document.querySelector('[data-pcw-idx="%d"]');
if (!el) return JSON.stringify({error: 'Element [%d] not found.'});
el.scrollIntoView({block: 'center', behavior: 'instant'});
el.focus();
el.value = '';
el.dispatchEvent(new Event('input', {bubbles: true}));
return JSON.stringify({ok: true});
})();`, index, index)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. This can throw a TypeError and break fill/clear. Use a valid behavior value (e.g. 'auto') or omit the option.

Copilot uses AI. Check for mistakes.
Implement a BrowserTool that provides browser automation capabilities
through direct CDP WebSocket connection to Chrome/Chromium. This
addresses Issue sipeed#293 (Browser Automation) on the project roadmap.

Key features:
- Go-native CDP client using gorilla/websocket (zero new dependencies)
- 12 actions: navigate, state, click, type, fill, select, screenshot,
  get_text, scroll, keys, evaluate, close
- [N]-indexed DOM state extraction (borrowed from opencli) for
  LLM-friendly structured output at zero vision token cost
- 13 stealth anti-detection patches (navigator.webdriver, chrome stub,
  plugin spoofing, CDP stack trace cleanup, etc.)
- Cross-platform Chrome auto-detection (macOS/Linux/Windows + CHROME_PATH)
- Screenshot delivery via MediaStore + channel system to users
- Browsing history tracking with compact summary in state output
- SSRF protection using net.ParseIP for full CIDR coverage (private,
  loopback, link-local, metadata endpoints)
- Concurrent-safe lazy CDP connection with sync.Mutex
- Optional via //go:build cdp tag — zero impact on default builds

Build:
- Default: go build — no browser code included (stub returns error)
- Full:    go build -tags cdp — complete browser support enabled

New files:
- pkg/tools/browser.go              — BrowserTool core (Tool interface)
- pkg/tools/browser_cdp.go          — CDP WebSocket client
- pkg/tools/browser_cdp_util.go     — Chrome path detection
- pkg/tools/browser_stealth.go      — Anti-detection JS generator
- pkg/tools/browser_stub.go         — Stub for non-CDP builds
- pkg/tools/browser_test.go         — Unit tests (8 tests)
- pkg/tools/browser_integration_test.go — Integration tests (9 tests)

Modified files:
- pkg/config/config.go   — BrowserToolConfig + IsToolEnabled("browser")
- pkg/agent/instance.go  — Browser tool registration

Ref: sipeed#293
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: agent domain: tool go Pull requests that update go code type: enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants