feat(tool): add browser automation via Chrome DevTools Protocol (CDP) by Yourdaylight · Pull Request #2410 · sipeed/picoclaw

Yourdaylight · 2026-04-07T12:51:50Z

Summary

Implement a BrowserTool that provides browser automation capabilities through direct CDP (Chrome DevTools Protocol) WebSocket connection. This addresses Issue #293 (Browser Automation) from the project roadmap.

Key design choices:

Go-native CDP client using gorilla/websocket (already in go.mod — zero new dependencies)
Optional via //go:build cdp — default builds have zero browser code; go build -tags cdp enables full support
Borrowed [N]-indexed DOM state from opencli for LLM-friendly structured output at zero vision token cost
13 stealth anti-detection patches to avoid website bot detection

Features

Action	Description
`navigate`	Open URL, wait for load
`state`	Extract [N]-indexed interactive DOM elements (core action)
`click [N]`	Click element by index
`type [N] "text"`	Type into element
`fill [N] "text"`	Clear + type into element
`select [N] "option"`	Select dropdown option
`screenshot`	Capture PNG, deliver via MediaStore to user
`get_text [N]`	Extract element text/value
`scroll up/down`	Scroll page
`keys <key>`	Keyboard input (Enter, Tab, etc.)
`evaluate <js>`	Run JavaScript (opt-in via config)
`close`	Close browser session

Security

SSRF protection via net.ParseIP — blocks private/loopback/link-local/metadata IPs
JS evaluate disabled by default (allow_evaluate: false)
Concurrent-safe lazy CDP connection (sync.Mutex)
Screenshot temp files cleaned up properly
Safe JS string embedding via json.Marshal (not fmt.Sprintf %q)

Browsing History

Tracks visited pages and includes compact history summary in state output:

Browsing history (3 pages):
  1. Example Domain — https://example.com/
  2. Sign in to GitHub — https://github.com/login
> 3. Hacker News — https://news.ycombinator.com/

Current page: Hacker News
URL: https://news.ycombinator.com/

Interactive elements (25):
[0] a "Hacker News"
[1] a "new"
...

Files

New (7 files, +2230 lines):

pkg/tools/browser.go — BrowserTool core (Tool interface, 12 actions)
pkg/tools/browser_cdp.go — CDP WebSocket client
pkg/tools/browser_cdp_util.go — Chrome path auto-detection (macOS/Linux/Windows)
pkg/tools/browser_stealth.go — 13 anti-detection JS patches
pkg/tools/browser_stub.go — Stub for non-CDP builds
pkg/tools/browser_test.go — Unit tests (8 tests)
pkg/tools/browser_integration_test.go — Integration tests (9 tests)

Modified (2 files):

pkg/config/config.go — BrowserToolConfig struct + IsToolEnabled("browser")
pkg/agent/instance.go — Browser tool registration

Test plan

go build (default) compiles without browser code
go build -tags cdp compiles with full browser support
go vet passes for both build tag variants
Unit tests pass (8/8): Chrome detection, URL validation, stealth JS, params
Integration tests pass (9/9): navigate, state, screenshot, click, type, evaluate, stealth, scroll, SSRF blocking, temp file cleanup
E2E verified against Chrome 146 headless: example.com, GitHub login page, Hacker News
GitHub login form correctly detected: username input, password input, submit button
Stealth: navigator.webdriver === false confirmed
SSRF: localhost, 127.0.0.1, 10.x, 192.168.x, 172.16.x, [::1], 169.254.169.254 all blocked

Ref: #293

CLAassistant · 2026-04-07T12:51:59Z

All committers have signed the CLA.

Copilot

Pull request overview

Adds an optional (build-tagged) CDP-based browser automation tool to PicoClaw, enabling agents to navigate pages, extract structured DOM state, interact with elements, and capture screenshots, while integrating with existing tool registration and config.

Changes:

Introduces BrowserTool (CDP build tag) with actions like navigate/state/click/type/screenshot/evaluate plus stealth patches and browsing history.
Implements a lightweight CDP WebSocket client and Chrome executable discovery utilities.
Wires the tool into configuration (BrowserToolConfig, IsToolEnabled("browser")) and agent tool registration; adds unit + integration tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
pkg/tools/browser.go	Core BrowserTool implementation (actions, SSRF guard, history, screenshot/media integration).
pkg/tools/browser_cdp.go	CDP WebSocket client for sending commands and handling events.
pkg/tools/browser_cdp_util.go	Chrome/Chromium path discovery + launch arg helper.
pkg/tools/browser_stealth.go	Stealth JS patches injected on new documents.
pkg/tools/browser_stub.go	Non-CDP build stub returning “not compiled” errors.
pkg/tools/browser_test.go	Unit tests for URL validation, chrome detection, stealth JS, args helpers.
pkg/tools/browser_integration_test.go	CDP integration tests behind `integration` tag.
pkg/config/config.go	Adds `BrowserToolConfig` and enables `tools.browser` gating via `IsToolEnabled`.
pkg/agent/instance.go	Registers browser tool when enabled (warns if unavailable).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-07T12:58:51Z

pkg/tools/browser.go

+	// Store in media store if available; otherwise clean up temp file
+	var mediaRef string
+	if t.mediaStore != nil {
+		ref, err := t.mediaStore.Store(tmpFile, media.MediaMeta{
+			Filename:    "screenshot.png",
+			ContentType: "image/png",
+			Source:      "tool:browser.screenshot",
+		}, scope)
+		if err != nil {
+			logger.WarnCF("tool", "Failed to store screenshot in media store",
+				map[string]any{"error": err.Error()})
+			os.Remove(tmpFile) // clean up on store failure
+		} else {
+			mediaRef = ref
+		}
+	} else {
+		// No media store — remove temp file since nobody will clean it up
+		defer os.Remove(tmpFile)
+	}
+
+	// Return result with media delivery to user + vision support for LLM
+	result := &ToolResult{
+		ForLLM:          "Screenshot captured (PNG). LLM can analyze the visual content to understand current page state.",
+		ForUser:         "Screenshot captured and sent",
+		ResponseHandled: true, // Don't wait for LLM response, send immediately to user
+	}
+
+	if mediaRef != "" {
+		result.Media = []string{mediaRef}
+	}


screenshot returns ResponseHandled=true and tells the user the screenshot was sent even when mediaStore is nil or when mediaStore.Store fails (in both cases mediaRef stays empty). This makes the tool report success without delivering any media, and the deferred temp-file cleanup also removes the only copy. Consider returning an error when no MediaStore is configured (similar to send_file), or returning an artifact path/ref that can actually be delivered; also treat Store failures as tool errors rather than silent warnings.

Copilot · 2026-04-07T12:58:51Z

pkg/tools/browser.go

+func (t *BrowserTool) executeScroll(ctx context.Context, args map[string]any) *ToolResult {
+	direction, _ := args["direction"].(string)
+	if direction == "" {
+		// Also accept from "text" parameter for convenience
+		direction, _ = args["text"].(string)
+	}
+
+	amount := 500 // pixels
+	if direction == "up" {
+		amount = -500
+	}
+
+	js := fmt.Sprintf(`window.scrollBy(0, %d); JSON.stringify({scrollY: window.scrollY});`, amount)
+	raw, err := t.cdp.Evaluate(ctx, js)
+	if err != nil {
+		return ErrorResult(fmt.Sprintf("scroll failed: %v", err))
+	}
+
+	var resultStr string
+	json.Unmarshal(raw, &resultStr)
+
+	return SilentResult(fmt.Sprintf("Scrolled %s. Run 'state' to see updated elements.", direction))


scroll accepts any direction value (and even an empty one) because argument validation currently doesn't enforce the schema enum. For invalid/empty directions the code scrolls down by default but returns a misleading message (e.g. "Scrolled "). Add explicit validation (only "up"/"down") and return an error for anything else.

Copilot · 2026-04-07T12:58:52Z

pkg/tools/browser.go

+	var resultStr string
+	json.Unmarshal(raw, &resultStr)
+	var result struct {
+		OK       bool   `json:"ok"`
+		Error    string `json:"error"`
+		Selected string `json:"selected"`
+	}
+	json.Unmarshal([]byte(resultStr), &result)
+
+	if result.Error != "" {
+		return ErrorResult(result.Error)
+	}
+	return SilentResult(fmt.Sprintf("Selected %q in [%d].", result.Selected, index))
+}


Several actions ignore JSON unmarshal failures from CDP Evaluate results (e.g. select ignores both unmarshals). If parsing fails, the code can incorrectly report success with empty fields. Please handle json.Unmarshal errors and surface a tool error when the CDP/JS result isn't in the expected shape.

Copilot · 2026-04-07T12:58:52Z

pkg/tools/browser.go

+	var resultStr string
+	json.Unmarshal(raw, &resultStr)
+	var result struct {
+		Text  string `json:"text"`
+		Error string `json:"error"`
+	}
+	json.Unmarshal([]byte(resultStr), &result)
+
+	if result.Error != "" {
+		return ErrorResult(result.Error)
+	}
+	return SilentResult(fmt.Sprintf("[%d] text: %q", index, result.Text))
+}


get_text ignores errors from both json.Unmarshal calls. If CDP returns a non-string/non-JSON value, this can silently produce an empty resultStr/Text and mislead the agent. Handle unmarshal errors and return an ErrorResult when parsing fails.

Copilot · 2026-04-07T12:58:52Z

pkg/tools/browser.go

+// validateBrowserURL checks that a URL is safe to navigate to.
+// It blocks private networks, loopback, metadata endpoints, and non-HTTP schemes.
+func validateBrowserURL(urlStr string) error {
+	if urlStr == "" {
+		return fmt.Errorf("empty URL")
+	}
+
+	parsed, err := url.Parse(urlStr)
+	if err != nil {
+		return fmt.Errorf("invalid URL: %w", err)
+	}
+
+	if parsed.Scheme != "http" && parsed.Scheme != "https" {
+		return fmt.Errorf("only http/https URLs are allowed (got %s://)", parsed.Scheme)
+	}
+
+	if parsed.Host == "" {
+		return fmt.Errorf("missing host in URL")
+	}
+
+	hostname := parsed.Hostname()
+
+	// Block obvious private/local hosts (string check)
+	if hostname == "localhost" || hostname == "0.0.0.0" ||
+		hostname == "metadata.google.internal" {
+		return fmt.Errorf("navigation to %s is not allowed", hostname)
+	}
+
+	// Parse as IP to catch all representations (hex, octal, IPv4-mapped IPv6, etc.)
+	ip := net.ParseIP(hostname)
+	if ip != nil {
+		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
+			ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
+			return fmt.Errorf("navigation to private/local IP %s is not allowed", hostname)
+		}
+		// Block cloud metadata IPs
+		if ip.Equal(net.ParseIP("169.254.169.254")) {
+			return fmt.Errorf("navigation to cloud metadata endpoint is not allowed")
+		}
+	}
+
+	return nil


The SSRF check only blocks literal IP hosts and a few hard-coded hostnames; it does not resolve DNS for normal hostnames. An attacker can supply a hostname that resolves to a private/loopback/metadata IP and bypass this check. To make the claimed SSRF protection real, resolve A/AAAA records (and ideally re-check redirects) and block if any resolved IP is private/loopback/link-local/unspecified/metadata.

Copilot · 2026-04-07T12:58:52Z

pkg/tools/browser_cdp.go

+// Close closes the WebSocket connection.
+func (c *CDPClient) Close() error {
+	if c.closed.Swap(true) {
+		return nil // already closed
+	}
+	close(c.done)
+
+	// Cancel all pending requests
+	c.mu.Lock()
+	for id, p := range c.pending {
+		p.ch <- cdpResponse{Err: fmt.Errorf("CDP connection closed")}
+		delete(c.pending, id)
+	}
+	c.mu.Unlock()
+
+	return c.conn.Close()
+}


Close() iterates pending requests and does a blocking send into each p.ch. If a request already has a response buffered (channel size 1) and Close() races with readLoop, this send can block indefinitely. Use a non-blocking send (select { case ch<-...: default: }), close the channel, or store cancellation state in the pending struct to avoid deadlock during shutdown.

Copilot · 2026-04-07T12:58:53Z

pkg/tools/browser_cdp.go

+// removeHandler removes a specific handler from the event listeners.
+// It uses a linear scan which is fine for the small number of handlers we use.
+func (c *CDPClient) removeHandler(event string, target func(json.RawMessage)) {
+	c.eventMu.Lock()
+	defer c.eventMu.Unlock()
+	handlers := c.events[event]
+	for i := range handlers {
+		// Direct pointer comparison for function values allocated in the same scope
+		if &handlers[i] == nil {
+			continue
+		}
+		// Use fmt.Sprintf as a last resort for identity check.
+		// This is acceptable since we register very few handlers.
+		if fmt.Sprintf("%p", handlers[i]) == fmt.Sprintf("%p", target) {
+			c.events[event] = append(handlers[:i], handlers[i+1:]...)
+			return
+		}
+	}
+}


removeHandler's identity check is brittle: &handlers[i] == nil is unreachable, and comparing fmt.Sprintf("%p", fn) for func values relies on formatting behavior rather than a stable identifier. If removal fails, handlers will leak and keep firing. Consider changing On to return a handler ID/token (or store uintptr from reflect.ValueOf(fn).Pointer() alongside the handler) and remove by that, or maintain per-event slices of structs {id, fn}.

pkg/tools/browser_stub.go

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-08T07:47:05Z

pkg/tools/browser.go

+// NewBrowserTool creates a new BrowserTool. It verifies that Chrome is available
+// and attempts to connect to the CDP endpoint. Returns an error if Chrome is
+// not found or the connection fails.
+func NewBrowserTool(cfg config.BrowserToolConfig) (*BrowserTool, error) {
+	chromePath, err := FindChromePath()
+	if err != nil {
+		return nil, fmt.Errorf(
+			"Chrome/Chromium required for browser tool but not found. "+
+				"Install Chrome and restart, or set CHROME_PATH env var. "+
+				"Error: %w", err)
+	}
+
+	logger.InfoCF("tool", "Chrome found for browser tool",
+		map[string]any{"path": chromePath})
+
+	var stealthJS string
+	if cfg.Stealth {
+		stealthJS = generateStealthJS()
+	}
+
+	return &BrowserTool{
+		cfg:        cfg,
+		chromePath: chromePath,
+		stealthJS:  stealthJS,
+	}, nil
+}


NewBrowserTool's doc comment says it "attempts to connect to the CDP endpoint" and will error if the connection fails, but the implementation only discovers Chrome and defers the CDP connection to connectIfNeeded() on first Execute(). Update the comment (or eagerly connect) so callers have correct expectations about when failures surface.

Copilot · 2026-04-08T07:47:05Z

pkg/tools/browser.go

+	js := fmt.Sprintf(`(function() {
+  var el = document.querySelector('[data-pcw-idx="%d"]');
+  if (!el) return JSON.stringify({error: 'Element [%d] not found. Run state to refresh indices.'});
+  el.scrollIntoView({block: 'center', behavior: 'instant'});


scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. In Chrome this can throw a TypeError, breaking click/type/fill actions. Use a valid behavior value (e.g. 'auto') or omit the option.

Suggested change

el.scrollIntoView({block: 'center', behavior: 'instant'});

el.scrollIntoView({block: 'center', behavior: 'auto'});

Copilot · 2026-04-08T07:47:06Z

pkg/tools/browser.go

+	if t.mediaStore != nil {
+		ref, err := t.mediaStore.Store(tmpFile, media.MediaMeta{
+			Filename:    "screenshot.png",
+			ContentType: "image/png",
+			Source:      "tool:browser.screenshot",
+		}, scope)
+		if err != nil {
+			logger.WarnCF("tool", "Failed to store screenshot",
+				map[string]any{"error": err.Error()})
+			// Fall through to inline path
+		} else {
+			os.Remove(tmpFile)
+			return &ToolResult{
+				ForLLM:          "Screenshot captured and sent to user (PNG).",
+				ForUser:         "Screenshot captured",
+				Media:           []string{ref},
+				ResponseHandled: true,
+			}


This screenshot path deletes tmpFile after storing it in MediaStore, but MediaStore.Store explicitly does not copy/move the file (it just records a mapping). Removing the file here will make the returned media ref unresolvable/broken later. Keep the file on disk and rely on MediaStore cleanup/release policies instead.

Copilot · 2026-04-08T07:47:06Z

pkg/tools/browser.go

+	cfg        config.BrowserToolConfig
+	cdp        *CDPClient
+	cdpMu      sync.Mutex // guards lazy cdp connection
+	chromePath string
+	stealthJS  string
+	mediaStore media.MediaStore
+	history    []pageVisit // browsing history, most recent last


BrowserTool stores mutable per-session state (history slice, cdp pointer) on the tool instance without synchronizing access. Tool execution can be concurrent, and the codebase explicitly relies on immutable request context to avoid shared mutable state on singleton tools. Add a mutex (or another concurrency-safe state holder) around history updates and close/reset operations (and any other mutable fields).

Suggested change

cfg config.BrowserToolConfig

cdp *CDPClient

cdpMu sync.Mutex // guards lazy cdp connection

chromePath string

stealthJS string

mediaStore media.MediaStore

history []pageVisit // browsing history, most recent last

cfg config.BrowserToolConfig

stateMu sync.Mutex // guards mutable browser session state below, including cdp/history close/reset

cdp *CDPClient // guarded by stateMu

history []pageVisit // guarded by stateMu; browsing history, most recent last

cdpMu sync.Mutex // guards lazy cdp connection orchestration

chromePath string

stealthJS string

mediaStore media.MediaStore

Copilot · 2026-04-08T07:47:06Z

pkg/tools/browser.go

+	js := `(function() {
+  var selectors = 'a, button, input, select, textarea, [role="button"], [role="link"], [role="tab"], [onclick], [tabindex]:not([tabindex="-1"])';
+  var elements = document.querySelectorAll(selectors);
+  var result = [];
+  var idx = 0;
+  for (var i = 0; i < elements.length; i++) {
+    var el = elements[i];
+    var rect = el.getBoundingClientRect();
+    if (rect.width === 0 && rect.height === 0) continue;
+    if (getComputedStyle(el).visibility === 'hidden') continue;
+    if (getComputedStyle(el).display === 'none') continue;
+    var tag = el.tagName.toLowerCase();
+    var info = {
+      i: idx,
+      tag: tag,
+      text: (el.textContent || '').trim().slice(0, 80).replace(/\s+/g, ' ')
+    };
+    if (el.getAttribute('role')) info.role = el.getAttribute('role');
+    if (el.getAttribute('type')) info.type = el.getAttribute('type');
+    if (el.getAttribute('name')) info.name = el.getAttribute('name');
+    if (el.getAttribute('placeholder')) info.placeholder = el.getAttribute('placeholder');
+    if (el.value !== undefined && el.value !== '') info.value = String(el.value).slice(0, 80);
+    if (tag === 'a' && el.getAttribute('href')) info.href = el.getAttribute('href').slice(0, 120);
+    if (el.getAttribute('aria-label')) info.label = el.getAttribute('aria-label').slice(0, 80);
+    if (el.disabled) info.disabled = true;
+    // Store selector path for interaction
+    el.setAttribute('data-pcw-idx', String(idx));
+    result.push(info);
+    idx++;
+  }


state() writes data-pcw-idx attributes but never clears previous indices from elements that are no longer considered interactive/visible. A later click/type can still find and act on a stale element by old index, even after telling the user to re-run state. Consider clearing existing [data-pcw-idx] attributes at the start of state(), and/or re-checking visibility in click/type/fill before interacting.

Copilot · 2026-04-08T07:47:06Z

pkg/tools/browser_cdp.go

+// Send sends a CDP command and waits for the response.
+func (c *CDPClient) Send(method string, params map[string]any) (json.RawMessage, error) {
+	return c.SendWithTimeout(method, params, cdpSendTimeout)
+}
+
+// SendWithTimeout sends a CDP command with a custom timeout.
+func (c *CDPClient) SendWithTimeout(method string, params map[string]any, timeout time.Duration) (json.RawMessage, error) {
+	if c.closed.Load() {
+		return nil, fmt.Errorf("CDP connection closed")
+	}
+
+	id := atomic.AddInt64(&c.idCount, 1)
+
+	// Build request message
+	msg := map[string]any{
+		"id":     id,
+		"method": method,
+	}
+	if params != nil {
+		msg["params"] = params
+	}
+
+	data, err := json.Marshal(msg)
+	if err != nil {
+		return nil, fmt.Errorf("failed to marshal CDP message: %w", err)
+	}
+
+	// Register pending request
+	p := &cdpPending{
+		ch:    make(chan cdpResponse, 1),
+		timer: time.NewTimer(timeout),
+	}
+
+	c.mu.Lock()
+	c.pending[id] = p
+	c.mu.Unlock()
+
+	// Clean up on return
+	defer func() {
+		p.timer.Stop()
+		c.mu.Lock()
+		delete(c.pending, id)
+		c.mu.Unlock()
+	}()
+
+	// Send
+	c.mu.Lock()
+	err = c.conn.WriteMessage(websocket.TextMessage, data)
+	c.mu.Unlock()
+	if err != nil {
+		return nil, fmt.Errorf("failed to send CDP message: %w", err)
+	}
+
+	// Wait for response or timeout
+	select {
+	case resp := <-p.ch:
+		return resp.Result, resp.Err
+	case <-p.timer.C:
+		return nil, fmt.Errorf("CDP command %q timed out after %v", method, timeout)
+	case <-c.done:
+		return nil, fmt.Errorf("CDP connection closed while waiting for %q", method)
+	}
+}


CDPClient.Send/SendWithTimeout ignores the passed-in context from tool actions; it uses a fixed timer and doesn't set per-call read/write deadlines. As a result, BrowserTool's ctx timeout/cancellation may not be respected while waiting on CDP commands. Consider threading ctx through Send (or adding SendCtx) and using ctx.Deadline/Done to bound both the WebSocket write and the response wait.

Copilot · 2026-04-08T07:47:07Z

pkg/config/config.go

+type BrowserToolConfig struct {
+	ToolConfig  `envPrefix:"PICOCLAW_TOOLS_BROWSER_"`
+	CDPEndpoint string `json:"cdp_endpoint"    yaml:"cdp_endpoint,omitempty"  env:"PICOCLAW_TOOLS_BROWSER_CDP_ENDPOINT"`
+	Timeout     int    `json:"timeout_seconds"  yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"`


BrowserToolConfig.Timeout is tagged as json/yaml "timeout_seconds" but its env var is PICOCLAW_TOOLS_BROWSER_TIMEOUT. This is inconsistent with other *_TIMEOUT_SECONDS env vars in this config and makes configuration harder to discover. Consider renaming the env tag to PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS (or aligning the field name/tag).

Suggested change

Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"`

Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS"`

Copilot · 2026-04-08T07:47:07Z

pkg/tools/browser_integration_test.go

+func TestIntegration_Screenshot(t *testing.T) {
+	tool := setupBrowserTool(t)
+	ctx := context.Background()
+
+	tool.Execute(ctx, map[string]any{"action": "navigate", "url": "https://example.com"})
+	result := tool.Execute(ctx, map[string]any{"action": "screenshot"})
+	if result.IsError {
+		t.Fatalf("screenshot failed: %s", result.ForLLM)
+	}
+	t.Logf("Screenshot result: ForLLM=%s, ForUser=%s, Media=%v",
+		result.ForLLM, result.ForUser, result.Media)
+
+	// Verify temp file was created (even without MediaStore it should have existed briefly)
+	if !strings.Contains(result.ForLLM, "Screenshot captured") {
+		t.Error("unexpected screenshot result")
+	}
+
+	tool.Execute(ctx, map[string]any{"action": "close"})
+}


This integration test asserts result.ForLLM contains "Screenshot captured", but the screenshot implementation returns "Screenshot saved to ..." when no MediaStore is configured (which is the case in setupBrowserTool). This will fail under -tags integration unless a MediaStore is injected or the assertion is updated to match the actual behavior.

Copilot · 2026-04-08T07:47:07Z

pkg/tools/browser.go

+	// Focus the element
+	focusJS := fmt.Sprintf(`(function() {
+  var el = document.querySelector('[data-pcw-idx="%d"]');
+  if (!el) return JSON.stringify({error: 'Element [%d] not found. Run state to refresh indices.'});
+  el.scrollIntoView({block: 'center', behavior: 'instant'});
+  el.focus();
+  return JSON.stringify({ok: true});
+})();`, index, index)
+


scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. This can throw a TypeError and break typing/focus. Use a valid behavior value (e.g. 'auto') or omit the option.

Copilot · 2026-04-08T07:47:07Z

pkg/tools/browser.go

+	// Focus, select all existing text, then type new text
+	clearJS := fmt.Sprintf(`(function() {
+  var el = document.querySelector('[data-pcw-idx="%d"]');
+  if (!el) return JSON.stringify({error: 'Element [%d] not found.'});
+  el.scrollIntoView({block: 'center', behavior: 'instant'});
+  el.focus();
+  el.value = '';
+  el.dispatchEvent(new Event('input', {bubbles: true}));
+  return JSON.stringify({ok: true});
+})();`, index, index)


scrollIntoView is called with behavior: 'instant', but ScrollBehavior only supports 'auto' and 'smooth'. This can throw a TypeError and break fill/clear. Use a valid behavior value (e.g. 'auto') or omit the option.

Implement a BrowserTool that provides browser automation capabilities through direct CDP WebSocket connection to Chrome/Chromium. This addresses Issue sipeed#293 (Browser Automation) on the project roadmap. Key features: - Go-native CDP client using gorilla/websocket (zero new dependencies) - 12 actions: navigate, state, click, type, fill, select, screenshot, get_text, scroll, keys, evaluate, close - [N]-indexed DOM state extraction (borrowed from opencli) for LLM-friendly structured output at zero vision token cost - 13 stealth anti-detection patches (navigator.webdriver, chrome stub, plugin spoofing, CDP stack trace cleanup, etc.) - Cross-platform Chrome auto-detection (macOS/Linux/Windows + CHROME_PATH) - Screenshot delivery via MediaStore + channel system to users - Browsing history tracking with compact summary in state output - SSRF protection using net.ParseIP for full CIDR coverage (private, loopback, link-local, metadata endpoints) - Concurrent-safe lazy CDP connection with sync.Mutex - Optional via //go:build cdp tag — zero impact on default builds Build: - Default: go build — no browser code included (stub returns error) - Full: go build -tags cdp — complete browser support enabled New files: - pkg/tools/browser.go — BrowserTool core (Tool interface) - pkg/tools/browser_cdp.go — CDP WebSocket client - pkg/tools/browser_cdp_util.go — Chrome path detection - pkg/tools/browser_stealth.go — Anti-detection JS generator - pkg/tools/browser_stub.go — Stub for non-CDP builds - pkg/tools/browser_test.go — Unit tests (8 tests) - pkg/tools/browser_integration_test.go — Integration tests (9 tests) Modified files: - pkg/config/config.go — BrowserToolConfig + IsToolEnabled("browser") - pkg/agent/instance.go — Browser tool registration Ref: sipeed#293

Copilot AI review requested due to automatic review settings April 7, 2026 12:51

Copilot started reviewing on behalf of Yourdaylight April 7, 2026 12:52 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Yourdaylight force-pushed the feat/browser-cdp-tool branch from 353b9f7 to e66eccb Compare April 7, 2026 13:12

sipeed-bot bot added type: enhancement New feature or request domain: tool domain: agent go Pull requests that update go code labels Apr 7, 2026

github-actions bot mentioned this pull request Apr 8, 2026

🦞 OpenClaw 生态日报 2026-04-08 gsscsd/big_model_radar#153

Open

Copilot AI review requested due to automatic review settings April 8, 2026 07:39

Yourdaylight force-pushed the feat/browser-cdp-tool branch from e66eccb to ac969ec Compare April 8, 2026 07:39

Copilot started reviewing on behalf of Yourdaylight April 8, 2026 07:40 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Yourdaylight force-pushed the feat/browser-cdp-tool branch from ac969ec to 9d2a5bc Compare April 8, 2026 08:34

github-actions bot mentioned this pull request Apr 9, 2026

🦞 OpenClaw 生态日报 2026-04-09 gsscsd/big_model_radar#158

Open

	el.scrollIntoView({block: 'center', behavior: 'instant'});
	el.scrollIntoView({block: 'center', behavior: 'auto'});

	Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT"`
	Timeout int `json:"timeout_seconds" yaml:"timeout_seconds,omitempty" env:"PICOCLAW_TOOLS_BROWSER_TIMEOUT_SECONDS"`

Conversation

Yourdaylight commented Apr 7, 2026

Summary

Features

Security

Browsing History

Files

Test plan

Uh oh!

CLAassistant commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

CLAassistant commented Apr 7, 2026 •

edited

Loading