[Browser Agent] Add Integration Tests for Browser Automation (Headless CI) #21120

@gsquared94

Summary

The browser agent lacks integration tests. All existing tests mock the MCP SDK, meaning the real flow — launching Chrome, connecting via chrome-devtools-mcp, executing tool calls, and cleaning up — is never verified. Integration tests should be added to the existing integration-tests/ directory using the TestRig harness, running in headless mode for CI compatibility.

Problem

Unit tests mock everything

The browser agent tests in browserManager.test.ts, browserAgentFactory.test.ts, and mcpToolWrapper.test.ts mock Client, StdioClientTransport, and the MCP SDK entirely:

vi.mock('@modelcontextprotocol/sdk/client/index.js', ...);

This means the following are never tested end-to-end:

  1. npx chrome-devtools-mcp@0.17.1 actually launching and responding to listTools
  2. take_snapshot returning a real accessibility tree from a live page
  3. navigate_page / new_page actually opening URLs in Chrome
  4. take_screenshot producing a real image
  5. Browser process cleanup not leaving orphaned Chrome instances
  6. The full invocation path: prompt → browser agent selection → MCP connection → tool calls → result
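
As one illustration of what an un-mocked assertion for item 4 might look like, the sketch below checks whether a byte buffer begins with the PNG file signature. It assumes the screenshot arrives as PNG bytes, which this issue does not state; `looksLikePng` is a hypothetical helper, not part of the browser agent or chrome-devtools-mcp.

```typescript
// Hypothetical helper for illustration only.
// The 8-byte signature that every PNG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

/** Returns true when `data` starts with the PNG file signature. */
function looksLikePng(data: Uint8Array): boolean {
  if (data.length < PNG_SIGNATURE.length) return false;
  return PNG_SIGNATURE.every((byte, i) => data[i] === byte);
}
```

A mocked tool call can return any placeholder and still pass; a check like this only passes when a real encoder produced the bytes. If the tool turns out to emit JPEG or WebP, the signature would need to change accordingly.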

Existing integration test infrastructure

The project already has a mature integration test system that these tests should plug into:

| Component | Location | Purpose |
| --- | --- | --- |
| Test harness | test-rig.ts | TestRig class: spawns bundled CLI, isolates HOME/config, captures stdout/stderr/tool logs |
| Global setup | globalSetup.ts | Creates isolated test directories, sets GEMINI_CLI_INTEGRATION_TEST=true, downloads ripgrep |
| Vitest config | vitest.config.ts | 5-min timeout, retry=2, GEMINI_TEST_TYPE=integration |
| CI workflow | chained_e2e.yml | E2E jobs on Linux/Mac/Windows via npm run test:integration:sandbox:none |
| MCP precedent | simple-mcp-server.test.ts | Demonstrates MCP server integration testing within this harness |

Proposed Solution

Test file: integration-tests/browser-agent.test.ts

Tests go in the existing integration-tests/ directory and are automatically picked up by vitest run --root ./integration-tests (run via npm run test:integration:sandbox:none).
Tests should use describe.skipIf(!chromeAvailable) to gracefully skip in environments without Chrome rather than failing.

import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { execSync } from 'node:child_process';
import { join } from 'node:path';
import { TestRig, assertModelHasOutput } from './test-helper.js';

// Detect a local Chrome install once so the suite can skip where none exists.
const chromeAvailable = (() => {
  try {
    if (process.platform === 'darwin') {
      execSync('test -d "/Applications/Google Chrome.app"', { stdio: 'ignore' });
    } else if (process.platform === 'linux') {
      execSync('which google-chrome || which chromium-browser || which chromium', { stdio: 'ignore' });
    } else {
      return false;
    }
    return true;
  } catch {
    return false;
  }
})();
describe.skipIf(!chromeAvailable)('browser-agent', () => {
  let rig: TestRig;
  beforeEach(() => {
    rig = new TestRig();
  });
  afterEach(async () => await rig.cleanup());
  it('should navigate to a page and report its content', async () => {
    await rig.setup('browser-navigate-and-snapshot', {
      fakeResponsesPath: join(__dirname, 'browser-agent.navigate-snapshot.responses'),
      settings: {
        agents: {
          browser: {
            headless: true,
            sessionMode: 'isolated',
          },
        },
      },
    });
    const result = await rig.run({
      args: 'Open https://example.com in the browser and tell me the page title.',
    });
    assertModelHasOutput(result);
    const toolLogs = rig.readToolLogs();
    const browserAgentCall = toolLogs.find(
      (t) => t.toolRequest.name === 'browser_agent'
    );
    expect(browserAgentCall, 'Expected browser_agent to be called').toBeDefined();
  });
  it('should clean up browser process after completion', async () => {
    await rig.setup('browser-cleanup', {
      fakeResponsesPath: join(__dirname, 'browser-agent.cleanup.responses'),
      settings: {
        agents: {
          browser: {
            headless: true,
            sessionMode: 'isolated',
          },
        },
      },
    });
    await rig.run({
      args: 'Open https://example.com in the browser.',
    });
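    // rig.run() resolving means the CLI exited instead of hanging on the browser.
    // The short delay gives Chrome child processes time to exit; it does not
    // by itself prove that no orphaned process remains.
    await new Promise(resolve => setTimeout(resolve, 2000));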
  });
});
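
The cleanup test above passes as long as the CLI exits, so it cannot detect an orphaned Chrome. One possible way to tighten it is to snapshot the process table before and after the run and compare. The sketch below is a set of hypothetical helpers (none exist in TestRig) and rests on two assumptions: a POSIX `ps` is available, and the launched browser carries a `--headless` flag on its command line.

```typescript
import { execSync } from 'node:child_process';

// Hypothetical helpers for illustration; not part of TestRig.

/** Parses `ps -A -o pid= -o args=` output into [pid, commandLine] pairs. */
function parsePsOutput(output: string): Array<[number, string]> {
  const rows: Array<[number, string]> = [];
  for (const line of output.split('\n')) {
    const match = /^\s*(\d+)\s+(.*)$/.exec(line);
    if (match) rows.push([Number(match[1]), match[2]]);
  }
  return rows;
}

/** Returns PIDs whose command line looks like a headless Chrome/Chromium. */
function headlessChromePids(psOutput: string): Set<number> {
  const pids = new Set<number>();
  for (const [pid, args] of parsePsOutput(psOutput)) {
    // Assumption: the browser is started with --headless or --headless=new.
    if (/chrom(e|ium)/i.test(args) && args.includes('--headless')) {
      pids.add(pid);
    }
  }
  return pids;
}

/** PIDs present in `after` that were not present in `before`, ascending. */
function newPids(before: Set<number>, after: Set<number>): number[] {
  return Array.from(after)
    .filter((pid) => !before.has(pid))
    .sort((a, b) => a - b);
}

/** Snapshots the live process table (POSIX only). */
function snapshotHeadlessChrome(): Set<number> {
  const output = execSync('ps -A -o pid= -o args=', { encoding: 'utf8' });
  return headlessChromePids(output);
}
```

In the test, `snapshotHeadlessChrome()` would be called once before `rig.run()` and once after the delay, followed by `expect(newPids(before, after)).toEqual([])`. Other suites launching Chrome in parallel on the same machine would produce false positives, so this is only reliable when browser tests run serially.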

Fake response files

Create browser-agent.*.responses files in integration-tests/ for deterministic model output, following the pattern of existing response files like hooks-system.*.responses and concurrency-limit.responses.

CI: Ensure Chrome availability in chained_e2e.yml

Chrome is typically pre-installed on GitHub-hosted runners, but an explicit check should be added to the e2e_linux job in chained_e2e.yml before the "Run E2E tests" step:

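- name: 'Ensure Chrome is available'
  run: |
    if ! command -v google-chrome &> /dev/null; then
      # google-chrome-stable is not in Ubuntu's default apt sources,
      # so install Google's .deb directly (assumes an amd64 runner).
      wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
      sudo apt-get update
      sudo apt-get install -y ./google-chrome-stable_current_amd64.deb
    fi
    google-chrome --version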

The macOS runner (macos-latest) has Chrome pre-installed. Windows can be skipped initially.

Design Notes

  • sessionMode: 'isolated': Each test gets a clean temporary browser profile, preventing cross-contamination between tests.
  • headless: true: Required for CI runners that lack display servers. Also faster and more deterministic.
  • sandbox:none only: Browser automation requires Chrome to run as a separate process reachable over CDP, which does not work inside the current Docker/Podman sandboxes. The sandbox:docker and sandbox:podman test modes would first need chrome-devtools-mcp pre-installed in the sandbox image.
  • Fake responses: The TestRig spawns the full bundled CLI and uses --fake-responses to control model output. This is the right approach because it tests the entire stack (prompt → agent selection → MCP connection → automation → rendering) rather than directly instantiating BrowserManager, which is already covered by unit tests.
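
To make the isolation note concrete, here is a minimal sketch of how a throwaway browser profile could be created and removed per test. It is not the browser agent's actual implementation; `createTempProfile` and the directory prefix are hypothetical names chosen for illustration.

```typescript
import { existsSync, mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper; the browser agent's real profile handling may differ.
interface TempProfile {
  dir: string;
  dispose(): void;
}

/** Creates a fresh, uniquely named profile directory under the OS temp dir. */
function createTempProfile(prefix = 'browser-agent-profile-'): TempProfile {
  // mkdtempSync appends random characters, so concurrent calls never collide.
  const dir = mkdtempSync(join(tmpdir(), prefix));
  return {
    dir,
    // force: true makes dispose() safe to call more than once.
    dispose: () => rmSync(dir, { recursive: true, force: true }),
  };
}

// Usage: two profiles never share a directory, and both are removed afterwards.
const first = createTempProfile();
const second = createTempProfile();
console.log(first.dir !== second.dir, existsSync(first.dir));
first.dispose();
second.dispose();
```

Chrome accepts such a directory through its `--user-data-dir` flag. Because each test starts from an empty directory, cookies, cache, and local storage written by one test cannot leak into the next.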

Affected Files

| File | Change |
| --- | --- |
| New: integration-tests/browser-agent.test.ts | Integration tests using TestRig |
| New: integration-tests/browser-agent.*.responses | Fake model response files |
| chained_e2e.yml | Chrome availability check in e2e_linux job |

Labels

browser-agent, testing, ci, enhancement
