Summary
The browser agent lacks integration tests. All existing tests mock the MCP SDK, meaning the real flow — launching Chrome, connecting via `chrome-devtools-mcp`, executing tool calls, and cleaning up — is never verified. Integration tests should be added to the existing `integration-tests/` directory using the `TestRig` harness, running in headless mode for CI compatibility.
Problem
Unit tests mock everything
The browser agent tests in `browserManager.test.ts`, `browserAgentFactory.test.ts`, and `mcpToolWrapper.test.ts` mock `Client`, `StdioClientTransport`, and the MCP SDK entirely:

```ts
vi.mock('@modelcontextprotocol/sdk/client/index.js', ...);
```

This means the following are never tested end-to-end:
- `npx chrome-devtools-mcp@0.17.1` actually launching and responding to `listTools`
- `take_snapshot` returning a real accessibility tree from a live page
- `navigate_page` / `new_page` actually opening URLs in Chrome
- `take_screenshot` producing a real image
- Browser process cleanup not leaving orphaned Chrome instances
- The full invocation path: prompt → browser agent selection → MCP connection → tool calls → result
Existing integration test infrastructure
The project already has a mature integration test system that these tests should plug into:
| Component | Location | Purpose |
|---|---|---|
| Test harness | test-rig.ts | TestRig class: spawns bundled CLI, isolates HOME/config, captures stdout/stderr/tool logs |
| Global setup | globalSetup.ts | Creates isolated test directories, sets GEMINI_CLI_INTEGRATION_TEST=true, downloads ripgrep |
| Vitest config | vitest.config.ts | 5-min timeout, retry=2, GEMINI_TEST_TYPE=integration |
| CI workflow | chained_e2e.yml | E2E jobs on Linux/Mac/Windows via npm run test:integration:sandbox:none |
| MCP precedent | simple-mcp-server.test.ts | Demonstrates MCP server integration testing within this harness |
Proposed Solution
Test file: `integration-tests/browser-agent.test.ts`
Tests go in the existing `integration-tests/` directory and are picked up automatically by `vitest run --root ./integration-tests` (run via `npm run test:integration:sandbox:none`).
Tests should use `describe.skipIf(!chromeAvailable)` to skip gracefully in environments without Chrome rather than fail.
```ts
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { TestRig, assertModelHasOutput } from './test-helper.js';
import { join } from 'node:path';
import { execSync } from 'node:child_process';

const chromeAvailable = (() => {
  try {
    if (process.platform === 'darwin') {
      execSync('test -d "/Applications/Google Chrome.app"', { stdio: 'ignore' });
    } else if (process.platform === 'linux') {
      execSync('which google-chrome || which chromium-browser || which chromium', {
        stdio: 'ignore',
      });
    } else {
      return false;
    }
    return true;
  } catch {
    return false;
  }
})();

describe.skipIf(!chromeAvailable)('browser-agent', () => {
  let rig: TestRig;

  beforeEach(() => {
    rig = new TestRig();
  });

  afterEach(async () => await rig.cleanup());

  it('should navigate to a page and report its content', async () => {
    await rig.setup('browser-navigate-and-snapshot', {
      fakeResponsesPath: join(__dirname, 'browser-agent.navigate-snapshot.responses'),
      settings: {
        agents: {
          browser: {
            headless: true,
            sessionMode: 'isolated',
          },
        },
      },
    });

    const result = await rig.run({
      args: 'Open https://example.com in the browser and tell me the page title.',
    });

    assertModelHasOutput(result);

    const toolLogs = rig.readToolLogs();
    const browserAgentCall = toolLogs.find(
      (t) => t.toolRequest.name === 'browser_agent',
    );
    expect(browserAgentCall, 'Expected browser_agent to be called').toBeDefined();
  });

  it('should clean up browser process after completion', async () => {
    await rig.setup('browser-cleanup', {
      fakeResponsesPath: join(__dirname, 'browser-agent.cleanup.responses'),
      settings: {
        agents: {
          browser: {
            headless: true,
            sessionMode: 'isolated',
          },
        },
      },
    });

    await rig.run({
      args: 'Open https://example.com in the browser.',
    });

    // Reaching here without the harness timing out means cleanup succeeded.
    await new Promise((resolve) => setTimeout(resolve, 2000));
  });
});
```

Fake response files
Create `browser-agent.*.responses` files in `integration-tests/` for deterministic model output, following the pattern of existing response files such as `hooks-system.*.responses` and `concurrency-limit.responses`.
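Since a missing fixture would otherwise surface as an opaque model failure mid-run, the tests could fail fast with a clear message. `requireResponsesFile` below is a hypothetical helper, not part of the existing harness; it sketches one way to do that:

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical guard: resolve a responses fixture relative to the test
// directory and throw a descriptive error if it has not been created yet.
function requireResponsesFile(dir: string, name: string): string {
  const path = join(dir, name);
  if (!existsSync(path)) {
    throw new Error(
      `Missing fake responses file: ${path}. ` +
        'Create it following the pattern of the existing *.responses files.',
    );
  }
  return path;
}
```

A test would then call it in place of the bare `join(__dirname, ...)` when building `fakeResponsesPath`.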
CI: Ensure Chrome availability in `chained_e2e.yml`
Chrome is typically pre-installed on GitHub-hosted runners, but an explicit check should be added to the `e2e_linux` job in `chained_e2e.yml` before the "Run E2E tests" step:
```yaml
- name: 'Ensure Chrome is available'
  run: |
    if ! command -v google-chrome &> /dev/null; then
      sudo apt-get update
      sudo apt-get install -y google-chrome-stable
    fi
    google-chrome --version
```

The macOS runner (`macos-latest`) has Chrome pre-installed. Windows can be skipped initially.
Design Notes
- `sessionMode: 'isolated'`: Each test gets a clean temporary browser profile, preventing cross-contamination between tests.
- `headless: true`: Required for CI runners that lack display servers. Also faster and more deterministic.
- `sandbox:none` only: Browser automation requires Chrome to run as a separate process with CDP, which doesn't work inside Docker/Podman sandboxes. The `sandbox:docker` and `sandbox:podman` test modes would need `chrome-devtools-mcp` pre-installed in the sandbox image to work.
- Fake responses: `TestRig` spawns the full bundled CLI and uses `--fake-responses` to control model output. This is the right approach because it tests the entire stack (prompt → agent selection → MCP connection → automation → rendering) rather than directly instantiating `BrowserManager`, which is already covered by unit tests.
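The cleanup test above only asserts that the run finishes; it could make the orphaned-process check explicit by recording the Chrome process count before `rig.run` and asserting it returns to that baseline afterwards. A sketch, where the helper name and the `pgrep` approach are assumptions rather than part of the harness:

```typescript
import { execFileSync } from 'node:child_process';

// Hypothetical helper: count live Chrome/Chromium processes so the cleanup
// test can assert none are orphaned after a run. Returns 0 when pgrep finds
// no match or is unavailable (e.g. on Windows).
function countChromeProcesses(): number {
  // On Linux, 'chrome' also matches 'chromium' as a substring match.
  const pattern = process.platform === 'darwin' ? 'Google Chrome' : 'chrome';
  try {
    // `pgrep -c -f` prints the number of processes whose full command line
    // matches the pattern; it exits non-zero when there are no matches.
    const out = execFileSync('pgrep', ['-c', '-f', pattern], {
      encoding: 'utf8',
    });
    return Number.parseInt(out.trim(), 10) || 0;
  } catch {
    return 0;
  }
}
```

This could also replace the fixed 2-second sleep: take the baseline before `rig.run`, then poll until the count drops back to the baseline or a deadline passes.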
Affected Files
| File | Change |
|---|---|
| New: `integration-tests/browser-agent.test.ts` | Integration tests using `TestRig` |
| New: `integration-tests/browser-agent.*.responses` | Fake model response files |
| `chained_e2e.yml` | Chrome availability check in `e2e_linux` job |
Labels
browser-agent, testing, ci, enhancement