
feat(debug): add Debug Companion — AI-powered debugging for Gemini CLI#22472

Closed
SUNDRAM07 wants to merge 12 commits into google-gemini:main from SUNDRAM07:feat/debug-companion-gsoc-draft

@SUNDRAM07

Summary

This PR adds the Debug Companion, an AI-powered debugging subsystem for Gemini CLI. It is a proof-of-concept for Idea #7 (Debug Companion) and provides 9 debug tools, a full DAP (Debug Adapter Protocol) client, and 33 supporting modules spanning 7 architectural layers.

This is a draft PR: it demonstrates the core architecture and capabilities as a proof-of-concept. The remaining work (interactive /debug command, Ink TUI, integration tests with real debuggers) is planned for the GSoC period.

What does this PR do?

It enables Gemini CLI to debug programs by communicating with debug adapters (Node.js, Python, Go, Ruby) through the Debug Adapter Protocol. The LLM can launch debug sessions, set breakpoints, step through code, inspect variables, evaluate expressions, and analyze errors — all through natural language.

9 Debug Tools

| Tool | Purpose |
| --- | --- |
| debug_launch | Start a debug session with auto-configured adapters |
| debug_attach | Attach to a running process |
| debug_set_breakpoint | Set line breakpoints with conditions and hit counts |
| debug_set_function_breakpoint | Set breakpoints on function names |
| debug_step | Step in/out/over/next with granular control |
| debug_evaluate | Evaluate expressions in debug context |
| debug_get_stacktrace | Retrieve enriched stack traces with source context |
| debug_get_variables | Inspect variables with scope filtering |
| debug_disconnect | Graceful session termination |

Architecture: 33 Modules, 7 Layers

graph TB
    subgraph Protocol["Protocol Layer"]
        DAP["DAPClient — Wire Protocol + TCP"]
        SRC["SourceMapResolver"]
        REG["DebugAdapterRegistry"]
        CFG["DebugConfigPresets"]
    end

    subgraph Tools["Tool Layer — 9 Tools"]
        T["debug_launch / attach / breakpoint<br/>step / evaluate / stacktrace<br/>variables / disconnect / function_bp"]
    end

    subgraph Breakpoints["Breakpoint Layer"]
        BS["BreakpointStore"]
        SS["SmartSuggester — 4 strategies"]
        DB["DataBreakpointManager"]
        EB["ExceptionBreakpointManager"]
        BV["BreakpointValidator — Pre-validation"]
    end

    subgraph Analysis["Analysis Layer"]
        STA["StackTraceAnalyzer"]
        FIX["FixSuggestionEngine — 11 patterns"]
        KB["ErrorKnowledgeBase"]
        RCA["RootCauseAnalyzer"]
        EC["DebugErrorClassifier — 17 patterns"]
    end

    subgraph State["State Layer"]
        SM["SessionStateMachine — 8-state FSM"]
        SH["SessionHistory — Loop detection"]
        SER["SessionSerializer"]
        CSR["ConditionalStepRunner"]
    end

    subgraph Context["Context Layer"]
        CTX["DebugContextBuilder — Token-aware"]
        DP["DebugPrompt"]
        WM["WatchExpressionManager"]
        VDT["VariableDiffTracker"]
        TEL["TelemetryCollector"]
        PERF["PerformanceProfiler"]
    end

    subgraph Infra["Infrastructure Layer"]
        APM["AdapterProcessManager — 4 languages"]
        SAN["InputSanitizer"]
        PG["PolicyGuard"]
        TG["TestGenerator"]
        WO["WorkflowOrchestrator"]
        IFP["InlineFixPreview"]
    end

    Tools --> Protocol
    Tools --> Breakpoints
    Tools --> Analysis
    Tools --> State
    Context --> Tools
    Infra --> Protocol

Key Design Decisions

  1. Protocol-first architecture: The DAPClient implements the full DAP wire protocol with TCP transport, Content-Length framing, and request/response correlation — the same protocol used by VS Code.

  2. Pre-validation over post-failure: BreakpointValidator checks if a line is executable before sending to the adapter, preventing the common "breakpoint not verified" frustration.

  3. LLM-optimized context: DebugContextBuilder creates priority-ranked, token-budget-aware context for the LLM, ensuring the most relevant debug information is always available regardless of context window limits.

  4. Production resilience: DebugErrorClassifier transforms raw error strings into structured, actionable intelligence with 17 patterns across 8 categories — each with severity, recovery strategies, and retry logic.

  5. Root cause analysis: RootCauseAnalyzer goes beyond "what crashed" to answer "why" — generating ranked hypotheses with confidence scores and concrete debugging next steps.
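
The Content-Length framing named in decision 1 can be illustrated with a small sketch. It assumes only the DAP base protocol (a `Content-Length: <byte count>` header, a blank line, then a UTF-8 JSON body). The names `encodeMessage`, `FrameDecoder`, and `findHeaderEnd` are hypothetical and are not taken from this PR's DAPClient.

```typescript
// Hypothetical sketch of DAP framing: "Content-Length: <bytes>\r\n\r\n<JSON body>".
// None of these names come from the PR; they only illustrate the technique.
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder();

/** Serialize a message and prefix it with its length in bytes (not characters). */
function encodeMessage(message: object): Uint8Array {
  const body = utf8Encoder.encode(JSON.stringify(message));
  const header = utf8Encoder.encode(`Content-Length: ${body.length}\r\n\r\n`);
  const frame = new Uint8Array(header.length + body.length);
  frame.set(header, 0);
  frame.set(body, header.length);
  return frame;
}

/** Return the index of the first "\r\n\r\n" sequence, or -1 if absent. */
function findHeaderEnd(bytes: Uint8Array): number {
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 && bytes[i + 2] === 13 && bytes[i + 3] === 10) {
      return i;
    }
  }
  return -1;
}

/** Accumulates transport chunks and returns every message completed so far. */
class FrameDecoder {
  private buffer: Uint8Array = new Uint8Array(0);

  push(chunk: Uint8Array): unknown[] {
    const merged = new Uint8Array(this.buffer.length + chunk.length);
    merged.set(this.buffer, 0);
    merged.set(chunk, this.buffer.length);
    this.buffer = merged;

    const messages: unknown[] = [];
    for (;;) {
      const headerEnd = findHeaderEnd(this.buffer);
      if (headerEnd === -1) break; // header not complete yet
      const header = utf8Decoder.decode(this.buffer.subarray(0, headerEnd));
      const match = /Content-Length: *(\d+)/i.exec(header);
      if (!match) throw new Error('Missing Content-Length header');
      const bodyStart = headerEnd + 4;
      const bodyEnd = bodyStart + Number(match[1]);
      if (this.buffer.length < bodyEnd) break; // body not fully received yet
      messages.push(JSON.parse(utf8Decoder.decode(this.buffer.subarray(bodyStart, bodyEnd))));
      this.buffer = this.buffer.slice(bodyEnd);
    }
    return messages;
  }
}
```

Working on bytes rather than strings matters here because the header counts bytes, so a body containing multi-byte characters would be cut at the wrong offset if string lengths were used.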

Testing

  • 419 tests across 30 test files — all passing ✅
  • TypeScript compilation: Clean, zero errors ✅
  • Test coverage: Every module has a corresponding test file
Test Files  30 passed (30)
     Tests  419 passed (419)
  Duration  1.59s

What's Next (GSoC Timeline)

| Phase | Work | Est. Weeks |
| --- | --- | --- |
| 1 | /debug slash command & interactive debug mode | 6-8 |
| 2 | Debug Mode Ink TUI (variable panel, stack view, source preview) | 4-6 |
| 3 | Integration tests with real debuggers (Node.js, Python, Go) | 3-4 |
| 4 | Adapter auto-discovery & lifecycle management | 3-4 |

Stats

| Metric | Value |
| --- | --- |
| Lines of Code | 16,467 |
| Modules | 33 |
| Tools | 9 |
| Tests | 419 |
| Source Files | 31 |
| Test Files | 30 |
| Languages Supported | 4 (Node.js, Python, Go, Ruby) |
| Error Patterns | 17 |
Related to #20674

…n and export

- PerformanceCollector: latency P50/P90/P99 percentiles, token efficiency,
  v8 heap utilization, startup phase analysis, optimization suggestions
- CostEstimator: per-model token cost tracking for Gemini 2.0/2.5/3,
  cache savings calculation, cheapest-model recommendations
- PerformanceExporter: JSON export (CI pipelines) and Markdown export
  (human-readable reports) with configurable sections
- 42 tests across 3 test files, all passing
- Exported from telemetry/index.ts

GSoC 2026 Idea google-gemini#5 proof-of-concept
Implements the foundation of the Debug Companion:
- DAPClient: Full DAP wire protocol with TCP transport, message framing,
  request/response correlation, and event handling (675 lines)
- SourceMapResolver: TypeScript source map resolution for accurate
  breakpoint placement (308 lines)
- DebugAdapterRegistry: Adapter configuration for Node.js, Python, Go,
  and Ruby debug adapters (183 lines)
- DebugConfigPresets: Pre-built launch configurations for common
  debugging scenarios (290 lines)

Part of google-gemini#20674
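
As a rough illustration of the request/response correlation listed for DAPClient above: in DAP every request carries a `seq` number and the matching response echoes it in `request_seq`, so a client can keep a map of pending promises keyed by `seq`. The sketch below assumes that scheme; `RequestCorrelator` and its methods are hypothetical names, and the PR's 675-line client presumably also handles timeouts and events, which are omitted here.

```typescript
// Hypothetical sketch of seq-based correlation; not the PR's DAPClient API.
interface DapRequest {
  seq: number;
  type: 'request';
  command: string;
  arguments?: unknown;
}

interface DapResponse {
  seq: number;
  type: 'response';
  request_seq: number; // seq of the request this response answers
  success: boolean;
  command: string;
  message?: string;
  body?: unknown;
}

interface PendingEntry {
  resolve: (body: unknown) => void;
  reject: (error: Error) => void;
}

class RequestCorrelator {
  private nextSeq = 1;
  private pending = new Map<number, PendingEntry>();
  private send: (request: DapRequest) => void;

  constructor(send: (request: DapRequest) => void) {
    this.send = send;
  }

  /** Send a request and return a promise settled by the matching response. */
  request(command: string, args?: unknown): Promise<unknown> {
    const seq = this.nextSeq++;
    return new Promise((resolve, reject) => {
      this.pending.set(seq, { resolve, reject });
      this.send({ seq, type: 'request', command, arguments: args });
    });
  }

  /** Route a response to the promise registered under its request_seq. */
  handleResponse(response: DapResponse): void {
    const entry = this.pending.get(response.request_seq);
    if (!entry) return; // unknown or already settled request
    this.pending.delete(response.request_seq);
    if (response.success) {
      entry.resolve(response.body);
    } else {
      entry.reject(new Error(response.message ?? `${response.command} failed`));
    }
  }
}
```

Because lookups go through `request_seq`, responses may arrive in any order without being delivered to the wrong caller.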
Adds 9 LLM-facing debug tools following the Gemini CLI tool architecture:
- debug_launch: Start debug sessions with auto-configured adapters
- debug_attach: Attach to running processes
- debug_set_breakpoint: Set line breakpoints with conditions
- debug_set_function_breakpoint: Set breakpoints on function names
- debug_step: Step in/out/over/next with granular control
- debug_evaluate: Evaluate expressions in debug context
- debug_get_stacktrace: Retrieve enriched stack traces
- debug_get_variables: Inspect variables with scope filtering
- debug_disconnect: Graceful session termination

Includes full tool definitions with JSON schemas for each tool.

Part of google-gemini#20674
Comprehensive breakpoint management with 5 specialized modules:
- BreakpointStore: Persistent storage with file/line indexing (178 lines)
- SmartBreakpointSuggester: 4 strategies for auto-suggesting breakpoints
  based on error patterns, hot paths, and entry points (253 lines)
- DataBreakpointManager: DAP watchpoints that break on data changes (225 lines)
- ExceptionBreakpointManager: Caught/uncaught exception breakpoints with
  condition support and exception history tracking (287 lines)
- BreakpointValidator: Pre-validates breakpoint locations before sending
  to the adapter — checks executability, suggests nearest valid line (411 lines)

Part of google-gemini#20674
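
The pre-validation idea described for BreakpointValidator (check executability, suggest the nearest valid line) might look roughly like the sketch below. The executability rule used here is a deliberately simple assumption: blank lines, comment-only lines, and lines holding only punctuation are treated as non-executable. The PR's 411-line module may apply different or richer rules, and `isLikelyExecutable` and `validateBreakpoint` are hypothetical names.

```typescript
// Hypothetical heuristic sketch; the PR's BreakpointValidator may differ.
interface ValidationResult {
  valid: boolean;
  requestedLine: number;
  suggestedLine?: number; // nearest line that looks executable, if any
}

/** Assumed rule: skip blank lines, comments, and punctuation-only lines. */
function isLikelyExecutable(line: string): boolean {
  const text = line.trim();
  if (text === '') return false;
  if (text.startsWith('//') || text.startsWith('/*')) return false;
  if (/^[{}()\[\];,]+$/.test(text)) return false;
  return true;
}

/** Lines are 1-based, matching the DAP default. */
function validateBreakpoint(source: string, line: number): ValidationResult {
  const lines = source.split(/\r?\n/);
  if (line < 1 || line > lines.length) {
    return { valid: false, requestedLine: line };
  }
  if (isLikelyExecutable(lines[line - 1] ?? '')) {
    return { valid: true, requestedLine: line };
  }
  // Search outward; at equal distance the following line is tried first.
  for (let distance = 1; distance < lines.length; distance++) {
    for (const candidate of [line + distance, line - distance]) {
      if (candidate < 1 || candidate > lines.length) continue;
      if (isLikelyExecutable(lines[candidate - 1] ?? '')) {
        return { valid: false, requestedLine: line, suggestedLine: candidate };
      }
    }
  }
  return { valid: false, requestedLine: line };
}
```

A caller could then offer `suggestedLine` to the user or the LLM before sending a `setBreakpoints` request to the adapter.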
Intelligent error analysis with 5 modules:
- StackTraceAnalyzer: Enriches stack frames with source context (365 lines)
- FixSuggestionEngine: 11 error pattern matchers with actionable fixes (529 lines)
- ErrorKnowledgeBase: Curated error patterns with examples (331 lines)
- RootCauseAnalyzer: Generates ranked root cause hypotheses from exceptions,
  detects infinite recursion, suggests debugging next steps (482 lines)
- DebugErrorClassifier: 17 error patterns across 8 categories with severity,
  recovery strategies, and retry logic (484 lines)

Part of google-gemini#20674
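
To make the classifier idea above concrete, here is a minimal sketch of a pattern table that maps raw error strings to a category, severity, retry flag, and recovery hint. The three patterns, the category names, and the function `classifyError` are illustrative assumptions; the PR's 17 patterns and 8 categories are not listed in this description.

```typescript
// Hypothetical sketch; patterns and categories are assumptions, not the PR's.
type Severity = 'low' | 'medium' | 'high';

interface Classification {
  category: string;
  severity: Severity;
  retryable: boolean;
  recovery: string;
}

interface ErrorPattern extends Classification {
  pattern: RegExp;
}

const SAMPLE_PATTERNS: ErrorPattern[] = [
  {
    pattern: /ECONNREFUSED/,
    category: 'connection',
    severity: 'high',
    retryable: true,
    recovery: 'Check that the debug adapter is listening, then retry.',
  },
  {
    pattern: /ETIMEDOUT|timed out|timeout/i,
    category: 'timeout',
    severity: 'medium',
    retryable: true,
    recovery: 'Retry the request with a longer timeout.',
  },
  {
    pattern: /ENOENT|command not found/i,
    category: 'adapter-missing',
    severity: 'high',
    retryable: false,
    recovery: 'Install the debug adapter or fix its path.',
  },
];

/** Return the first matching classification, or a generic fallback. */
function classifyError(raw: string): Classification {
  for (const { pattern, ...classification } of SAMPLE_PATTERNS) {
    if (pattern.test(raw)) return classification;
  }
  return {
    category: 'unknown',
    severity: 'medium',
    retryable: false,
    recovery: 'Inspect the raw adapter output.',
  };
}
```

Keeping the table as data means new patterns can be added without touching the matching logic, and the first match wins, so more specific patterns belong earlier in the list.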
Robust session lifecycle management with 4 modules:
- DebugSessionStateMachine: 8-state FSM (Idle, Connecting, Initializing,
  Stopped, Running, Stepping, Disconnecting, Error) with validated
  transitions and timing analytics (263 lines)
- DebugSessionHistory: Step-by-step history with debug loop detection
  to prevent infinite step cycles (235 lines)
- DebugSessionSerializer: Save/load debug sessions for resumption
  across Gemini CLI restarts (238 lines)
- ConditionalStepRunner: Execute step sequences with conditions
  (e.g., step until variable changes) (252 lines)

Part of google-gemini#20674
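
A state machine with validated transitions, as described for DebugSessionStateMachine above, can be sketched with a transition table. The eight state names come from the commit message; the allowed transitions below are an assumption made for illustration, and the class name and methods are hypothetical. Timing analytics are left out.

```typescript
// State names are from the commit message; the transition table is assumed.
type DebugState =
  | 'Idle'
  | 'Connecting'
  | 'Initializing'
  | 'Stopped'
  | 'Running'
  | 'Stepping'
  | 'Disconnecting'
  | 'Error';

const ALLOWED_TRANSITIONS: Record<DebugState, readonly DebugState[]> = {
  Idle: ['Connecting'],
  Connecting: ['Initializing', 'Error'],
  Initializing: ['Running', 'Stopped', 'Error'],
  Running: ['Stopped', 'Disconnecting', 'Error'],
  Stopped: ['Running', 'Stepping', 'Disconnecting', 'Error'],
  Stepping: ['Stopped', 'Running', 'Disconnecting', 'Error'],
  Disconnecting: ['Idle', 'Error'],
  Error: ['Idle', 'Disconnecting'],
};

class SessionStateMachine {
  private current: DebugState = 'Idle';

  get state(): DebugState {
    return this.current;
  }

  canTransition(next: DebugState): boolean {
    return ALLOWED_TRANSITIONS[this.current].includes(next);
  }

  /** Move to the next state, or throw if the table does not allow it. */
  transition(next: DebugState): void {
    if (!this.canTransition(next)) {
      throw new Error(`Invalid transition: ${this.current} -> ${next}`);
    }
    this.current = next;
  }
}
```

With a table like this, a tool such as debug_step could call `canTransition('Stepping')` first and return a clear error when no session is stopped, rather than sending a request the adapter would reject.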
LLM-optimized context generation and observability with 6 modules:
- DebugContextBuilder: Priority-ranked, token-budget-aware context builder
  that feeds optimal debug state to the LLM (375 lines)
- DebugPrompt: System prompt augmentation for debug-aware conversations (121 lines)
- WatchExpressionManager: Persistent watch expressions with evaluation
  history and markdown reporting (207 lines)
- VariableDiffTracker: Track variable changes between debug stops,
  detect nullifications and volatile variables (331 lines)
- DebugTelemetryCollector: Usage metrics and session analytics (246 lines)
- PerformanceProfiler: Operation timing and bottleneck detection (215 lines)

Part of google-gemini#20674
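
The priority-ranked, token-budget-aware approach described for DebugContextBuilder can be sketched as a greedy selection: sort sections by priority and keep each one that still fits in the remaining budget. Everything in this sketch is an assumption for illustration, including the names `ContextSection`, `estimateTokens`, and `buildContext`, and the rough estimate of four characters per token; the PR does not state how it counts tokens.

```typescript
// Hypothetical sketch of greedy, priority-ordered context packing.
interface ContextSection {
  title: string;
  content: string;
  priority: number; // higher means more important
}

/** Crude estimate (assumption): about four characters per token. */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/** Keep the highest-priority sections that fit within tokenBudget. */
function buildContext(sections: ContextSection[], tokenBudget: number): string {
  const ranked = [...sections].sort((a, b) => b.priority - a.priority);
  const parts: string[] = [];
  let used = 0;
  for (const section of ranked) {
    const block = `## ${section.title}\n${section.content}`;
    const cost = estimateTokens(block);
    if (used + cost > tokenBudget) continue; // too large; a smaller section may still fit
    parts.push(block);
    used += cost;
  }
  return parts.join('\n\n');
}
```

Skipping an oversized section instead of stopping means a large variable dump does not crowd out a short but lower-priority item that would still fit.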
Production-grade infrastructure completing the Debug Companion:
- AdapterProcessManager: Spawn, monitor, and manage debug adapter
  processes for Node.js, Python, Go, Ruby (390 lines)
- DebugInputSanitizer: Validates and sanitizes all debug inputs
  to prevent injection attacks (335 lines)
- DebugPolicyGuard: Risk classification for debug operations,
  enforcing safety policies (331 lines)
- DebugTestGenerator: Auto-generates test cases from debug
  sessions for regression testing (294 lines)
- DebugWorkflowOrchestrator: Coordinates multi-step debug
  workflows with rollback support (290 lines)
- InlineFixPreview: Shows fix previews before applying changes (226 lines)
- Barrel exports (index.ts) for all 33 debug modules

Part of google-gemini#20674
…UIRING_NARROWING

- Split monolithic debugTools.ts (1,045 lines) into 9 per-tool files
  under tools/debug/ following the repo's one-file-per-tool convention
- Created shared session-manager.ts for singleton DAP client management
- Added barrel index.ts for clean imports
- Original debugTools.ts now re-exports from debug/ for backward compat
- Added DebugLaunchTool, DebugEvaluateTool, DebugAttachTool to
  TOOLS_REQUIRING_NARROWING for human-in-the-loop security
- Registered DebugAttachTool and DebugSetFunctionBreakpointTool in
  config.ts (were previously missing from tool registration)
…sconnect

- New top-level /debugger command for interactive debug companion
- Subcommands: launch <file>, attach <port>, status, disconnect
- Uses submit_prompt to delegate to the LLM agent which invokes
  the debug tools (debug_launch, debug_attach, etc.)
- Registered in BuiltinCommandLoader alongside other built-in commands
- Named /debugger to avoid conflict with existing nightly /debug subcommand
@SUNDRAM07 SUNDRAM07 force-pushed the feat/debug-companion-gsoc-draft branch from a9f85ba to fbbd519 Compare March 23, 2026 17:48
…command

- Add session-manager.test.ts: singleton lifecycle, formatting helpers,
  error result structure, intelligence layer singletons (32 tests)
- Add debug-tools.test.ts: mock DAPClient tests for all 8 tool wrappers
  covering: no session errors, DAP timeouts, empty responses, missing
  response keys, non-Error throws, frame index out of range, session
  cleanup on disconnect failure, terminateDebuggee variants (34 tests)
- Add debuggerCommand.test.ts: help text, all 4 subcommands, edge cases
  for undefined args, whitespace-only input, paths with spaces,
  non-numeric ports, extra trailing args (23 tests)
- Total new tests: 89 | Total project tests: 508
- Lifecycle server simulates realistic DAP adapter with events
- Full 15-step test: connect → initialize → setBreakpoints →
  launch → configurationDone → stopped(entry) → stackTrace →
  scopes → variables → evaluate → step(next/stepIn/stepOut) →
  continue → stopped(breakpoint) → disconnect
- Tests concurrent operations (3 parallel setBreakpoints)
- Tests server crash recovery (adapter killed mid-session)
- Tests post-disconnect operation rejection
- All 4 E2E tests pass in under 350ms
@gemini-cli
Contributor

gemini-cli bot commented Mar 29, 2026

Hi there! Thank you for your interest in contributing to Gemini CLI.

To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383).

We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'.

This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!
