feat(review): support multi-model code review with arbitration #2376
Conversation
Add multi-model code review capability to `/review`. Multiple LLMs independently review the same diff in parallel, then an arbitrator model merges, deduplicates, and produces a unified report.

Key changes:
- Settings: add `review.models` and `review.arbitratorModel` config
- ModelRegistry: add `findModelById()` for cross-authType model lookup
- MultiModelReviewService: parallel `generateContent()` + arbitration
- MultiModelReviewTool: orchestrates the two-phase pipeline
- SKILL.md: new Step 2 tries multi-model before falling back to 4-agent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
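For illustration, a settings fragment using the new keys might look like the following. This is a hypothetical sketch: the `review.models` and `review.arbitratorModel` keys come from the PR, but the model IDs, `authType` value, URL, and env var name are made up for the example.

```json
{
  "review": {
    "models": [
      "gemini-2.5-pro",
      {
        "id": "example-review-model",
        "authType": "openai",
        "baseUrl": "https://api.example.com/v1",
        "envKey": "EXAMPLE_API_KEY"
      }
    ],
    "arbitratorModel": "gemini-2.5-pro"
  }
}
```

String entries resolve against the registry by ID; inline objects carry their own auth configuration, which is why the PR validates `baseUrl` and `envKey` for them.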
📋 Review Summary

This PR implements a well-architected multi-model code review system that allows users to configure multiple LLMs to independently review the same code diff, with an arbitration phase to merge findings. The implementation demonstrates solid architectural decisions with clean separation of concerns between the service, tool, and configuration layers. Overall, this is a thoughtful enhancement that extends the existing `/review` flow.

🔍 General Feedback
🎯 Specific Feedback

🔴 Critical
🟡 High
🟢 Medium
🔵 Low
✅ Highlights
…nd tests

- Detect empty LLM responses as failures in both review and arbitration
- Fix abort signal race with `throwIfAborted()` before async work
- Add baseUrl validation for inline model configs (exempt qwen-oauth/vertex/gemini)
- Surface arbitrator resolution/execution failures in user-facing output
- Skip pointless arbitration when only 1 of N models succeeds
- Improve ambiguous model ID error with example syntax
- Extract duplicated formatModelReviews helper
- Remove redundant issue-multi-model-review.md design doc
- Add comprehensive tests for MultiModelReviewService, MultiModelReviewTool, and findModelById

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
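The abort-signal fix above can be sketched as follows. The `reviewWithModel` helper is hypothetical (the real service code is not shown in this thread); the point is calling `signal.throwIfAborted()` before any async work starts, so an already-cancelled review fails immediately instead of racing the in-flight request.

```typescript
// Hypothetical sketch: check the abort signal up front so a cancelled
// review never dispatches a request it would have to discard anyway.
async function reviewWithModel(
  signal: AbortSignal,
  run: () => Promise<string>,
): Promise<string> {
  signal.throwIfAborted(); // throws the abort reason if already aborted
  return run();
}

const controller = new AbortController();
controller.abort();

reviewWithModel(controller.signal, async () => 'unreachable').catch((err) => {
  console.log('aborted early:', (err as Error).name);
});
```

In Node.js, the default abort reason is an `AbortError`, so callers can distinguish cancellation from real model failures.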
…nd efficiency

- Replace ModelsConfig.getModelRegistry() with a findModelById() delegate to preserve encapsulation of the internal ModelRegistry
- Conditionally register MultiModelReviewTool only when 2+ review models are configured, avoiding an unnecessary tool in the LLM context
- Filter thought parts in extractResponseText to prevent extended thinking content from leaking into review text
- Exclude the diff from the session arbitration prompt to avoid duplicating it in the session model's context window
- Add duplicate model ID warnings in getReviewModels()
- Add missing `import process from 'node:process'`
- Fix import ordering, schema default, test comments
- Generate updated settings.schema.json to fix CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `thought` property comes from an index signature on the Part type, so `tsc --build` (with noPropertyAccessFromIndexSignature) requires bracket notation `['thought']` instead of dot notation `.thought`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
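Both the thought-filtering change and the bracket-notation requirement can be illustrated together. This is a sketch with a simplified `Part` shape (the real type comes from the SDK), and `extractReviewText` is a hypothetical stand-in for the PR's `extractResponseText`: because `thought` is only reachable through the index signature, `noPropertyAccessFromIndexSignature` forces `p['thought']` rather than `p.thought`.

```typescript
// Simplified Part: 'thought' lives in the index signature, not as a
// declared property, so dot access would not compile under
// noPropertyAccessFromIndexSignature.
interface Part {
  text?: string;
  [key: string]: unknown;
}

function extractReviewText(parts: Part[]): string {
  return parts
    .filter((p) => p['thought'] !== true) // drop extended-thinking parts
    .map((p) => p.text ?? '')
    .join('');
}

const text = extractReviewText([
  { text: 'internal reasoning…', thought: true },
  { text: 'Finding: null check missing in parser.' },
]);
console.log(text);
```

Filtering before concatenation is what keeps a model's internal reasoning out of the review text that reaches the arbitrator and the user.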
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
…ulti-model review UX

- Fix VS Code settings schema to support both string and object items in review.models
- Replace duplicated extractResponseText with the existing getResponseText from partUtils
- Register the multi-model review tool when >= 1 model is configured (was >= 2) so the guidance text is reachable
- Change the arbitratorModel default from '' to undefined for cleaner serialization
- Add diff size logging before dispatching to review models
- Add a TODO for centralizing API key resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The VS Code settings schema is auto-generated and should not be manually edited. Reverts manual oneOf change and uses the generated output instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d tool guidance

- Add 15 tests for Config.getReviewModels() and getArbitratorModel(): covers string resolution, inline object validation (missing/invalid authType, missing baseUrl), deduplication, mixed entries, and the qwen-oauth baseUrl exemption
- Add 3 service tests: createContentGenerator failure, arbitrator API error propagation, and arbitration prompt content verification
- Add 2 tool tests: the zero-models guidance path and available-models listing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
if (
  this.reviewConfig?.models &&
  Array.isArray(this.reviewConfig.models) &&
  this.reviewConfig.models.length > 0
This gate still makes the documented zero-config /review --multi flow unreachable. When review.models is empty, the tool is not registered, so the skill jumps straight to the legacy single-model path and never shows the setup guidance described in the design. If Level 0 onboarding is still intended, the tool needs to be available even before any review models are configured.
For this one, the current gate is intentional:
- 0 models → tool not registered → SKILL.md skips straight to Step 3 (single-model review)
- 1 model → tool registered → returns setup guidance → falls back to Step 3
- 2+ models → tool registered → multi-model review
The SKILL.md does not define a --multi flag (only --single), so there is no explicit opt-in path for zero-config users. Registering the tool unconditionally would add a redundant tool call on every /review for users who have never configured review.models — extra latency with no benefit.
If we want a zero-config onboarding flow in the future, I think the right approach would be to add --multi flag support in SKILL.md alongside unconditional registration, rather than just removing the gate. Happy to discuss if you think that's worth doing in this PR or a follow-up.
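The registration gate discussed in this thread could be sketched as below. The helper name and config shape are assumptions for illustration; per the later commit, the threshold is >= 1 configured model (one model registers the tool so it can return setup guidance, zero models skips it entirely).

```typescript
// Hypothetical config shape: review.models may hold string IDs or
// inline model config objects.
interface ReviewConfig {
  models?: Array<string | Record<string, unknown>>;
}

// Register the multi-model review tool only when at least one review
// model is configured; zero-config users never pay for the tool call.
function shouldRegisterMultiModelReviewTool(cfg?: ReviewConfig): boolean {
  const models = cfg?.models;
  return Array.isArray(models) && models.length > 0;
}

console.log(shouldRegisterMultiModelReviewTool({ models: ['gemini-2.5-pro', 'example-model'] })); // true
console.log(shouldRegisterMultiModelReviewTool({ models: [] })); // false
console.log(shouldRegisterMultiModelReviewTool(undefined)); // false
```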
…t and schema union type
- SKILL.md: enumerate all non-final multi_model_review return paths (setup
guidance, all-models-failed, config error) with explicit fallback to Step 3;
treat single-model result as complete review to avoid redundant 4-agent pass
- settingsSchema.ts: add items with oneOf union (string | object) so VS Code
schema correctly validates inline model config objects
- generate-settings-schema.ts: respect custom items from source schema instead
of hardcoding { type: 'string' }; add oneOf/required to JsonSchemaProperty
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
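A schema fragment matching the `oneOf` union described above might look like this. It is a sketch, not the generated `settings.schema.json`: the property names (`id`, `authType`, `baseUrl`, `envKey`) are taken from the PR discussion, but the exact required set and nesting in the real generated schema may differ.

```json
{
  "review.models": {
    "type": "array",
    "items": {
      "oneOf": [
        { "type": "string" },
        {
          "type": "object",
          "required": ["id", "authType"],
          "properties": {
            "id": { "type": "string" },
            "authType": { "type": "string" },
            "baseUrl": { "type": "string" },
            "envKey": { "type": "string" }
          }
        }
      ]
    }
  }
}
```

With `oneOf`, VS Code validates each array element as either a bare model ID string or an inline config object, instead of rejecting objects under a hardcoded `{ "type": "string" }` items schema.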
- Add failedModels field to CollectedReview so failure info flows from
service to tool layer with model IDs and error messages
- Show specific failure details (model + reason) in all partial/total
failure paths instead of only logging to debug
- Unify llmContent to always use [{ text }] array format across all
return paths for consistency
- Pass precomputed failureSummary to buildReportHeader to avoid
redundant formatFailureSummary calls
- Update design doc Open Questions to clarify MVP does not implement
context window detection (future work)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
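The `failedModels` flow above can be sketched with hypothetical shapes (the real `CollectedReview` fields beyond `failedModels` are assumed here): the service records each model's ID and error, and the tool layer formats them into the user-facing report instead of only logging to debug.

```typescript
// Hypothetical shapes illustrating failure info flowing from the
// service layer to the tool layer.
interface FailedModel {
  modelId: string;
  error: string;
}

interface CollectedReview {
  reviews: Array<{ modelId: string; text: string }>;
  failedModels: FailedModel[];
}

// Build a user-facing summary of which models failed and why.
function formatFailureSummary(collected: CollectedReview): string {
  if (collected.failedModels.length === 0) return '';
  const lines = collected.failedModels.map((f) => `- ${f.modelId}: ${f.error}`);
  return `Failed models:\n${lines.join('\n')}`;
}

const summary = formatFailureSummary({
  reviews: [{ modelId: 'gemini-2.5-pro', text: 'LGTM' }],
  failedModels: [{ modelId: 'example-model', error: 'HTTP 429: rate limited' }],
});
console.log(summary);
```

Precomputing this summary once and passing it to the report builder is what avoids the redundant `formatFailureSummary` calls the commit mentions.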
…arly
- Unify all llmContent return paths in MultiModelReviewTool to use
[{ text: ... }] array format (config error and guidance paths were
returning bare strings while all other paths used arrays)
- Add early validation in buildGeneratorConfig() to throw a clear error
when a model's envKey is set but the environment variable is missing
- Update 4 tool tests to match the new array format
- Add service test for missing env var error path
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
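The early `envKey` validation described above might look like the following sketch. The `resolveApiKey` helper and `InlineModelConfig` shape are hypothetical (the PR puts this check in `buildGeneratorConfig()`); the point is failing fast with a clear message when the named environment variable is unset, rather than failing later mid-request.

```typescript
import process from 'node:process';

// Hypothetical inline model config carrying the name of the env var
// that holds its API key.
interface InlineModelConfig {
  id: string;
  envKey?: string;
}

// Throw a descriptive error up front if envKey is set but the
// environment variable it names is missing or empty.
function resolveApiKey(model: InlineModelConfig): string | undefined {
  if (!model.envKey) return undefined;
  const value = process.env[model.envKey];
  if (value === undefined || value === '') {
    throw new Error(
      `Model "${model.id}" requires environment variable "${model.envKey}", but it is not set`,
    );
  }
  return value;
}

process.env['DEMO_REVIEW_KEY'] = 'sk-demo';
console.log(resolveApiKey({ id: 'demo-model', envKey: 'DEMO_REVIEW_KEY' }));
```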
Summary
Closes #2375
Configuration
Usage
Architecture
Phase 1: Parallel Collection
Phase 2: Arbitration (always runs)
Key design decisions:
Changes
Test plan
TLDR
Dive Deeper
Reviewer Test Plan
Testing Matrix
Linked issues / bugs