feat(arena): Add agent collaboration arena with multi-model competitive execution #1912
Draft
tanzhenxin wants to merge 13 commits into main
Introduces a new Arena system for running multiple AI agents in parallel terminal sessions with support for iTerm and Tmux backends.
Core:
- Add ArenaManager and ArenaAgentClient for orchestrating multi-agent sessions
- Add terminal backends (ITermBackend, TmuxBackend) with feature detection
- Add git worktree service for isolated agent workspaces
- Add arena event system for real-time status updates
CLI:
- Add /arena command with start, stop, status, and select subcommands
- Add Arena dialogs (Select, Start, Status, Stop)
- Add ArenaCards component for displaying parallel agent outputs
- Consolidate message components into StatusMessages and ConversationMessages
- Add MultiSelect component for agent selection
Config:
- Add arena-related settings to schema and config
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…dates
- Replace SESSION_WARNING with SESSION_UPDATE supporting info/warning types
- Emit setup progress messages from ArenaManager during agent initialization
- Record all arena UI events to session JSONL for chat history replay
- Clean up unused agent event types (stream, tool calls, stats)
- Update arena select/stop dialogs to record their output
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Rename SubAgentScope → AgentHeadless and runNonInteractive → execute
- Move agents-collab/ into agents/ with new runtime/ subdirectory
- Split subagent.ts into agent-core.ts and agent-headless.ts
- Update all event types, emitters, and statistics classes
BREAKING CHANGE: SubAgentScope renamed to AgentHeadless; runNonInteractive() renamed to execute()
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Add InProcessBackend to run subagents in-process rather than via subprocess, enabling faster initialization and better resource management for agent collaboration arenas.
Key changes:
- Add InProcessBackend with sandboxed in-process agent execution
- Refactor agent runtime into headless vs interactive modes
- Add AsyncMessageQueue utility for agent message passing
- Update ArenaManager with backend selection (in-process vs subprocess)
- Refactor subagent types/exports; consolidate in subagents/types
- Remove deprecated agent-hooks.ts (functionality merged into runtime)
- Update task tool to support new agent lifecycle
Breaking: Subagent type exports restructured; import from subagents/types
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ss arena mode
Add AgentViewContext, AgentTabBar, and AgentChatView components for tab-based agent switching. Add useArenaInProcess hook bridging ArenaManager events to React state. Add agentHistoryAdapter converting AgentMessage[] to HistoryItem[].
Core support changes:
- Replace stream buffers with ROUND_TEXT events (complete round text)
- Add TOOL_OUTPUT_UPDATE events for live tool output streaming
- Add pendingApprovals/liveOutputs/shellPids state to AgentInteractive
- Fix missing ROUND_END emission for final text rounds
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
… reusable service
Rename ArenaWorktreeConfig → WorktreeSetupConfig, setupArenaWorktrees → setupWorktrees, cleanupArenaSession → cleanupSession, etc. Change default storage path from ~/.qwen/arena/ to ~/.qwen/worktrees/ and branch prefix from arena/ to worktrees/. Add branchPrefix and metadata options for flexibility. Remove auto-repo-init behavior; fail fast instead.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
This was referenced Mar 1, 2026
- Add arena_session_started, arena_agent_completed, arena_session_ended events
- Implement ArenaManager telemetry hooks with lifecycle tracking and metrics
- Update AgentStatistics to support API-provided totalTokenCount and remove estimatedCost
- Pass agent session IDs for telemetry correlation in PTY mode
This enables detailed observability into arena performance, agent completion rates, and model comparison outcomes.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Add retry logic with exponential backoff for file renames that fail with EPERM/EACCES on Windows during concurrent operations. Fix test to use path.join() for cross-platform compatibility. This improves reliability of arena agent collaboration on Windows. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
This was referenced Mar 3, 2026
This was referenced Mar 4, 2026
TLDR
Arena runs the same task across multiple AI models in parallel (isolated via git worktrees), letting users compare results side-by-side and merge the best solution back into their workspace.
Motivation
Users often configure multiple model providers but remain uncertain which is best suited for specific task types. Arena enables horizontal comparison by running the same task across different models in isolated environments, with side-by-side progress visibility and quantitative result comparison.
Design Overview
Multi-Agent Collaboration Infrastructure
Arena is the first consumer of a shared multi-agent platform. The platform is designed to be extensible to other modes (Team, Swarm) in the future, but only Arena is included in this PR. The platform provides a layered agent runtime:
- AgentCore: The stateless execution engine housing runReasoningLoop, responsible for streaming model responses, collecting function calls, executing tools via CoreToolScheduler, and accumulating statistics. It owns no lifecycle logic; callers control when to start and stop.
- AgentHeadless: Wraps AgentCore for one-off executions. Receives a task string, runs the reasoning loop, returns a result string, then destroys. Used by Subagents (and future Swarm mode). Lifecycle: born → execute() → die.
- AgentInteractive: Wraps AgentCore for persistent interaction. Maintains an input queue via AsyncMessageQueue, runs a continuous event loop, and builds UI-readable session state (message history, pending approvals, live outputs, shell PIDs). Used by Arena (and future Team mode).
This layering means Subagents and multi-agent setups are not separate systems; adding a new multi-agent pattern only requires selecting the appropriate wrapper and implementing higher-level orchestration.
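The layering above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the PR's implementation: the class names carry a `Sketch` suffix, the "reasoning loop" is reduced to a single callback, and the interactive wrapper drains a plain array instead of an AsyncMessageQueue. All of these simplifications are assumptions made for the example.

```typescript
// Hypothetical stand-in for one reasoning round (model call + tool execution).
type RoundFn = (input: string) => string;

// Core: runs rounds and accumulates statistics, but owns no lifecycle logic.
class AgentCoreSketch {
  roundsRun = 0;
  private readonly runRound: RoundFn;
  constructor(runRound: RoundFn) {
    this.runRound = runRound;
  }
  runReasoningLoop(input: string): string {
    this.roundsRun += 1;
    return this.runRound(input);
  }
}

// Headless wrapper: born -> execute() -> die.
class AgentHeadlessSketch {
  private destroyed = false;
  private readonly core: AgentCoreSketch;
  constructor(core: AgentCoreSketch) {
    this.core = core;
  }
  execute(task: string): string {
    if (this.destroyed) throw new Error("agent already destroyed");
    try {
      return this.core.runReasoningLoop(task);
    } finally {
      this.destroyed = true; // one-off: cannot be reused after execute()
    }
  }
}

// Interactive wrapper: queues inputs and keeps UI-readable history.
class AgentInteractiveSketch {
  readonly messages: { role: "user" | "agent"; text: string }[] = [];
  private readonly queue: string[] = [];
  private readonly core: AgentCoreSketch;
  constructor(core: AgentCoreSketch) {
    this.core = core;
  }
  enqueue(input: string): void {
    this.queue.push(input);
  }
  // Simplified stand-in for the continuous event loop: drain what is queued.
  processPending(): void {
    let input: string | undefined;
    while ((input = this.queue.shift()) !== undefined) {
      this.messages.push({ role: "user", text: input });
      this.messages.push({ role: "agent", text: this.core.runReasoningLoop(input) });
    }
  }
}

const echoRound: RoundFn = (input) => `done: ${input}`;

const headless = new AgentHeadlessSketch(new AgentCoreSketch(echoRound));
const headlessResult = headless.execute("fix the bug");

const interactive = new AgentInteractiveSketch(new AgentCoreSketch(echoRound));
interactive.enqueue("first");
interactive.enqueue("second");
interactive.processPending();
```

Both wrappers share the same core class, which is the point of the layering: a new orchestration mode would pick one wrapper rather than reimplement the loop.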
Display Backends
The platform provides two fundamentally different operational paradigms:
In-Process Mode runs all agents asynchronously within the current Node process, driven by AgentInteractive instances. The UI provides switchable tab views via AgentTabBar + AgentChatView. Switching merely swaps the data source being rendered; no process switching occurs. Rendering reuses the main agent's HistoryItemDisplay pipeline, adapting AgentMessage[] to HistoryItem[] via agentHistoryAdapter. This mode provides live stats, lower initialization overhead, and seamless interaction.
Split-Pane Mode launches each agent as a standalone CLI process in separate terminal panes. The primary implementation is TmuxBackend, which uses current pane splits if running inside tmux, or creates a standalone tmux server if not (avoiding disruption to the user's existing session). This provides maximum isolation and works well for long-running tasks.
An ITermBackend also exists but is currently disabled due to stability issues; it is not selectable via configuration.
Backend selection defaults to tmux-first with graceful degradation to in-process when tmux is unavailable.
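The selection rule can be summarized in a small decision function. This is only a sketch: `chooseBackendSketch`, its parameter names, and the message strings are hypothetical, while the `"tmux"` / `"in-process"` values mirror the `arena.display` setting described later in this PR.

```typescript
type DisplaySetting = "tmux" | "in-process" | undefined;

interface BackendChoice {
  backend: "tmux" | "in-process";
  warning?: string;
}

// Hypothetical helper: resolves the backend from the user setting and
// whether tmux was detected on this machine.
function chooseBackendSketch(setting: DisplaySetting, tmuxAvailable: boolean): BackendChoice {
  // An explicit in-process preference wins even when tmux is available.
  if (setting === "in-process") return { backend: "in-process" };
  // Default and forced-tmux both use tmux when it is there.
  if (tmuxAvailable) return { backend: "tmux" };
  // Forcing tmux without tmux installed is an error, not a fallback.
  if (setting === "tmux") {
    throw new Error('arena.display is "tmux" but tmux is not available');
  }
  // Default behavior: degrade to in-process and tell the user why.
  return {
    backend: "in-process",
    warning: "tmux not available; falling back to in-process mode",
  };
}

const defaultWithoutTmux = chooseBackendSketch(undefined, false);
const defaultWithTmux = chooseBackendSketch(undefined, true);
const preferInProcess = chooseBackendSketch("in-process", true);
```

Keeping the decision in a pure function like this makes the fallback path easy to unit test without a real terminal.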
Environment Isolation
Arena uses git worktree to create isolated working directories at ~/.qwen/arena/<session-id>/worktrees/<model-name>/. When created, each worktree identically mirrors the dirty state of the main repository (staged changes, unstaged changes, untracked files). This ensures every agent starts from the exact same baseline as the user's current working directory.
Upon completion, diffs are generated against the baseline via getAgentDiff(). Once the user selects a winner, applyAgentResult() merges the changes back into the main repository. During session cleanup, all worktrees and temporary branches are removed.
Per-Agent Config Isolation
createPerAgentConfig() uses prototypal delegation (Object.create(base)) to generate per-agent Config instances. It overrides working directory methods, file discovery services, and the tool registry (core tools are bound to the agent config), while optionally specifying per-agent model configurations. All unoverridden methods transparently resolve up to the parent config.
Discovered tools (MCP/command) are partially isolated: the same tool instances are shared between parent and per-agent registries, as they execute against the parent's context. This is intentional; discovered tools are project-level commands that run at the project root.
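A minimal sketch of the prototypal delegation idea, assuming a much smaller config surface than the real Config class. The `ConfigSketch` interface, its three methods, and `createPerAgentConfigSketch` are hypothetical names chosen for illustration.

```typescript
// Hypothetical, reduced config surface.
interface ConfigSketch {
  getWorkingDir(): string;
  getModel(): string;
  getDebugMode(): boolean;
}

const baseConfig: ConfigSketch = {
  getWorkingDir: () => "/repo",
  getModel: () => "base-model",
  getDebugMode: () => false,
};

function createPerAgentConfigSketch(
  base: ConfigSketch,
  worktreeDir: string,
  model?: string,
): ConfigSketch {
  // The new object has `base` as its prototype, so anything not
  // overridden below is looked up on the parent at call time.
  const agentConfig = Object.create(base) as ConfigSketch;
  // Own properties shadow the parent's methods for this agent only.
  agentConfig.getWorkingDir = () => worktreeDir;
  if (model !== undefined) {
    agentConfig.getModel = () => model;
  }
  return agentConfig;
}

const agentA = createPerAgentConfigSketch(baseConfig, "/worktrees/model-a", "model-a");
const agentB = createPerAgentConfigSketch(baseConfig, "/worktrees/model-b");

const debugBefore = agentA.getDebugMode();
// A later change on the parent is visible through the delegating child.
baseConfig.getDebugMode = () => true;
const debugAfter = agentA.getDebugMode();
```

Because lookups are resolved at call time, the per-agent object never holds a stale copy of the parent's settings. The trade-off is that anything not explicitly overridden is shared, which matches the partial isolation of discovered tools described above.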
Sideband Communication
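As a rough sketch of the file-based protocol this section describes, the snippet below shows an atomic state write (write to a temporary file, then rename) and a consume-once control file. The helper names, file names, and JSON layout are assumptions made for the example; the PR does not spell out the on-disk format here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Child side: write the full state to a temp file, then rename it into place.
// On POSIX filesystems a rename within one directory replaces the target
// atomically, so a polling reader never sees a half-written file.
function writeStateAtomically(file: string, state: object): void {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, file);
}

// Main side: one polling step. A missing or unreadable file means "no update".
function readState(file: string): Record<string, unknown> | undefined {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8")) as Record<string, unknown>;
  } catch {
    return undefined;
  }
}

// Child side: read the control file and delete it, so each signal is
// processed at most once.
function consumeControlSignal(file: string): string | undefined {
  try {
    const signal = fs.readFileSync(file, "utf8").trim();
    fs.unlinkSync(file);
    return signal;
  } catch {
    return undefined;
  }
}

// Hypothetical session directory and file names.
const sessionDir = fs.mkdtempSync(path.join(os.tmpdir(), "arena-sketch-"));
const stateFile = path.join(sessionDir, "agent-1.json");
const controlFile = path.join(sessionDir, "agent-1.control");

writeStateAtomically(stateFile, { status: "running", rounds: 2 });
const polled = readState(stateFile);

fs.writeFileSync(controlFile, "cancel"); // main process sends a signal
const firstCheck = consumeControlSignal(controlFile);
const secondCheck = consumeControlSignal(controlFile);
```

In a real poller, `readState` would run on a timer (the PR uses a 500ms interval) and the child would call `consumeControlSignal` before each reasoning round.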
In split-pane mode, each agent runs in a separate process, necessitating lightweight IPC. Arena uses the file system as its communication medium in a star topology: the main process (ArenaManager) communicates bidirectionally with child agents, but child agents do not communicate with one another.
State Reporting (Child → Main): When a child agent detects ARENA_AGENT_ID and ARENA_SESSION_DIR environment variables, it activates ArenaAgentClient. The client atomically writes to state files (status, rounds, token statistics, currentActivity, finalSummary) at key lifecycle points. The main process polls these files at 500ms intervals, updating internal state and emitting UI events.
Control Signals (Main → Child): The main process writes control files (shutdown or cancel) via sendControlSignal(). The child agent checks these files before every reasoning round and processes them in a consume-once pattern. If a child process fails to respond within a grace period, it is forcibly terminated by the Backend's stopAgent() method.
In-Process Agent Rendering
Arena agents in in-process mode use an adapter (agentMessagesToHistoryItems) that converts AgentMessage[] into HistoryItem[] and feeds them into HistoryItemDisplay, the same renderer used by the main agent. This eliminates a parallel rendering branch and ensures arena agents automatically inherit every future display type.
Agent views do not stream model text live; model output appears only after each round completes (via committed messages from ROUND_TEXT). This keeps the rendering simple and avoids per-chunk re-renders.
AgentChatView splits HistoryItem[] into two areas: a Static area (<Static>) for committed items (efficient Ink rendering, never re-rendered), and a Live area for the last tool_group with an Executing or Confirming tool, plus everything after it (stays outside <Static> so confirmation dialogs remain interactive).
User Features
Entry Point
/arena start "task description": Opens an interactive MultiSelect dialog listing all configured models. Requires at least 2 models. (qwen-oauth models are excluded because they are not applicable in agent arena.)
Display Modes and Interaction
Arena provides two distinct operational modes with fundamentally different interaction models:
Split-Pane Mode (tmux backend)
Used when tmux is available (running inside tmux or tmux installed). Each agent runs as a standalone CLI process in separate terminal panes.
Behavior:
tmux -L qwen-code-arena-<id> attach) to view all agents side-by-side
Configuration: Set "arena": { "display": "tmux" } in settings to force tmux mode (fails if tmux is unavailable).
In-Process Mode
Automatic fallback when tmux is unavailable. All agents run asynchronously within the current Node process.
Behavior:
Main | Agent1 ● | Agent2 ✓ | Agent3 ✗ ←/→
● running, ✓ completed, ✗ failed, ○ cancelled
Configuration: Set "arena": { "display": "in-process" } to prefer in-process even when tmux is available.
Auto-Detection: By default, Arena uses tmux when available and otherwise falls back to in-process mode with a warning message. This ensures Arena works out-of-the-box regardless of environment.
Monitoring and Control
/arena status: Opens live status dialog showing:
/arena stop: Gracefully cancels all running agents. Completed agents remain available for selection.
Result Selection
Upon completion (or via /arena select):
Reviewer Test Plan
- /arena start "write fibonacci in TypeScript": select 2+ models from dialog, verify parallel execution, progress messages, result summary card
- /arena select, verify diff stats, select winner, confirm changes applied to main workspace
- ~/.qwen/arena/<session>/ removed and no orphaned tmux sessions
Tested on macOS with both tmux and in-process modes.
Linked Issues
🤖 Generated with Qwen Code