fix(agent): dispatch per-candidate provider in fallback chain#1637
fix(agent): dispatch per-candidate provider in fallback chain#1637securityguy wants to merge 1 commit intosipeed:mainfrom
Conversation
Previously all agents shared a single LLMProvider instance created from agents.defaults.model_name. Per-agent model config (agents.list[].model) only changed the model string passed to Chat() — it never changed which provider binary was invoked. This caused cross-provider fallback chains (e.g. gemini-cli falling back to claude-cli) to fail, and made it impossible to assign different CLI providers to different agents. Introduces ProviderDispatcher which lazily creates and caches provider instances keyed by "protocol/modelID". The fallback chain's run closure now resolves the correct provider via the dispatcher before falling back to agent.Provider for backward compatibility. References sipeed#1634 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brings in ProviderDispatcher fix (PR sipeed#1637 against sipeed/picoclaw). References sipeed#1634. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yinwm
left a comment
There was a problem hiding this comment.
Thanks for the thorough implementation — the tests and thread-safety considerations are appreciated. However, after comparing this with #2143, I believe #2143 is the better fit for this project. Here's my reasoning:
Why #2143 over this PR:
-
YAGNI — #2143 solves the same bug in 43 lines vs 286 lines. The
ProviderDispatcherwith double-check locking, lazy creation, andFlush()is over-engineered for a map that's written once at init and read-only thereafter. No locks are needed. -
Correct lifecycle —
CandidateProviderslives onAgentInstance(per-agent config), whileProviderDispatcherlives onAgentLoop(shared across agents). Provider configuration follows the agent, not the loop. Multiple agents with different fallback chains need independent provider maps. -
Lower API surface impact — #2143 doesn't change
NewAgentLoop's signature or touch 6 call sites. This PR changes the public API and requires updating every caller + test. -
No global state —
ProviderDispatcheris a global cache that needs explicitFlush()on config reload. #2143's map is created duringNewAgentInstanceand naturally discarded when the agent is recreated on reload.
What I'd like to salvage from this PR if possible:
- The unit tests are well-structured — it would be great if @corevibe555 could add similar tests to #2143
- The
Flush()concept is worth documenting as a pattern for when lazy initialization is actually needed in the future
I'm going to close this PR in favor of #2143. Thank you for the contribution and the careful implementation — the quality of work here is clearly strong.
Summary
LLMProviderinstance created fromagents.defaults.model_namemodelconfig only changed the model string passed toChat()— it never changed which provider was invokedgemini-clifalling back toclaude-cli) failed silently, and different CLI providers could not be assigned to different agentsSolution
Introduces
ProviderDispatcherwhich lazily creates and caches provider instances keyed by"protocol/modelID":Flush(cfg)method clears the cache on config reloadrunclosure resolves the correct provider via the dispatcher, falling back toagent.Providerfor backward compatibilityReferences #1634 (keeping issue open for tracking)
Test plan
go test ./pkg/providers/... -run TestProviderDispatcher -race— 5 tests, all pass including race detectorgo test ./pkg/agent/...— existing agent loop tests pass with updatedNewAgentLoopsignaturego build ./...— compiles cleanly🤖 Generated with Claude Code