
fix(agent): dispatch per-candidate provider in fallback chain#1637

Closed
securityguy wants to merge 1 commit into sipeed:main from securityguy:fix/provider-dispatch

Conversation

@securityguy
Contributor

Summary

  • All agents previously shared a single LLMProvider instance created from agents.defaults.model_name
  • Per-agent model config only changed the model string passed to Chat() — it never changed which provider was invoked
  • Cross-provider fallback chains (e.g. gemini-cli falling back to claude-cli) failed silently, and different CLI providers could not be assigned to different agents

Solution

Introduces a ProviderDispatcher that lazily creates and caches provider instances keyed by "protocol/modelID":

  • Thread-safe (sync.RWMutex) with a double-checked locking pattern
  • Providers are stateless so caching is safe across agents with no data leakage
  • Flush(cfg) method clears the cache on config reload
  • The fallback chain's run closure resolves the correct provider via the dispatcher, falling back to agent.Provider for backward compatibility
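The dispatcher described above can be sketched roughly as follows. This is a minimal illustration of the caching and double-checked locking, not the PR's actual code: the Chatter interface, cliProvider type, and method names here are assumptions standing in for the real LLMProvider types in pkg/providers.

```go
package main

import (
	"fmt"
	"sync"
)

// Chatter is a minimal stand-in for the LLMProvider interface in the PR.
type Chatter interface {
	Chat(model, prompt string) (string, error)
}

// cliProvider is a hypothetical provider; the real ones wrap CLI binaries.
type cliProvider struct{ protocol string }

func (c *cliProvider) Chat(model, prompt string) (string, error) {
	return fmt.Sprintf("[%s] %s: ok", c.protocol, model), nil
}

// ProviderDispatcher lazily creates and caches providers keyed by
// "protocol/modelID". Providers are stateless, so sharing a cached
// instance across agents is safe.
type ProviderDispatcher struct {
	mu    sync.RWMutex
	cache map[string]Chatter
}

func NewProviderDispatcher() *ProviderDispatcher {
	return &ProviderDispatcher{cache: map[string]Chatter{}}
}

// Resolve uses double-checked locking: a cheap RLock on the hot path,
// then a full Lock with a re-check before creating a new provider.
func (d *ProviderDispatcher) Resolve(protocol, modelID string) Chatter {
	key := protocol + "/" + modelID

	d.mu.RLock()
	p, ok := d.cache[key]
	d.mu.RUnlock()
	if ok {
		return p
	}

	d.mu.Lock()
	defer d.mu.Unlock()
	if p, ok := d.cache[key]; ok { // another goroutine may have created it
		return p
	}
	p = &cliProvider{protocol: protocol}
	d.cache[key] = p
	return p
}

// Flush drops all cached providers, e.g. on config reload.
func (d *ProviderDispatcher) Flush() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.cache = map[string]Chatter{}
}

func main() {
	d := NewProviderDispatcher()
	a := d.Resolve("gemini-cli", "gemini-2.5-pro")
	b := d.Resolve("gemini-cli", "gemini-2.5-pro")
	fmt.Println("cached:", a == b) // same key resolves to the same instance
	d.Flush()
	c := d.Resolve("gemini-cli", "gemini-2.5-pro")
	fmt.Println("same after flush:", a == c) // Flush discards the old instance
}
```

In the fallback-chain wiring, each candidate's protocol and model ID would be passed to Resolve, so a gemini-cli candidate and a claude-cli candidate get distinct provider instances instead of sharing the default one.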

References #1634 (keeping issue open for tracking)

Test plan

  • go test ./pkg/providers/... -run TestProviderDispatcher -race — 5 tests, all pass including race detector
  • go test ./pkg/agent/... — existing agent loop tests pass with updated NewAgentLoop signature
  • go build ./... — compiles cleanly

🤖 Generated with Claude Code

Previously all agents shared a single LLMProvider instance created from
agents.defaults.model_name. Per-agent model config (agents.list[].model)
only changed the model string passed to Chat() — it never changed which
provider binary was invoked. This caused cross-provider fallback chains
(e.g. gemini-cli falling back to claude-cli) to fail, and made it
impossible to assign different CLI providers to different agents.

Introduces ProviderDispatcher which lazily creates and caches provider
instances keyed by "protocol/modelID". The fallback chain's run closure
now resolves the correct provider via the dispatcher before falling back
to agent.Provider for backward compatibility.

References sipeed#1634

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
securityguy added a commit to securityguy/picoclaw that referenced this pull request Mar 16, 2026
Brings in ProviderDispatcher fix (PR sipeed#1637 against sipeed/picoclaw).
References sipeed#1634.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
securityguy added a commit to securityguy/picoclaw that referenced this pull request Mar 16, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

@yinwm yinwm left a comment


Thanks for the thorough implementation — the tests and thread-safety considerations are appreciated. However, after comparing this with #2143, I believe #2143 is the better fit for this project. Here's my reasoning:

Why #2143 over this PR:

  1. YAGNI: #2143 solves the same bug in 43 lines vs. 286. The ProviderDispatcher, with its double-checked locking, lazy creation, and Flush(), is over-engineered for a map that is written once at init and read-only thereafter. No locks are needed.

  2. Correct lifecycle: CandidateProviders lives on AgentInstance (per-agent config), while ProviderDispatcher lives on AgentLoop (shared across agents). Provider configuration follows the agent, not the loop; multiple agents with different fallback chains need independent provider maps.

  3. Lower API surface impact: #2143 doesn't change NewAgentLoop's signature or touch 6 call sites. This PR changes the public API and requires updating every caller and test.

  4. No global state: ProviderDispatcher is a global cache that needs an explicit Flush() on config reload. #2143's map is created during NewAgentInstance and naturally discarded when the agent is recreated on reload.
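The per-agent approach the review prefers can be sketched roughly as follows. Only the names CandidateProviders, AgentInstance, and NewAgentInstance come from the comment above; everything else is an assumption, not #2143's actual code:

```go
package main

import "fmt"

// Chatter is a minimal stand-in for the provider interface.
type Chatter interface{ Chat(model string) string }

// stubProvider is a hypothetical provider for illustration.
type stubProvider struct{ protocol string }

func (s stubProvider) Chat(model string) string { return s.protocol + "/" + model }

// AgentInstance holds per-agent config. CandidateProviders is built once
// in NewAgentInstance and is read-only thereafter, so no locking is
// needed, and the map is discarded when the agent is recreated on reload.
type AgentInstance struct {
	CandidateProviders map[string]Chatter // keyed by protocol
}

func NewAgentInstance(protocols []string) *AgentInstance {
	m := make(map[string]Chatter, len(protocols))
	for _, p := range protocols {
		m[p] = stubProvider{protocol: p}
	}
	return &AgentInstance{CandidateProviders: m}
}

func main() {
	agent := NewAgentInstance([]string{"gemini-cli", "claude-cli"})
	// The fallback chain looks up each candidate's provider directly.
	for _, proto := range []string{"gemini-cli", "claude-cli"} {
		fmt.Println(agent.CandidateProviders[proto].Chat("default"))
	}
}
```

The design trade-off is lifecycle ownership: tying the map to the agent means reload correctness comes for free, at the cost of eager creation of every candidate provider up front.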

What I'd like to salvage from this PR if possible:

  • The unit tests are well-structured — it would be great if @corevibe555 could add similar tests to #2143
  • The Flush() concept is worth documenting as a pattern for when lazy initialization is actually needed in the future

I'm going to close this PR in favor of #2143. Thank you for the contribution and the careful implementation — the quality of work here is clearly strong.

@yinwm yinwm closed this Mar 29, 2026
