
feat: Multi-provider LLM support (failover, auto-discovery, Ollama, Bedrock, Gemini) #80

@ilblackdragon


Feature Parity: Multi-Provider LLM Support

Priority: P1-P3 (varies by provider)
Source: FEATURE_PARITY.md — Model & Provider Support

Missing Providers

  • Ollama / local models (P2) — Local inference
  • AWS Bedrock (P3)
  • Google Gemini (P3)
  • OpenRouter (P3)
  • llama.cpp native (P3) — Via Rust bindings

Missing Model Features

  • Multi-provider failover (P1) — Provider fallback chains with retry/backoff
  • Auto-discovery — Detect available models per provider
  • Cooldown management — Skip failed/rate-limited providers temporarily


Design Considerations

Current Architecture

The LlmProvider trait (src/llm/provider.rs:253-321) defines 11 methods, including complete(), complete_with_tools(), list_models(), model_metadata(), set_model(), seed_response_chain(), and calculate_cost(). A factory function, create_llm_provider() in src/llm/mod.rs, instantiates the selected backend based on the LlmBackend enum.

Five backends already exist:

| Backend | Adapter | Auth |
|---|---|---|
| NearAi | Native (Responses API + Chat Completions) | Session token or API key |
| OpenAi | rig-core `RigAdapter<openai::Client>` | API key |
| Anthropic | rig-core `RigAdapter<anthropic::Client>` | API key |
| Ollama | rig-core `RigAdapter<ollama::Client>` | None |
| OpenAiCompatible | rig-core with custom base URL | Optional API key |

Key patterns to preserve:

  • metadata: HashMap<String, String> forwarding (used by NEAR AI for response chaining via thread_id)
  • cost_per_token() for estimation module budget tracking
  • response_id for provider-specific optimizations
  • FinishReason enum for flow control (Stop, Length, ToolUse, ContentFilter)

Failover Architecture

Recommended: FailoverProvider wrapper implementing LlmProvider

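```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, AtomicU64};
use std::sync::Arc;
use std::time::Duration;

// LlmProvider is the existing trait from src/llm/provider.rs.
pub enum FailoverStrategy { Priority, RoundRobin, CostOptimal }

pub struct FailoverProvider {
    providers: Vec<(String, Arc<dyn LlmProvider>)>, // (name, provider), in priority order
    circuit_breakers: HashMap<String, CircuitBreaker>, // keyed by provider name
    strategy: FailoverStrategy,
}

struct CircuitBreaker {
    failures: AtomicU32,     // Consecutive failures since the last success
    last_failure: AtomicU64, // Time of the last failure (Unix seconds, assumed)
    cooldown: Duration,      // Skip provider for this long after tripping
    threshold: u32,          // Consecutive failures before tripping
}
```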
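The breaker logic itself is not spelled out above. A minimal sketch follows, assuming last_failure stores Unix seconds and that callers pass the current time in (which keeps the logic testable); the struct is repeated so the block compiles on its own. Once the cooldown has elapsed, the provider is offered a trial request (a "half-open" state): a success resets the count, while another failure restarts the cooldown.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
use std::time::Duration;

/// Sketch of a per-provider circuit breaker. Field layout mirrors the struct
/// above; treating `last_failure` as Unix seconds is an assumption.
pub struct CircuitBreaker {
    failures: AtomicU32,
    last_failure: AtomicU64,
    cooldown: Duration,
    threshold: u32,
}

impl CircuitBreaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self {
            failures: AtomicU32::new(0),
            last_failure: AtomicU64::new(0),
            cooldown,
            threshold,
        }
    }

    /// Available while below the failure threshold, or once the cooldown has
    /// elapsed since the most recent failure (half-open re-check).
    pub fn is_available(&self, now_secs: u64) -> bool {
        if self.failures.load(Ordering::SeqCst) < self.threshold {
            return true;
        }
        let last = self.last_failure.load(Ordering::SeqCst);
        now_secs.saturating_sub(last) >= self.cooldown.as_secs()
    }

    /// Records a failure; reaching the threshold trips the breaker, and a
    /// failed half-open trial pushes the cooldown window forward.
    pub fn record_failure(&self, now_secs: u64) {
        self.last_failure.store(now_secs, Ordering::SeqCst);
        self.failures.fetch_add(1, Ordering::SeqCst);
    }

    /// Any success closes the breaker again.
    pub fn record_success(&self) {
        self.failures.store(0, Ordering::SeqCst);
    }
}
```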

Config:

```
LLM_PROVIDERS=openai,anthropic,nearai      # Priority order
LLM_FAILOVER_COOLDOWN_SECS=300             # Per-provider cooldown
LLM_FAILOVER_THRESHOLD=3                   # Failures before cooldown
```
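One way to read these variables, sketched under assumptions: FailoverConfig and from_lookup are hypothetical names, the lookup is injected so the function can be tested without touching the process environment, and the example values above (300 s, 3 failures) double as defaults when a variable is missing or malformed.

```rust
use std::time::Duration;

/// Hypothetical parsed form of the failover env vars.
#[derive(Debug, Clone, PartialEq)]
pub struct FailoverConfig {
    pub providers: Vec<String>, // priority order, lowercased
    pub cooldown: Duration,
    pub threshold: u32,
}

impl FailoverConfig {
    /// Builds the config from any key lookup, e.g. `|k| std::env::var(k).ok()`.
    /// Missing or unparsable values fall back to 300 s and 3 failures (assumed defaults).
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        let providers: Vec<String> = get("LLM_PROVIDERS")
            .unwrap_or_default()
            .split(',')
            .map(|s| s.trim().to_lowercase())
            .filter(|s| !s.is_empty())
            .collect();
        let cooldown_secs = get("LLM_FAILOVER_COOLDOWN_SECS")
            .and_then(|v| v.trim().parse::<u64>().ok())
            .unwrap_or(300);
        let threshold = get("LLM_FAILOVER_THRESHOLD")
            .and_then(|v| v.trim().parse::<u32>().ok())
            .unwrap_or(3);
        Self {
            providers,
            cooldown: Duration::from_secs(cooldown_secs),
            threshold,
        }
    }
}
```

In the agent itself this would be called as FailoverConfig::from_lookup(|k| std::env::var(k).ok()).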

Error classification matters:

  • 429 (rate limit) → cooldown, honoring the Retry-After header
  • 401 (auth) → permanent skip
  • 500+ (server error) → count toward the circuit breaker
  • Timeout → immediate failover
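The classification above can be expressed as a pure function. ProviderError and FailoverAction are hypothetical types (each adapter would map its own HTTP or transport errors into them), and treating any other status, such as 400, as a request bug to return rather than retry elsewhere is an assumption this issue does not state.

```rust
use std::time::Duration;

/// Hypothetical normalized error that adapters would map their failures into.
#[derive(Debug)]
pub enum ProviderError {
    Http { status: u16, retry_after: Option<Duration> },
    Timeout,
}

/// What the failover layer does with the provider that just failed.
#[derive(Debug, PartialEq)]
pub enum FailoverAction {
    /// Rate limited: cool down for Retry-After, else the configured default.
    Cooldown(Duration),
    /// Bad credentials: retrying will not help, so drop the provider.
    PermanentSkip,
    /// Server error: count toward the circuit breaker threshold.
    CountFailure,
    /// Timeout: move on to the next provider right away.
    FailoverNow,
    /// Any other status (assumed to be a request bug): return it to the caller.
    Propagate,
}

pub fn classify(err: &ProviderError, default_cooldown: Duration) -> FailoverAction {
    match err {
        ProviderError::Http { status: 429, retry_after } => {
            FailoverAction::Cooldown(retry_after.unwrap_or(default_cooldown))
        }
        ProviderError::Http { status: 401, .. } => FailoverAction::PermanentSkip,
        ProviderError::Http { status, .. } if *status >= 500 => FailoverAction::CountFailure,
        ProviderError::Timeout => FailoverAction::FailoverNow,
        ProviderError::Http { .. } => FailoverAction::Propagate,
    }
}
```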

Auto-Discovery

The list_models() trait method already exists with a default empty implementation, so each backend can override it:

  • Ollama: GET /api/tags → list of locally available models
  • OpenAI/Compatible: GET /v1/models → API-available models
  • NEAR AI: custom endpoint for model catalog
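Once each backend overrides list_models(), the combined listing for success criterion 3 could be assembled as below. This is a sketch: merge_model_lists is a hypothetical helper that takes already-fetched results (so no HTTP client is involved), and the provider/model naming scheme is an assumption. A provider whose discovery call fails is reported separately instead of hiding the others.

```rust
use std::collections::BTreeMap;

/// Merges per-provider discovery results into one sorted listing plus a map
/// of providers whose discovery failed (hypothetical helper).
pub fn merge_model_lists(
    results: Vec<(String, Result<Vec<String>, String>)>,
) -> (Vec<String>, BTreeMap<String, String>) {
    let mut models = Vec::new();
    let mut errors = BTreeMap::new();
    for (provider, result) in results {
        match result {
            // Namespace as "provider/model" so identical model names served
            // by different backends stay distinguishable.
            Ok(list) => models.extend(list.into_iter().map(|m| format!("{provider}/{m}"))),
            Err(e) => {
                errors.insert(provider, e);
            }
        }
    }
    models.sort();
    models.dedup();
    (models, errors)
}
```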

Runtime model switching via set_model() is supported by NEAR AI, which stores the active model in an RwLock<String>. Rig-based adapters would need wrapper logic to rebuild the client when the model changes.
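The wrapper logic for rig-based adapters could look like the sketch below. ModelSwitcher is hypothetical and generic over the client type C; the build closure stands in for constructing a rig client for a given model name. Handing out Arc<C> lets requests already in flight finish on the old client while new requests pick up the rebuilt one.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical wrapper adding runtime set_model() to a client that is bound
/// to one model at construction time.
pub struct ModelSwitcher<C> {
    build: Box<dyn Fn(&str) -> C + Send + Sync>,
    state: RwLock<(String, Arc<C>)>, // (current model, client built for it)
}

impl<C> ModelSwitcher<C> {
    pub fn new(model: &str, build: impl Fn(&str) -> C + Send + Sync + 'static) -> Self {
        let client = Arc::new(build(model));
        Self {
            build: Box::new(build),
            state: RwLock::new((model.to_string(), client)),
        }
    }

    /// Cheap handle to the current client; older handles stay valid.
    pub fn client(&self) -> Arc<C> {
        Arc::clone(&self.state.read().unwrap().1)
    }

    pub fn model(&self) -> String {
        self.state.read().unwrap().0.clone()
    }

    /// Rebuilds the client only when the model actually changes.
    pub fn set_model(&self, model: &str) {
        let mut state = self.state.write().unwrap();
        if state.0 != model {
            *state = (model.to_string(), Arc::new((self.build)(model)));
        }
    }
}
```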

New Backend Patterns

Each new backend follows the same pattern:

  1. Add an LlmBackend::Gemini variant to the enum in config.rs
  2. Create src/llm/gemini.rs implementing LlmProvider
  3. Add a factory function in src/llm/mod.rs and wire it into create_llm_provider()
  4. Parse backend-specific config from env vars
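For steps 1 and 4, a sketch of the enum after adding new variants, plus parsing of a priority-ordered list such as LLM_PROVIDERS. The five existing variant names come from the backend table; the Gemini and Bedrock variants and the lowercase names "openai_compatible", "gemini", and "bedrock" are assumptions, and the real config.rs may parse differently.

```rust
use std::str::FromStr;

/// Sketch of LlmBackend after adding variants (Gemini/Bedrock are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmBackend {
    NearAi,
    OpenAi,
    Anthropic,
    Ollama,
    OpenAiCompatible,
    Gemini,
    Bedrock,
}

impl FromStr for LlmBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "nearai" => Ok(Self::NearAi),
            "openai" => Ok(Self::OpenAi),
            "anthropic" => Ok(Self::Anthropic),
            "ollama" => Ok(Self::Ollama),
            "openai_compatible" => Ok(Self::OpenAiCompatible), // assumed name
            "gemini" => Ok(Self::Gemini),                      // assumed name
            "bedrock" => Ok(Self::Bedrock),                    // assumed name
            other => Err(format!("unknown LLM backend: {other}")),
        }
    }
}

/// Parses a comma-separated, priority-ordered backend list; fails on the
/// first unknown name.
pub fn parse_backends(list: &str) -> Result<Vec<LlmBackend>, String> {
    list.split(',')
        .filter(|s| !s.trim().is_empty())
        .map(str::parse::<LlmBackend>)
        .collect()
}
```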

Bedrock-specific concerns:

  • AWS credential chain (env vars → shared config → IAM role)
  • SigV4 request signing (use aws-sdk-bedrockruntime crate)
  • Regional endpoint selection
  • list_models() requires the bedrock:ListFoundationModels IAM permission; without it, discovery is unavailable
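Regional endpoint selection, sketched with std only. In a real backend the aws-config crate would resolve both region and credentials; the helper names, the precedence (explicit setting, then AWS_REGION, then AWS_DEFAULT_REGION, then a fallback), and the fallback itself are assumptions for illustration. The endpoint follows the bedrock-runtime.<region>.amazonaws.com pattern used by Bedrock's runtime API.

```rust
/// Picks the first non-blank region from an explicit setting and the two
/// env-derived values, else the fallback (hypothetical precedence).
pub fn resolve_region(
    explicit: Option<&str>,
    aws_region: Option<&str>,
    aws_default_region: Option<&str>,
    fallback: &str,
) -> String {
    [explicit, aws_region, aws_default_region]
        .iter()
        .flatten()
        .copied()
        .map(str::trim)
        .find(|r| !r.is_empty())
        .unwrap_or(fallback)
        .to_string()
}

/// Bedrock runtime endpoint for a region.
pub fn bedrock_runtime_endpoint(region: &str) -> String {
    format!("https://bedrock-runtime.{region}.amazonaws.com")
}
```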

OpenRouter:

  • Essentially OpenAI-compatible with model routing
  • Could reuse the OpenAiCompatible backend with base_url=https://openrouter.ai/api/v1
  • Cost tracking via OpenRouter's x-openrouter-cost response header
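Reusing the OpenAiCompatible backend could reduce OpenRouter support to a config preset. OpenAiCompatibleConfig and openrouter_config are hypothetical (the real config fields may differ), and rejecting model IDs without a vendor prefix is an assumed sanity check rather than a documented OpenRouter rule.

```rust
/// Hypothetical shape of the OpenAiCompatible backend config.
#[derive(Debug, Clone, PartialEq)]
pub struct OpenAiCompatibleConfig {
    pub base_url: String,
    pub api_key: Option<String>,
    pub model: String,
}

pub const OPENROUTER_BASE_URL: &str = "https://openrouter.ai/api/v1";

/// OpenRouter preset: same OpenAI-compatible client, different base URL.
/// Model IDs are expected to carry a vendor prefix, e.g. "vendor/model".
pub fn openrouter_config(api_key: String, model: &str) -> Result<OpenAiCompatibleConfig, String> {
    // Assumed check: catch bare model names copied from a single-vendor config.
    if !model.contains('/') {
        return Err(format!("expected a vendor-prefixed OpenRouter model id, got {model:?}"));
    }
    Ok(OpenAiCompatibleConfig {
        base_url: OPENROUTER_BASE_URL.to_string(),
        api_key: Some(api_key),
        model: model.to_string(),
    })
}
```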

Impact on Reasoning Module

src/llm/reasoning.rs calls respond_with_tools() with metadata forwarding. Failover must:

  • Clear provider-specific metadata (e.g., NEAR AI thread_id) when switching providers mid-conversation
  • Reset response chaining state on failover
  • Log provider switches for debugging
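Clearing provider-specific metadata on a switch might look like this. thread_id is the NEAR AI key named above; provider_scoped_keys and reset_metadata_on_switch are hypothetical, and a real implementation would likely also need to reset other response chaining state, such as a stored response_id. Returning the removed keys gives the caller something concrete to log.

```rust
use std::collections::HashMap;

/// Metadata keys assumed to be meaningful only to one provider (hypothetical table).
fn provider_scoped_keys(provider: &str) -> &'static [&'static str] {
    match provider {
        "nearai" => &["thread_id"],
        _ => &[],
    }
}

/// Drops metadata owned by the previous provider before retrying on a
/// different one; returns the keys that were actually removed.
pub fn reset_metadata_on_switch(
    metadata: &mut HashMap<String, String>,
    from: &str,
    to: &str,
) -> Vec<String> {
    if from == to {
        return Vec::new();
    }
    let mut removed = Vec::new();
    for key in provider_scoped_keys(from) {
        if metadata.remove(*key).is_some() {
            removed.push(key.to_string());
        }
    }
    removed
}
```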

Success Criteria

  1. Failover: Agent keeps functioning when the primary provider returns 429/500/timeout, automatically retrying on the next provider in the chain within 5 seconds
  2. Circuit breaker: Provider is removed from rotation after N consecutive failures and re-checked after the cooldown period
  3. Auto-discovery: ironclaw config list-models shows available models across all configured providers
  4. Cost tracking: Each provider reports accurate cost_per_token() for the estimation module
  5. Zero regression: Existing NEAR AI response chaining and session management remain unaffected when NEAR AI is used as the single provider
  6. Config simplicity: A single env var, LLM_PROVIDERS=openai,ollama, enables multi-provider mode with sensible defaults
  7. Observability: Provider switches logged with reason (rate_limit, timeout, error) for debugging
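Criteria 1, 2, and 7 together describe a loop like the following sketch: try providers in priority order, skip any whose circuit is open, and record each switch with its reason. complete_with_failover and SwitchReason are hypothetical; plain closures stand in for LlmProvider calls and for the breaker check, and the loop is synchronous for brevity.

```rust
use std::fmt;

/// Reason recorded for each provider switch (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SwitchReason {
    RateLimit,
    Timeout,
    Error,
}

impl fmt::Display for SwitchReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            SwitchReason::RateLimit => "rate_limit",
            SwitchReason::Timeout => "timeout",
            SwitchReason::Error => "error",
        };
        f.write_str(s)
    }
}

/// Tries each (name, call) pair in order and returns the first success
/// (with the provider name) plus a log line for every skip or switch.
pub fn complete_with_failover<T>(
    chain: &[(&str, &dyn Fn() -> Result<T, SwitchReason>)],
    is_available: impl Fn(&str) -> bool,
) -> (Option<(String, T)>, Vec<String>) {
    let mut log = Vec::new();
    for &(name, call) in chain {
        if !is_available(name) {
            log.push(format!("skipping {name}: circuit open"));
            continue;
        }
        match call() {
            Ok(value) => return (Some((name.to_string(), value)), log),
            Err(reason) => log.push(format!("provider {name} failed (reason={reason}), trying next")),
        }
    }
    (None, log)
}
```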
