feat(runtime): add explicit LLM fallback chain across providers/models #3354
Open
hussein1362 wants to merge 2 commits into HKUDS:main from
Conversation
…get_api_base

Config.get_api_base() re-resolves the provider from the model string via _match_provider(), which only reads agents.defaults.provider. For fallback targets that override the provider, this would return the wrong provider config. _resolve_api_base() operates on the already-resolved objects from _resolve_provider() to avoid this.
Problem
nanobot can be configured with multiple LLM providers, but at runtime it only uses one provider/model pair.
Today, if the active provider/model hits a transient upstream failure, nanobot only retries that same provider via provider_retry_mode. Once those retries are exhausted, the turn fails even if another configured provider is healthy. That leaves a real gap between configuration and runtime behavior.
Streaming has one extra constraint: once text has already started streaming, we cannot safely fail over to another model without duplicating or corrupting user-visible output.
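That constraint can be sketched as a guard on the first emitted chunk (a hedged illustration with hypothetical names, not the PR's actual streaming code):

```python
# Hypothetical sketch: failover is only safe before the first chunk of text
# reaches the user. Provider callables and error types are illustrative.

def stream_with_fallback(candidates, prompt):
    """Try each candidate in order; once output starts, stop failing over."""
    last_error = None
    for provider in candidates:
        started = False
        try:
            for chunk in provider(prompt):
                started = True
                yield chunk
            return  # streamed to completion
        except RuntimeError as exc:
            if started:
                # Text already reached the user: re-raising is the only safe
                # option, since retrying would duplicate visible output.
                raise
            last_error = exc  # nothing emitted yet -> safe to try next
    raise last_error

def flaky(prompt):
    # Fails before producing any output.
    raise RuntimeError("upstream 503")
    yield  # unreachable; marks this as a generator function

def healthy(prompt):
    yield "hello"

print("".join(stream_with_fallback([flaky, healthy], "hi")))  # -> hello
```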
Root Cause
Provider selection currently happens once inside _make_provider(), which constructs a single provider instance and returns it. After that, provider_retry_mode retries only the same provider.

Solution
This PR adds explicit runtime fallback chains across providers/models.
Config
Adds agents.defaults.fallbacks, an ordered list of fallback targets:

```json
{
  "agents": {
    "defaults": {
      "model": "gpt-5.4",
      "provider": "openai",
      "fallbacks": [
        {"model": "claude-sonnet-4-6", "provider": "anthropic"},
        {"model": "gpt-4.1-mini"}
      ]
    }
  }
}
```

Each fallback entry supports:

- model (required)
- provider (optional, defaults to auto)

Runtime
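For the auto provider default, runtime resolution of a fallback entry might look like the following hedged sketch (the helper, prefix table, and inference rule are hypothetical, not nanobot's actual schema code):

```python
# Hypothetical sketch of resolving one fallbacks[] entry. The prefix table and
# inference strategy are illustrative assumptions.

PROVIDER_PREFIXES = {"gpt-": "openai", "claude-": "anthropic"}

def resolve_fallback(entry: dict, default_provider: str) -> tuple[str, str]:
    """Return (provider, model) for one fallback entry.

    `model` is required; `provider` defaults to "auto", here meaning: infer
    it from the model name, else fall back to the configured default.
    """
    model = entry["model"]  # required key
    provider = entry.get("provider", "auto")
    if provider == "auto":
        provider = next(
            (p for prefix, p in PROVIDER_PREFIXES.items() if model.startswith(prefix)),
            default_provider,
        )
    return provider, model

# Explicit provider override is kept as-is:
assert resolve_fallback(
    {"model": "claude-sonnet-4-6", "provider": "anthropic"}, "openai"
) == ("anthropic", "claude-sonnet-4-6")
# Omitted provider resolves via "auto":
assert resolve_fallback({"model": "gpt-4.1-mini"}, "openai") == ("openai", "gpt-4.1-mini")
```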
Extract provider construction into a shared factory used by both runtime and CLI.
Add FallbackProvider, which wraps the primary provider plus ordered fallback candidates and applies this policy:

Implementation details:
Tests
Added coverage for:
- _make_provider() behavior still working through the shared factory path

Commands run:
```shell
uv run --python 3.12 pytest -q tests/providers/test_provider_retry.py tests/cli/test_commands.py -k 'fallback or make_provider'
uv run --python 3.12 ruff check nanobot/config/schema.py nanobot/providers/fallback_provider.py nanobot/providers/factory.py tests/providers/test_provider_retry.py
```