fix(proxy): prioritize reasoning health-check max token precedence (#25936)
Conversation
Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability. Made-with: Cursor
Greptile Summary

This PR introduces reasoning-aware max-token selection during proxy background health checks, adding a dedicated setting for reasoning models and raising the non-wildcard hardcoded default from 1 to 5. The resolution logic is extracted into `_resolve_health_check_max_tokens`. One behavioral shift: wildcard detection now checks only `litellm_params["model"]`.

Confidence Score: 5/5. Safe to merge; all findings are P2 or informational, and the implementation is correct with thorough test coverage. No P0 or P1 issues found. The priority resolution logic is correct and matches the documented contract. All seven new tests are properly isolated with mocks. The one behavioral shift around wildcard detection is explicitly documented. The default bump from 1 to 5 is a mild backward-compatibility concern, but users can restore the old behavior via the global env var, and the change is intentional. No files require special attention.
| Filename | Overview |
|---|---|
| litellm/proxy/health_check.py | Core change: extracts max_tokens resolution into _resolve_health_check_max_tokens with clean priority chain; logic is correct and wildcard detection now uses only litellm_params["model"] (intentional, documented). |
| litellm/constants.py | Adds BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING constant following the identical parsing pattern as the existing BACKGROUND_HEALTH_CHECK_MAX_TOKENS — correct and safe. |
| tests/test_litellm/proxy/test_health_check_max_tokens.py | Seven new unit tests cover all priority branches (model_info keys, env-var precedence, wildcard exclusion, fallthrough); all properly mock supports_reasoning and the module-level constants — no real network calls. |
| docs/my-website/docs/proxy/health.md | Docs updated with new default (5), reasoning/non-reasoning model_info keys, and env var precedence rules — accurate and matches implementation. |
| docs/my-website/docs/proxy/config_settings.md | Config settings table updated to reflect new default (5) and new reasoning env var — accurate. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_resolve_health_check_max_tokens] --> B{model_info.health_check_max_tokens set?}
    B -- Yes --> C[return explicit value]
    B -- No --> D{litellm_params.model contains '*'?}
    D -- Yes, wildcard --> E{BACKGROUND_HEALTH_CHECK_MAX_TOKENS set?}
    E -- Yes --> F[return global env value]
    E -- No --> G[return None, omit max_tokens]
    D -- No, non-wildcard --> H[supports_reasoning check]
    H --> I{model_info reasoning/non-reasoning keys set?}
    I -- Yes --> J{is_reasoning?}
    J -- Yes + tokens_reasoning set --> K[return tokens_reasoning]
    J -- No + tokens_non_reasoning set --> L[return tokens_non_reasoning]
    J -- branch mismatch --> M{is_reasoning + REASONING_ENV set?}
    I -- No --> M
    M -- Yes --> N[return BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING]
    M -- No --> O{BACKGROUND_HEALTH_CHECK_MAX_TOKENS set?}
    O -- Yes --> P[return global env value]
    O -- No --> Q[return 5 default]
```
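The priority chain in the flowchart can be sketched as a standalone function. The signature, parameter names, and `model_info` key names below are illustrative assumptions for the sketch, not the actual litellm implementation:

```python
from typing import Optional

# Non-wildcard hardcoded fallback, raised from 1 to 5 in this PR.
DEFAULT_HEALTH_CHECK_MAX_TOKENS = 5


def resolve_health_check_max_tokens(
    model_info: dict,
    litellm_params: dict,
    is_reasoning: bool,
    global_env: Optional[int] = None,     # BACKGROUND_HEALTH_CHECK_MAX_TOKENS
    reasoning_env: Optional[int] = None,  # BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING
) -> Optional[int]:
    # 1. An explicit per-model override wins outright.
    explicit = model_info.get("health_check_max_tokens")
    if explicit is not None:
        return explicit

    # 2. Wildcard models: only the global env var applies;
    #    None means "omit max_tokens from the health-check request".
    if "*" in litellm_params.get("model", ""):
        return global_env

    # 3. Reasoning-aware model_info keys (hypothetical key names).
    if is_reasoning and model_info.get("health_check_max_tokens_reasoning") is not None:
        return model_info["health_check_max_tokens_reasoning"]
    if not is_reasoning and model_info.get("health_check_max_tokens_non_reasoning") is not None:
        return model_info["health_check_max_tokens_non_reasoning"]

    # 4. The reasoning env var takes precedence for reasoning models.
    if is_reasoning and reasoning_env is not None:
        return reasoning_env

    # 5. Global env var, else the hardcoded non-wildcard default of 5.
    if global_env is not None:
        return global_env
    return DEFAULT_HEALTH_CHECK_MAX_TOKENS
```

Computing `is_reasoning` (via `supports_reasoning`) once and passing it in matches the follow-up commit that avoids calling the reasoning check twice per non-wildcard resolution.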
Reviews (2): Last reviewed commit: "fix(proxy): avoid duplicate reasoning ca..."
Compute supports_reasoning once per non-wildcard health-check resolution path and update the stale default-max-tokens test docstring. Made-with: Cursor
Codecov Report: ❌ Patch coverage is
Merged commit d03c301 into litellm_internal_staging
Summary
Use `BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING` for reasoning models, then fall back to `BACKGROUND_HEALTH_CHECK_MAX_TOKENS`.

Test plan
poetry run pytest tests/test_litellm/proxy/test_health_check_max_tokens.py -v