
fix(proxy): prioritize reasoning health-check max token precedence#25936

Merged
ishaan-berri merged 2 commits into litellm_internal_staging from litellm_health-check-reasoning-tokens
Apr 18, 2026
Conversation

@Sameerlite (Collaborator) commented Apr 17, 2026

Summary

  • add reasoning-aware health-check max token resolution in proxy health checks
  • prioritize BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING for reasoning models, then fall back to BACKGROUND_HEALTH_CHECK_MAX_TOKENS
  • raise non-wildcard default health-check max tokens from 1 to 5 and update docs/tests

Test plan

  • poetry run pytest tests/test_litellm/proxy/test_health_check_max_tokens.py -v
  • verified no lints on changed files
  • manually verified with BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING=20 (screenshots attached)

Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability.

Made-with: Cursor
vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Apr 17, 2026 7:21am


greptile-apps Bot (Contributor) commented Apr 17, 2026

Greptile Summary

This PR introduces reasoning-aware max-token selection during proxy background health checks, adding a dedicated setting for reasoning models and raising the non-wildcard hardcoded default from 1 to 5. The logic is extracted into _resolve_health_check_max_tokens with a well-documented priority chain (explicit model_info key → reasoning/non-reasoning split → global override → hardcoded default), backed by thorough unit tests and updated docs.

One behavioral shift: wildcard detection now checks only litellm_params["model"] rather than model_info["health_check_model"] first. Deployments whose health-check model override was a wildcard but whose deployment model string was concrete previously received no max_tokens; they will now receive the default of 5. This is intentional and documented.

Confidence Score: 5/5

Safe to merge; all findings are P2 or informational, and the implementation is correct with thorough test coverage.

No P0 or P1 issues found. The priority resolution logic is correct and matches the documented contract. All seven new tests are properly isolated with mocks. The one behavioral shift around wildcard detection is explicitly documented. The default bump from 1 to 5 is a mild backward-compatibility concern but users can restore old behavior via the global env var, and it is intentional.

No files require special attention.

Important Files Changed

  • litellm/proxy/health_check.py: Core change: extracts max_tokens resolution into _resolve_health_check_max_tokens with a clean priority chain; logic is correct, and wildcard detection now uses only litellm_params["model"] (intentional, documented).
  • litellm/constants.py: Adds the BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING constant following the identical parsing pattern as the existing BACKGROUND_HEALTH_CHECK_MAX_TOKENS — correct and safe.
  • tests/test_litellm/proxy/test_health_check_max_tokens.py: Seven new unit tests cover all priority branches (model_info keys, env-var precedence, wildcard exclusion, fallthrough); all properly mock supports_reasoning and the module-level constants — no real network calls.
  • docs/my-website/docs/proxy/health.md: Docs updated with the new default (5), reasoning/non-reasoning model_info keys, and env var precedence rules — accurate and matches the implementation.
  • docs/my-website/docs/proxy/config_settings.md: Config settings table updated to reflect the new default (5) and the new reasoning env var — accurate.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_resolve_health_check_max_tokens] --> B{model_info.health_check_max_tokens set?}
    B -- Yes --> C[return explicit value]
    B -- No --> D{litellm_params.model contains '*'?}
    D -- Yes, wildcard --> E{BACKGROUND_HEALTH_CHECK_MAX_TOKENS set?}
    E -- Yes --> F[return global env value]
    E -- No --> G[return None — omit max_tokens]
    D -- No, non-wildcard --> H[supports_reasoning check]
    H --> I{model_info reasoning/non-reasoning keys set?}
    I -- Yes --> J{is_reasoning?}
    J -- Yes + tokens_reasoning set --> K[return tokens_reasoning]
    J -- No + tokens_non_reasoning set --> L[return tokens_non_reasoning]
    J -- branch mismatch --> M{is_reasoning + REASONING_ENV set?}
    I -- No --> M
    M -- Yes --> N[return BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING]
    M -- No --> O{BACKGROUND_HEALTH_CHECK_MAX_TOKENS set?}
    O -- Yes --> P[return global env value]
    O -- No --> Q[return 5 default]
```
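The flowchart's priority chain can be sketched in Python as follows. This is a hedged reconstruction from the diagram, not litellm's actual _resolve_health_check_max_tokens: the function signature, the model_info key names for the reasoning/non-reasoning split (health_check_max_tokens_reasoning / health_check_max_tokens_non_reasoning), and passing the env-derived values as parameters are all assumptions made for illustration.

```python
from typing import Optional

# New non-wildcard hardcoded default introduced by this PR (was 1).
DEFAULT_HEALTH_CHECK_MAX_TOKENS = 5


def resolve_health_check_max_tokens(
    model_info: dict,
    litellm_params: dict,
    is_reasoning: bool,
    global_max_tokens: Optional[int] = None,      # BACKGROUND_HEALTH_CHECK_MAX_TOKENS
    reasoning_max_tokens: Optional[int] = None,   # ..._MAX_TOKENS_REASONING
) -> Optional[int]:
    """Illustrative sketch of the documented priority chain; key names assumed."""
    # 1. Explicit per-deployment override wins outright.
    explicit = model_info.get("health_check_max_tokens")
    if explicit is not None:
        return explicit
    # 2. Wildcard deployments: only the global env override applies;
    #    otherwise max_tokens is omitted entirely (None).
    if "*" in (litellm_params.get("model") or ""):
        return global_max_tokens
    # 3. Reasoning / non-reasoning split from model_info.
    if is_reasoning and model_info.get("health_check_max_tokens_reasoning") is not None:
        return model_info["health_check_max_tokens_reasoning"]
    if not is_reasoning and model_info.get("health_check_max_tokens_non_reasoning") is not None:
        return model_info["health_check_max_tokens_non_reasoning"]
    # 4. Reasoning env var, then global env var, then the hardcoded default.
    if is_reasoning and reasoning_max_tokens is not None:
        return reasoning_max_tokens
    if global_max_tokens is not None:
        return global_max_tokens
    return DEFAULT_HEALTH_CHECK_MAX_TOKENS
```

A deployment with no configuration at all now gets 5 tokens, while a wildcard deployment with no global override still sends no max_tokens.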

Reviews (2): Last reviewed commit: "fix(proxy): avoid duplicate reasoning ca..."

Comment threads:
  • litellm/proxy/health_check.py
  • tests/test_litellm/proxy/test_health_check_max_tokens.py

Suggested follow-up: compute supports_reasoning once per non-wildcard health-check resolution path and update the stale default-max-tokens test docstring.

Made-with: Cursor
codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 72.09302% with 12 lines in your changes missing coverage. Please review.

  • litellm/proxy/health_check.py: patch coverage 72.09%, 12 lines missing ⚠️


@ishaan-berri ishaan-berri merged commit d03c301 into litellm_internal_staging Apr 18, 2026
98 of 101 checks passed
@ishaan-berri ishaan-berri deleted the litellm_health-check-reasoning-tokens branch April 18, 2026 18:35
