fix(proxy): prioritize reasoning health-check max token precedence (#25936)
Conversation
Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability. Made-with: Cursor
Greptile Summary

This PR introduces reasoning-aware max-token selection during proxy background health checks, adding a dedicated setting for reasoning models and raising the non-wildcard hardcoded default from 1 to 5. The resolution logic is extracted into `_resolve_health_check_max_tokens`. One behavioral shift: wildcard detection now checks only `litellm_params["model"]`.

Confidence Score: 5/5. Safe to merge; all findings are P2 or informational, and the implementation is correct with thorough test coverage. No P0 or P1 issues found. The priority resolution logic is correct and matches the documented contract. All seven new tests are properly isolated with mocks. The one behavioral shift around wildcard detection is explicitly documented. The default bump from 1 to 5 is a mild backward-compatibility concern, but users can restore the old behavior via the global env var, and the change is intentional. No files require special attention.
| Filename | Overview |
|---|---|
| litellm/proxy/health_check.py | Core change: extracts max_tokens resolution into _resolve_health_check_max_tokens with clean priority chain; logic is correct and wildcard detection now uses only litellm_params["model"] (intentional, documented). |
| litellm/constants.py | Adds BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING constant following the identical parsing pattern as the existing BACKGROUND_HEALTH_CHECK_MAX_TOKENS — correct and safe. |
| tests/test_litellm/proxy/test_health_check_max_tokens.py | Seven new unit tests cover all priority branches (model_info keys, env-var precedence, wildcard exclusion, fallthrough); all properly mock supports_reasoning and the module-level constants — no real network calls. |
| docs/my-website/docs/proxy/health.md | Docs updated with new default (5), reasoning/non-reasoning model_info keys, and env var precedence rules — accurate and matches implementation. |
| docs/my-website/docs/proxy/config_settings.md | Config settings table updated to reflect new default (5) and new reasoning env var — accurate. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_resolve_health_check_max_tokens] --> B{model_info.health_check_max_tokens set?}
    B -- Yes --> C[return explicit value]
    B -- No --> D{litellm_params.model contains '*'?}
    D -- Yes, wildcard --> E{BACKGROUND_HEALTH_CHECK_MAX_TOKENS set?}
    E -- Yes --> F[return global env value]
    E -- No --> G[return None, omit max_tokens]
    D -- No, non-wildcard --> H[supports_reasoning check]
    H --> I{model_info reasoning/non-reasoning keys set?}
    I -- Yes --> J{is_reasoning?}
    J -- Yes + tokens_reasoning set --> K[return tokens_reasoning]
    J -- No + tokens_non_reasoning set --> L[return tokens_non_reasoning]
    J -- branch mismatch --> M{is_reasoning + REASONING_ENV set?}
    I -- No --> M
    M -- Yes --> N[return BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING]
    M -- No --> O{BACKGROUND_HEALTH_CHECK_MAX_TOKENS set?}
    O -- Yes --> P[return global env value]
    O -- No --> Q[return 5 default]
```
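The priority chain in the flowchart can be sketched as a standalone function. The signature, parameter names, and `model_info` key names below are illustrative assumptions for the sketch, not the actual litellm implementation:

```python
from typing import Optional

# Non-wildcard hardcoded fallback, raised from 1 to 5 in this PR.
DEFAULT_HEALTH_CHECK_MAX_TOKENS = 5


def resolve_health_check_max_tokens(
    model_info: dict,
    litellm_params: dict,
    is_reasoning: bool,
    global_env: Optional[int] = None,     # BACKGROUND_HEALTH_CHECK_MAX_TOKENS
    reasoning_env: Optional[int] = None,  # BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING
) -> Optional[int]:
    # 1. An explicit per-model override wins outright.
    explicit = model_info.get("health_check_max_tokens")
    if explicit is not None:
        return explicit

    # 2. Wildcard models: only the global env var applies;
    #    None means "omit max_tokens from the health-check request".
    if "*" in litellm_params.get("model", ""):
        return global_env

    # 3. Reasoning-aware model_info keys (hypothetical key names).
    if is_reasoning and model_info.get("health_check_max_tokens_reasoning") is not None:
        return model_info["health_check_max_tokens_reasoning"]
    if not is_reasoning and model_info.get("health_check_max_tokens_non_reasoning") is not None:
        return model_info["health_check_max_tokens_non_reasoning"]

    # 4. The reasoning env var takes precedence for reasoning models.
    if is_reasoning and reasoning_env is not None:
        return reasoning_env

    # 5. Global env var, else the hardcoded non-wildcard default of 5.
    if global_env is not None:
        return global_env
    return DEFAULT_HEALTH_CHECK_MAX_TOKENS
```

Computing `is_reasoning` (via `supports_reasoning`) once and passing it in matches the follow-up commit that avoids calling the reasoning check twice per non-wildcard resolution.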
Reviews (2): Last reviewed commit: "fix(proxy): avoid duplicate reasoning ca..."
Compute supports_reasoning once per non-wildcard health-check resolution path and update the stale default-max-tokens test docstring. Made-with: Cursor
Codecov Report: ❌ Patch coverage is
Merged commit d03c301 into litellm_internal_staging
Summary
Use `BACKGROUND_HEALTH_CHECK_MAX_TOKENS_REASONING` for reasoning models, then fall back to `BACKGROUND_HEALTH_CHECK_MAX_TOKENS`.

Test plan
poetry run pytest tests/test_litellm/proxy/test_health_check_max_tokens.py -v