
Reduce default latency histogram bucket cardinality#25527

Merged
krrish-berri-2 merged 4 commits into BerriAI:litellm_internal_staging_04_11_2026 from J-Byron:feat/prometheus-latency-buckets
Apr 11, 2026

Conversation

J-Byron (Contributor) commented Apr 10, 2026

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

🆕 New Feature

Changes

The default Prometheus latency histogram had 35 buckets with dense 0.5 s granularity between 1–10 s. Combined with multiple label dimensions (model, api_key, team, etc.), this creates excessive time-series cardinality.

Reduced the default buckets from 35 to 18 by removing the half-second intervals in the middle, added 420 s and 600 s boundaries to cover typical LLM timeout ranges, and made the bucket list configurable via prometheus_latency_buckets in litellm_settings so users can tune it for their own cardinality needs.
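As a rough illustration of why the reduction matters (the label-combination count below is an assumption, not a measurement): a Prometheus histogram emits one _bucket series per boundary plus +Inf, plus _sum and _count, for every unique label combination, so trimming boundaries multiplies out quickly.

```python
# Back-of-envelope estimate of the series reduction. The number of
# label combinations is hypothetical; real deployments vary widely.

def series_per_combo(finite_boundaries: int) -> int:
    # one _bucket series per boundary, one for +Inf, plus _sum and _count
    return finite_boundaries + 1 + 2

label_combos = 1_000  # e.g. models x api_keys x teams (assumed)

old_series = label_combos * series_per_combo(35)
new_series = label_combos * series_per_combo(18)
print(old_series, new_series)  # 38000 21000
```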

Breaking Change

The default LATENCY_BUCKETS has been reduced from 35 to 18 boundaries. If you have existing Prometheus dashboards or PromQL SLO queries that reference specific le values (e.g. le="1.5", le="9.5"), those series will no longer exist after upgrading.

To restore the previous buckets, add this to your config.yaml:

litellm_settings:
  prometheus_latency_buckets: [0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0]

vercel Bot commented Apr 10, 2026

The latest updates on your projects:

litellm: Ready (Preview), updated Apr 10, 2026 8:32pm UTC


J-Byron (Contributor, Author) commented Apr 10, 2026

@greptileai

codspeed-hq Bot commented Apr 10, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing J-Byron:feat/prometheus-latency-buckets (b9657b7) with main (9e4352a)


codecov Bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR reduces the default Prometheus latency histogram from 35 to 18 buckets (removing dense half-second intervals from 1–10 s, adding 420 s and 600 s boundaries) and exposes a prometheus_latency_buckets key in litellm_settings to let users override the defaults. The changes are applied consistently across both PrometheusLogger and PrometheusServicesLogger, and two new test suites cover the default and custom-bucket paths.

Confidence Score: 4/5

Mostly safe to merge, but the breaking default change and one test isolation issue warrant attention before merging.

The implementation is clean and consistent across both logger classes. However, three concerns flagged in previous review threads remain unaddressed: the default bucket list is a breaking change for users with existing Prometheus dashboards (violating the codebase backwards-compat rule), user-provided buckets are accepted without validation, and both custom-bucket tests rely on a private registry API. A new finding — missing registry cleanup in test_services_logger_default_latency_buckets — introduces a P1 test-ordering fragility. Together these keep the score at 4/5.

Files flagged: tests/test_litellm/integrations/test_prometheus_services.py (missing registry isolation), litellm/__init__.py (no bucket list validation), and litellm/types/integrations/prometheus.py (breaking default change).

Important Files Changed

litellm/types/integrations/prometheus.py: Reduces LATENCY_BUCKETS from 35 to 18 boundaries and adds 420 s / 600 s; a breaking default change for existing Prometheus dashboards and SLO queries.
litellm/__init__.py: Adds the prometheus_latency_buckets: Optional[List[float]] = None global; no input validation at the declaration site (consumed verbatim by loggers).
litellm/integrations/prometheus.py: Reads litellm.prometheus_latency_buckets at init time and wires it through to all six histogram metrics; clean and consistent.
litellm/integrations/prometheus_services.py: Adds import litellm at module level and reads prometheus_latency_buckets at init time; consistent with the prometheus.py treatment.
tests/test_litellm/integrations/test_prometheus_services.py: New test_services_logger_default_latency_buckets creates a real PrometheusServicesLogger without registry cleanup, making it order-dependent on other tests; the custom-bucket companion test manages cleanup, but the default test does not.
tests/test_litellm/integrations/test_prometheus_user_team_metrics.py: New test_default_latency_buckets correctly uses the prometheus_logger fixture; test_custom_latency_buckets relies on private REGISTRY._collector_to_names for isolation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[YAML config loaded\nlitellm_settings] --> B{prometheus_latency_buckets\nset?}
    B -- Yes --> C[setattr litellm.prometheus_latency_buckets\nvia generic fallthrough]
    B -- No --> D[litellm.prometheus_latency_buckets = None]
    C --> E[PrometheusLogger.__init__]
    D --> E
    E --> F{_custom_buckets is not None?}
    F -- Yes --> G[self.latency_buckets = tuple custom_buckets]
    F -- No --> H[self.latency_buckets = LATENCY_BUCKETS\n18 boundaries incl. 420s & 600s]
    G --> I[All 6 histogram metrics\nuse self.latency_buckets]
    H --> I
    J[PrometheusServicesLogger.__init__] --> K{_custom_buckets is not None?}
    K -- Yes --> L[self.latency_buckets = tuple custom_buckets]
    K -- No --> M[self.latency_buckets = LATENCY_BUCKETS]
    L --> N[Service latency histograms\nuse self.latency_buckets]
    M --> N

Reviews (2): Last reviewed commit: "test(prometheus): add coverage for Prome..."

Comment on lines 123 to 142
LATENCY_BUCKETS = (
    0.005,
    0.00625,
    0.01,
    0.0125,
    0.025,
    0.05,
    0.1,
    0.25,
    0.5,
    1.0,
    1.5,
    2.0,
    2.5,
    3.0,
    3.5,
    4.0,
    4.5,
    5.0,
    5.5,
    6.0,
    6.5,
    7.0,
    7.5,
    8.0,
    8.5,
    9.0,
    9.5,
    10.0,
    15.0,
    20.0,
    25.0,
    30.0,
    60.0,
    120.0,
    180.0,
    240.0,
    300.0,
    420.0,  # 7 minutes
    600.0,  # 10 minutes (typical default LLM request timeout)
    float("inf"),
)

P1 Breaking default change removes existing le label values

Prometheus Histogram metrics persist their _bucket time-series keyed by le label. Any dashboard or PromQL SLO expression that filters on a removed boundary — e.g. {le="1.5"}, {le="2.5"}, or any of the nine other dropped values — will return no data after this upgrade with no warning. histogram_quantile over a recording rule or a Grafana panel will also produce a jump in apparent latency distribution at the moment of rollout.

The prometheus_latency_buckets escape hatch requires users to know the change happened and reconstruct the old list manually. Per the codebase's backward-compat policy, a breaking default should either be opt-in (keep old defaults, document the new reduced set as the recommended override) or at minimum include a migration note in the changelog. Consider keeping the old LATENCY_BUCKETS as LATENCY_BUCKETS_LEGACY and exposing the new set as the recommended value, so users who need continuity can set:

litellm_settings:
  prometheus_latency_buckets: [0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0]
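The legacy-constant suggestion could look like the following sketch (LATENCY_BUCKETS_LEGACY is a hypothetical name, not something the PR ships): keep the old 35 boundaries importable so operators can opt back in without hand-copying the list.

```python
# Hypothetical constant preserving the pre-PR default boundaries
# (sketch of the reviewer's suggestion; not part of the actual PR).
LATENCY_BUCKETS_LEGACY = (
    0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5,
    1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5,
    6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0,
    15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0,
)

# Sanity checks: 35 boundaries, strictly ascending (Prometheus requires this).
assert len(LATENCY_BUCKETS_LEGACY) == 35
assert all(a < b for a, b in zip(LATENCY_BUCKETS_LEGACY, LATENCY_BUCKETS_LEGACY[1:]))
```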

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment thread litellm/__init__.py
langfuse_default_tags: Optional[List[str]] = None
langsmith_batch_size: Optional[int] = None
prometheus_initialize_budget_metrics: Optional[bool] = False
prometheus_latency_buckets: Optional[List[float]] = None

P2 No validation of user-provided bucket list

prometheus_latency_buckets is consumed as-is; Prometheus requires buckets to be strictly sorted ascending and all positive. An out-of-order or empty list will raise an unhelpful ValueError deep inside prometheus_client at logger initialisation time rather than at config load. A lightweight guard at the point of consumption would surface the problem earlier:

_custom_buckets = litellm.prometheus_latency_buckets
if _custom_buckets is not None:
    if not _custom_buckets or list(_custom_buckets) != sorted(_custom_buckets):
        raise ValueError(
            "prometheus_latency_buckets must be a non-empty, strictly ascending list of floats"
        )
    self.latency_buckets = tuple(_custom_buckets)
else:
    self.latency_buckets = LATENCY_BUCKETS
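A standalone version of that guard might look like this (resolve_latency_buckets is a hypothetical helper name; the PR itself reads the global directly in each logger's __init__):

```python
from typing import List, Optional, Tuple

def resolve_latency_buckets(
    custom: Optional[List[float]],
    default: Tuple[float, ...],
) -> Tuple[float, ...]:
    """Hypothetical helper wrapping the reviewer's suggested guard.

    Prometheus requires bucket boundaries to be non-empty and
    strictly ascending; failing fast here gives a clearer error than
    a ValueError deep inside prometheus_client.
    """
    if custom is None:
        return default
    if not custom or not all(a < b for a, b in zip(custom, custom[1:])):
        raise ValueError(
            "prometheus_latency_buckets must be a non-empty, "
            "strictly ascending list of floats"
        )
    return tuple(custom)

DEFAULT = (0.1, 1.0, 10.0)  # stand-in for LATENCY_BUCKETS
print(resolve_latency_buckets(None, DEFAULT))        # (0.1, 1.0, 10.0)
print(resolve_latency_buckets([0.5, 2.0], DEFAULT))  # (0.5, 2.0)
```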

Comment on lines +794 to +808
for collector in list(REGISTRY._collector_to_names.keys()):
    try:
        REGISTRY.unregister(collector)
    except Exception:
        pass
try:
    litellm.prometheus_latency_buckets = custom_buckets
    logger = PrometheusLogger()
    assert logger.latency_buckets == tuple(custom_buckets)
finally:
    litellm.prometheus_latency_buckets = original
    for collector in list(REGISTRY._collector_to_names.keys()):
        try:
            REGISTRY.unregister(collector)
        except Exception:

P2 Test relies on private REGISTRY._collector_to_names attribute

REGISTRY._collector_to_names is an internal implementation detail of prometheus_client; its name/type has changed across minor versions and could break the test silently. The public API to clear the registry before/after a test is REGISTRY.unregister(collector) combined with iterating list(REGISTRY._names_to_collectors.values()), or better, using the library's own CollectorRegistry(auto_describe=True) and passing it to PrometheusLogger via a constructor param. If the prometheus_logger fixture already handles registry isolation, this test could simply reuse it with a monkeypatch for litellm.prometheus_latency_buckets instead of clearing the global registry.
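A registry-isolated test along these lines could use a throwaway CollectorRegistry instead of the process-global one (this assumes PrometheusLogger could accept a registry parameter, which it does not today; the snippet below only demonstrates the prometheus_client pattern with a hypothetical metric name):

```python
from prometheus_client import CollectorRegistry, Histogram

# A fresh registry per test: nothing leaks into the global REGISTRY,
# so test ordering no longer matters.
registry = CollectorRegistry()
hist = Histogram(
    "litellm_request_latency_seconds",  # hypothetical metric name
    "Request latency",
    buckets=(0.5, 2.0),
    registry=registry,
)
hist.observe(1.0)

# get_sample_value is public API; le labels are stringified boundaries.
count = registry.get_sample_value("litellm_request_latency_seconds_count")
in_2s = registry.get_sample_value(
    "litellm_request_latency_seconds_bucket", {"le": "2.0"}
)
print(count, in_2s)  # 1.0 1.0
```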


greptile-apps Bot commented Apr 10, 2026

Tip:

Greploop — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.

Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_internal_staging_04_11_2026 April 11, 2026 16:34
@krrish-berri-2 krrish-berri-2 merged commit 7d2f069 into BerriAI:litellm_internal_staging_04_11_2026 Apr 11, 2026
49 of 51 checks passed