
Reduce default latency histogram bucket cardinality#25527

Merged
krrish-berri-2 merged 4 commits into BerriAI:litellm_internal_staging_04_11_2026 from J-Byron:feat/prometheus-latency-buckets
Apr 11, 2026

Conversation

J-Byron (Contributor) commented Apr 10, 2026

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

🆕 New Feature

Changes

The default Prometheus latency histogram had 35 buckets with dense 0.5 s granularity between 1–10 s. Combined with multiple label dimensions (model, api_key, team, etc.), this creates excessive time-series cardinality.

Reduced the default buckets from 35 to 18 by removing the half-second intervals in the middle, added 420 s and 600 s boundaries to cover typical LLM timeout ranges, and made the bucket list configurable via prometheus_latency_buckets in litellm_settings so users can tune it for their own cardinality needs.
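As a rough illustration of why the reduction matters (the label-combination count below is an assumption, not a measurement): a Prometheus histogram emits one _bucket series per boundary plus +Inf, plus _sum and _count, for every unique label combination, so trimming boundaries multiplies out quickly.

```python
# Back-of-envelope estimate of the series reduction. The number of
# label combinations is hypothetical; real deployments vary widely.

def series_per_combo(finite_boundaries: int) -> int:
    # one _bucket series per boundary, one for +Inf, plus _sum and _count
    return finite_boundaries + 1 + 2

label_combos = 1_000  # e.g. models x api_keys x teams (assumed)

old_series = label_combos * series_per_combo(35)
new_series = label_combos * series_per_combo(18)
print(old_series, new_series)  # 38000 21000
```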

Breaking Change

The default LATENCY_BUCKETS has been reduced from 35 to 18 boundaries. If you have existing Prometheus dashboards or PromQL SLO queries that reference specific le values (e.g. le="1.5", le="9.5"), those series will no longer exist after upgrading.

To restore the previous buckets, add this to your config.yaml:

litellm_settings:
  prometheus_latency_buckets: [0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0]

vercel Bot commented Apr 10, 2026

The latest updates on your projects:

litellm: Ready (Preview), updated Apr 10, 2026 8:32pm UTC


J-Byron (Contributor, Author) commented Apr 10, 2026

@greptileai

codspeed-hq Bot commented Apr 10, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing J-Byron:feat/prometheus-latency-buckets (b9657b7) with main (9e4352a)


codecov Bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR reduces the default Prometheus latency histogram from 35 to 18 buckets (removing dense half-second intervals from 1–10 s, adding 420 s and 600 s boundaries) and exposes a prometheus_latency_buckets key in litellm_settings to let users override the defaults. The changes are applied consistently across both PrometheusLogger and PrometheusServicesLogger, and two new test suites cover the default and custom-bucket paths.

Confidence Score: 4/5

Mostly safe to merge, but the breaking default change and one test isolation issue warrant attention before merging.

The implementation is clean and consistent across both logger classes. However, three concerns flagged in previous review threads remain unaddressed: the default bucket list is a breaking change for users with existing Prometheus dashboards (violating the codebase backwards-compat rule), user-provided buckets are accepted without validation, and both custom-bucket tests rely on a private registry API. A new finding — missing registry cleanup in test_services_logger_default_latency_buckets — introduces a P1 test-ordering fragility. Together these keep the score at 4/5.

Files flagged: tests/test_litellm/integrations/test_prometheus_services.py (missing registry isolation), litellm/__init__.py (no bucket list validation), and litellm/types/integrations/prometheus.py (breaking default change).

Important Files Changed

litellm/types/integrations/prometheus.py: Reduces LATENCY_BUCKETS from 35 to 18 boundaries and adds 420 s / 600 s; a breaking default change for existing Prometheus dashboards and SLO queries.
litellm/__init__.py: Adds the prometheus_latency_buckets: Optional[List[float]] = None global; no input validation at the declaration site (consumed verbatim by loggers).
litellm/integrations/prometheus.py: Reads litellm.prometheus_latency_buckets at init time and wires it through to all six histogram metrics; clean and consistent.
litellm/integrations/prometheus_services.py: Adds import litellm at module level and reads prometheus_latency_buckets at init time; consistent with the prometheus.py treatment.
tests/test_litellm/integrations/test_prometheus_services.py: New test_services_logger_default_latency_buckets creates a real PrometheusServicesLogger without registry cleanup, making it order-dependent on other tests; the custom-bucket companion test manages cleanup, but the default test does not.
tests/test_litellm/integrations/test_prometheus_user_team_metrics.py: New test_default_latency_buckets correctly uses the prometheus_logger fixture; test_custom_latency_buckets relies on private REGISTRY._collector_to_names for isolation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[YAML config loaded\nlitellm_settings] --> B{prometheus_latency_buckets\nset?}
    B -- Yes --> C[setattr litellm.prometheus_latency_buckets\nvia generic fallthrough]
    B -- No --> D[litellm.prometheus_latency_buckets = None]
    C --> E[PrometheusLogger.__init__]
    D --> E
    E --> F{_custom_buckets is not None?}
    F -- Yes --> G[self.latency_buckets = tuple custom_buckets]
    F -- No --> H[self.latency_buckets = LATENCY_BUCKETS\n18 boundaries incl. 420s & 600s]
    G --> I[All 6 histogram metrics\nuse self.latency_buckets]
    H --> I
    J[PrometheusServicesLogger.__init__] --> K{_custom_buckets is not None?}
    K -- Yes --> L[self.latency_buckets = tuple custom_buckets]
    K -- No --> M[self.latency_buckets = LATENCY_BUCKETS]
    L --> N[Service latency histograms\nuse self.latency_buckets]
    M --> N

Reviews (2): Last reviewed commit: "test(prometheus): add coverage for Prome..."

Comment on lines 123 to 142
LATENCY_BUCKETS = (
    0.005,
    0.00625,
    0.01,
    0.0125,
    0.025,
    0.05,
    0.1,
    0.25,
    0.5,
    1.0,
    1.5,
    2.0,
    2.5,
    3.0,
    3.5,
    4.0,
    4.5,
    5.0,
    5.5,
    6.0,
    6.5,
    7.0,
    7.5,
    8.0,
    8.5,
    9.0,
    9.5,
    10.0,
    15.0,
    20.0,
    25.0,
    30.0,
    60.0,
    120.0,
    180.0,
    240.0,
    300.0,
    420.0,  # 7 minutes
    600.0,  # 10 minutes (typical default LLM request timeout)
    float("inf"),
)

P1 Breaking default change removes existing le label values

Prometheus Histogram metrics persist their _bucket time-series keyed by le label. Any dashboard or PromQL SLO expression that filters on a removed boundary — e.g. {le="1.5"}, {le="2.5"}, or any of the nine other dropped values — will return no data after this upgrade with no warning. histogram_quantile over a recording rule or a Grafana panel will also produce a jump in apparent latency distribution at the moment of rollout.

The prometheus_latency_buckets escape hatch requires users to know the change happened and reconstruct the old list manually. Per the codebase's backward-compat policy, a breaking default should either be opt-in (keep old defaults, document the new reduced set as the recommended override) or at minimum include a migration note in the changelog. Consider keeping the old LATENCY_BUCKETS as LATENCY_BUCKETS_LEGACY and exposing the new set as the recommended value, so users who need continuity can set:

litellm_settings:
  prometheus_latency_buckets: [0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0]
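The legacy-constant suggestion could look like the following sketch (LATENCY_BUCKETS_LEGACY is a hypothetical name, not something the PR ships): keep the old 35 boundaries importable so operators can opt back in without hand-copying the list.

```python
# Hypothetical constant preserving the pre-PR default boundaries
# (sketch of the reviewer's suggestion; not part of the actual PR).
LATENCY_BUCKETS_LEGACY = (
    0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5,
    1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5,
    6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0,
    15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0,
)

# Sanity checks: 35 boundaries, strictly ascending (Prometheus requires this).
assert len(LATENCY_BUCKETS_LEGACY) == 35
assert all(a < b for a, b in zip(LATENCY_BUCKETS_LEGACY, LATENCY_BUCKETS_LEGACY[1:]))
```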

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment thread litellm/__init__.py
langfuse_default_tags: Optional[List[str]] = None
langsmith_batch_size: Optional[int] = None
prometheus_initialize_budget_metrics: Optional[bool] = False
prometheus_latency_buckets: Optional[List[float]] = None

P2 No validation of user-provided bucket list

prometheus_latency_buckets is consumed as-is; Prometheus requires buckets to be strictly sorted ascending and all positive. An out-of-order or empty list will raise an unhelpful ValueError deep inside prometheus_client at logger initialisation time rather than at config load. A lightweight guard at the point of consumption would surface the problem earlier:

_custom_buckets = litellm.prometheus_latency_buckets
if _custom_buckets is not None:
    if not _custom_buckets or list(_custom_buckets) != sorted(_custom_buckets):
        raise ValueError(
            "prometheus_latency_buckets must be a non-empty, strictly ascending list of floats"
        )
    self.latency_buckets = tuple(_custom_buckets)
else:
    self.latency_buckets = LATENCY_BUCKETS
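A standalone version of that guard might look like this (resolve_latency_buckets is a hypothetical helper name; the PR itself reads the global directly in each logger's __init__):

```python
from typing import List, Optional, Tuple

def resolve_latency_buckets(
    custom: Optional[List[float]],
    default: Tuple[float, ...],
) -> Tuple[float, ...]:
    """Hypothetical helper wrapping the reviewer's suggested guard.

    Prometheus requires bucket boundaries to be non-empty and
    strictly ascending; failing fast here gives a clearer error than
    a ValueError deep inside prometheus_client.
    """
    if custom is None:
        return default
    if not custom or not all(a < b for a, b in zip(custom, custom[1:])):
        raise ValueError(
            "prometheus_latency_buckets must be a non-empty, "
            "strictly ascending list of floats"
        )
    return tuple(custom)

DEFAULT = (0.1, 1.0, 10.0)  # stand-in for LATENCY_BUCKETS
print(resolve_latency_buckets(None, DEFAULT))        # (0.1, 1.0, 10.0)
print(resolve_latency_buckets([0.5, 2.0], DEFAULT))  # (0.5, 2.0)
```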

Comment on lines +794 to +808
for collector in list(REGISTRY._collector_to_names.keys()):
    try:
        REGISTRY.unregister(collector)
    except Exception:
        pass
try:
    litellm.prometheus_latency_buckets = custom_buckets
    logger = PrometheusLogger()
    assert logger.latency_buckets == tuple(custom_buckets)
finally:
    litellm.prometheus_latency_buckets = original
    for collector in list(REGISTRY._collector_to_names.keys()):
        try:
            REGISTRY.unregister(collector)
        except Exception:

P2 Test relies on private REGISTRY._collector_to_names attribute

REGISTRY._collector_to_names is an internal implementation detail of prometheus_client; its name/type has changed across minor versions and could break the test silently. The public API to clear the registry before/after a test is REGISTRY.unregister(collector) combined with iterating list(REGISTRY._names_to_collectors.values()), or better, using the library's own CollectorRegistry(auto_describe=True) and passing it to PrometheusLogger via a constructor param. If the prometheus_logger fixture already handles registry isolation, this test could simply reuse it with a monkeypatch for litellm.prometheus_latency_buckets instead of clearing the global registry.
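A registry-isolated test along these lines could use a throwaway CollectorRegistry instead of the process-global one (this assumes PrometheusLogger could accept a registry parameter, which it does not today; the snippet below only demonstrates the prometheus_client pattern with a hypothetical metric name):

```python
from prometheus_client import CollectorRegistry, Histogram

# A fresh registry per test: nothing leaks into the global REGISTRY,
# so test ordering no longer matters.
registry = CollectorRegistry()
hist = Histogram(
    "litellm_request_latency_seconds",  # hypothetical metric name
    "Request latency",
    buckets=(0.5, 2.0),
    registry=registry,
)
hist.observe(1.0)

# get_sample_value is public API; le labels are stringified boundaries.
count = registry.get_sample_value("litellm_request_latency_seconds_count")
in_2s = registry.get_sample_value(
    "litellm_request_latency_seconds_bucket", {"le": "2.0"}
)
print(count, in_2s)  # 1.0 1.0
```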


greptile-apps Bot commented Apr 10, 2026

Tip:

Greploop — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.

Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_internal_staging_04_11_2026 April 11, 2026 16:34
@krrish-berri-2 krrish-berri-2 merged commit 7d2f069 into BerriAI:litellm_internal_staging_04_11_2026 Apr 11, 2026
49 of 51 checks passed