Reduce default latency histogram bucket cardinality #25527
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Greptile Summary

This PR reduces the default Prometheus latency histogram from 35 to 18 buckets (removing dense half-second intervals from 1–10 s, adding 420 s and 600 s boundaries) and exposes a `prometheus_latency_buckets` setting so users can override the defaults.

Confidence Score: 4/5. Mostly safe to merge, but the breaking default change and one test isolation issue warrant attention before merging. The implementation is clean and consistent across both logger classes. However, three concerns flagged in previous review threads remain unaddressed: the default bucket list is a breaking change for users with existing Prometheus dashboards (violating the codebase backwards-compat rule), user-provided buckets are accepted without validation, and both custom-bucket tests rely on a private registry API. A new finding: missing registry cleanup in `tests/test_litellm/integrations/test_prometheus_services.py`.

Files needing attention: `tests/test_litellm/integrations/test_prometheus_services.py` (missing registry isolation), `litellm/__init__.py` (no bucket list validation), `litellm/types/integrations/prometheus.py` (breaking default change)
| Filename | Overview |
|---|---|
| litellm/types/integrations/prometheus.py | Reduces LATENCY_BUCKETS from 35 to 18 boundaries and adds 420 s / 600 s — a breaking default change for existing Prometheus dashboards and SLO queries. |
| litellm/__init__.py | Adds a `prometheus_latency_buckets: Optional[List[float]] = None` global; no input validation at the declaration site (consumed verbatim by loggers). |
| litellm/integrations/prometheus.py | Reads litellm.prometheus_latency_buckets at init time and wires it through to all six histogram metrics; clean and consistent. |
| litellm/integrations/prometheus_services.py | Adds import litellm at module level and reads prometheus_latency_buckets at init time; consistent with prometheus.py treatment. |
| tests/test_litellm/integrations/test_prometheus_services.py | New test_services_logger_default_latency_buckets creates a real PrometheusServicesLogger without registry cleanup, making it order-dependent on other tests; the custom-bucket companion test manages cleanup but not the default test. |
| tests/test_litellm/integrations/test_prometheus_user_team_metrics.py | New test_default_latency_buckets correctly uses the prometheus_logger fixture; test_custom_latency_buckets relies on private REGISTRY._collector_to_names for isolation. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[YAML config loaded\nlitellm_settings] --> B{prometheus_latency_buckets\nset?}
B -- Yes --> C[setattr litellm.prometheus_latency_buckets\nvia generic fallthrough]
B -- No --> D[litellm.prometheus_latency_buckets = None]
C --> E[PrometheusLogger.__init__]
D --> E
E --> F{_custom_buckets is not None?}
F -- Yes --> G[self.latency_buckets = tuple custom_buckets]
F -- No --> H[self.latency_buckets = LATENCY_BUCKETS\n18 boundaries incl. 420s & 600s]
G --> I[All 6 histogram metrics\nuse self.latency_buckets]
H --> I
J[PrometheusServicesLogger.__init__] --> K{_custom_buckets is not None?}
K -- Yes --> L[self.latency_buckets = tuple custom_buckets]
K -- No --> M[self.latency_buckets = LATENCY_BUCKETS]
L --> N[Service latency histograms\nuse self.latency_buckets]
M --> N
```
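The selection logic in the flowchart can be sketched as a small function. This is a simplification: the real loggers read `litellm.prometheus_latency_buckets` inside `__init__` rather than taking a parameter, and the `LATENCY_BUCKETS` tuple here is abridged for illustration.

```python
from typing import List, Optional, Tuple

# Abridged stand-in for the reduced default in
# litellm/types/integrations/prometheus.py (not the full 18-boundary list).
LATENCY_BUCKETS: Tuple[float, ...] = (0.005, 0.1, 1.0, 30.0, 300.0, 420.0, 600.0, float("inf"))


def resolve_latency_buckets(custom_buckets: Optional[List[float]]) -> Tuple[float, ...]:
    """Return the bucket tuple shared by all histogram metrics."""
    if custom_buckets is not None:
        return tuple(custom_buckets)  # user override from litellm_settings
    return LATENCY_BUCKETS  # reduced default


assert resolve_latency_buckets(None) == LATENCY_BUCKETS
assert resolve_latency_buckets([1.0, 5.0]) == (1.0, 5.0)
```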
Reviews (2): Last reviewed commit: "test(prometheus): add coverage for Prome..."
```python
LATENCY_BUCKETS = (
    0.005,
    0.00625,
    0.01,
    0.0125,
    0.025,
    0.05,
    0.1,
    0.25,
    0.5,
    1.0,
    1.5,
    2.0,
    2.5,
    3.0,
    3.5,
    4.0,
    4.5,
    5.0,
    5.5,
    6.0,
    6.5,
    7.0,
    7.5,
    8.0,
    8.5,
    9.0,
    9.5,
    10.0,
    15.0,
    20.0,
    25.0,
    30.0,
    60.0,
    120.0,
    180.0,
    240.0,
    300.0,
    420.0,  # 7 minutes
    600.0,  # 10 minutes (typical default LLM request timeout)
    float("inf"),
)
```
**Breaking default change removes existing `le` label values**

Prometheus Histogram metrics persist their `_bucket` time series keyed by the `le` label. Any dashboard or PromQL SLO expression that filters on a removed boundary (e.g. `{le="1.5"}`, `{le="2.5"}`, or any of the nine other dropped values) will return no data after this upgrade, with no warning. `histogram_quantile` over a recording rule or a Grafana panel will also show a jump in the apparent latency distribution at the moment of rollout.
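To see the effect concretely, consider the tightest `le` series that counts a given observation before and after the change. The boundary lists below are abridged, illustrative slices of the dense old and sparse new bucket sets around the 1–10 s range, not the exact committed values.

```python
import bisect

# Abridged, illustrative boundaries around the 1-10 s range.
old_buckets = [1.0, 1.5, 2.0, 2.5, 3.0]
new_buckets = [1.0, 2.5, 5.0]


def smallest_le(buckets, observation):
    """Return the smallest bucket boundary >= observation, i.e. the tightest
    cumulative _bucket series (by le) that counts this observation."""
    i = bisect.bisect_left(buckets, observation)
    return buckets[i] if i < len(buckets) else float("inf")


assert smallest_le(old_buckets, 1.3) == 1.5  # old: a {le="1.5"} series exists
assert smallest_le(new_buckets, 1.3) == 2.5  # new: the observation rolls up to {le="2.5"}
```

Any query pinned to `{le="1.5"}` therefore matches no series once that boundary is gone, even though the observations themselves are still counted.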
The prometheus_latency_buckets escape hatch requires users to know the change happened and reconstruct the old list manually. Per the codebase's backward-compat policy, a breaking default should either be opt-in (keep old defaults, document the new reduced set as the recommended override) or at minimum include a migration note in the changelog. Consider keeping the old LATENCY_BUCKETS as LATENCY_BUCKETS_LEGACY and exposing the new set as the recommended value, so users who need continuity can set:
```yaml
litellm_settings:
  prometheus_latency_buckets: [0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0]
```

Rule used: avoid backwards-incompatible changes without...
```python
langfuse_default_tags: Optional[List[str]] = None
langsmith_batch_size: Optional[int] = None
prometheus_initialize_budget_metrics: Optional[bool] = False
prometheus_latency_buckets: Optional[List[float]] = None
```
**No validation of user-provided bucket list**

`prometheus_latency_buckets` is consumed as-is; Prometheus requires buckets to be strictly ascending and all positive. An out-of-order or empty list will raise an unhelpful `ValueError` deep inside `prometheus_client` at logger initialisation time rather than at config load. A lightweight guard at the point of consumption would surface the problem earlier:
```python
_custom_buckets = litellm.prometheus_latency_buckets
if _custom_buckets is not None:
    # Strictly ascending: every boundary must be greater than the previous one.
    if not _custom_buckets or any(
        a >= b for a, b in zip(_custom_buckets, _custom_buckets[1:])
    ):
        raise ValueError(
            "prometheus_latency_buckets must be a non-empty, strictly ascending list of floats"
        )
    self.latency_buckets = tuple(_custom_buckets)
else:
    self.latency_buckets = LATENCY_BUCKETS
```

```python
for collector in list(REGISTRY._collector_to_names.keys()):
    try:
        REGISTRY.unregister(collector)
    except Exception:
        pass
try:
    litellm.prometheus_latency_buckets = custom_buckets
    logger = PrometheusLogger()
    assert logger.latency_buckets == tuple(custom_buckets)
finally:
    litellm.prometheus_latency_buckets = original
    for collector in list(REGISTRY._collector_to_names.keys()):
        try:
            REGISTRY.unregister(collector)
        except Exception:
            pass
```
**Test relies on private `REGISTRY._collector_to_names` attribute**

`REGISTRY._collector_to_names` is an internal implementation detail of `prometheus_client`; its name/type has changed across minor versions and could break the test silently. The public API to clear the registry before/after a test is `REGISTRY.unregister(collector)` combined with iterating `list(REGISTRY._names_to_collectors.values())`, or better, using the library's own `CollectorRegistry(auto_describe=True)` and passing it to `PrometheusLogger` via a constructor param. If the `prometheus_logger` fixture already handles registry isolation, this test could simply reuse it with a `monkeypatch` for `litellm.prometheus_latency_buckets` instead of clearing the global registry.
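The set-and-restore half of that suggestion needs no registry access at all. A minimal sketch of a reusable helper follows; the `litellm` object here is a stand-in `SimpleNamespace` so the example runs standalone, whereas the real test would import the actual `litellm` module (or just use pytest's `monkeypatch.setattr`).

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for the litellm module's globals, so this sketch is self-contained.
litellm = SimpleNamespace(prometheus_latency_buckets=None)


@contextmanager
def override_latency_buckets(buckets):
    """Temporarily set litellm.prometheus_latency_buckets, restoring on exit
    even if the body raises."""
    original = litellm.prometheus_latency_buckets
    litellm.prometheus_latency_buckets = buckets
    try:
        yield
    finally:
        litellm.prometheus_latency_buckets = original


with override_latency_buckets([0.1, 1.0, 10.0]):
    assert litellm.prometheus_latency_buckets == [0.1, 1.0, 10.0]
assert litellm.prometheus_latency_buckets is None  # restored after the block
```

In a pytest test, `monkeypatch.setattr(litellm, "prometheus_latency_buckets", custom_buckets)` gives the same restore-on-teardown behavior without a custom helper.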
…atency buckets" This reverts commit 1bfd004.
Merged 7d2f069 into BerriAI:litellm_internal_staging_04_11_2026
Relevant issues
Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR:

- I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- I have asked for a review from `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type: 🆕 New Feature
Changes
The default Prometheus latency histogram had 35 buckets with dense 0.5 s granularity between 1–10 s. Combined with multiple label dimensions (model, api_key, team, etc.), this creates excessive time-series cardinality.
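The cardinality pressure is multiplicative: a histogram emits one `_bucket` time series per boundary per label combination, plus `_sum` and `_count` per combination. A rough back-of-the-envelope with hypothetical label counts (the 50 x 20 figure below is purely illustrative):

```python
def histogram_series(bucket_count: int, label_combinations: int) -> int:
    """Time series emitted by one histogram: one _bucket per boundary per
    label combination, plus _sum and _count per combination."""
    return label_combinations * (bucket_count + 2)


label_combos = 50 * 20  # e.g. 50 models x 20 teams (hypothetical)
assert histogram_series(35, label_combos) == 37_000
assert histogram_series(18, label_combos) == 20_000
```

Halving the bucket count roughly halves the per-histogram series load, which compounds across the six latency histograms the loggers register.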
Reduced the default buckets from 35 to 18 by removing the half-second intervals in the middle, added 420 s and 600 s boundaries to cover typical LLM timeout ranges, and made the bucket list configurable via `prometheus_latency_buckets` in `litellm_settings` so users can tune for their own cardinality needs.

Breaking Change
The default `LATENCY_BUCKETS` has been reduced from 35 to 18 boundaries. If you have existing Prometheus dashboards or PromQL SLO queries that reference specific `le` values (e.g. `le="1.5"`, `le="9.5"`), those series will no longer exist after upgrading.

To restore the previous buckets, add this to your `config.yaml`:
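The override below reproduces the previous 35-boundary default (the same list quoted in the review comment above):

```yaml
litellm_settings:
  prometheus_latency_buckets: [0.005, 0.00625, 0.0125, 0.025, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 15.0, 20.0, 25.0, 30.0, 60.0, 120.0, 180.0, 240.0, 300.0]
```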