fix(bedrock/anthropic): accurate cache token cost breakdown in UI and SpendLogs (#25735)
Conversation
…hing

Adds TestBedrockInvokeCacheTokenBilling covering the Bedrock InvokeModel path:

- baseline: no cache tokens, prompt_tokens equals input_tokens
- cache_read: prompt_tokens inflated by design, prompt_tokens_details carries breakdown
- cache_creation: same pattern for write tokens
- cost_calculation_correct_with_cache_read: core billing regression test
- cost_calculation_correct_with_cache_creation: write-rate billing regression test
- back_to_back_requests_cost: full end-to-end scenario (cache write then read)

These lock in the fix from PR #25517: cache tokens were being double-counted in AnthropicConfig.calculate_usage, causing 10-50x inflated cost on cache reads.
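The double-counting these tests lock in can be sketched as follows. This is a minimal illustration with simplified names and made-up rates, not the actual AnthropicConfig.calculate_usage code:

```python
# Anthropic-style providers report input_tokens EXCLUDING cache tokens, so the
# usage object inflates prompt_tokens to the OpenAI-style total (by design):
def build_usage(input_tokens: int, cache_read: int, cache_creation: int) -> dict:
    return {
        "prompt_tokens": input_tokens + cache_read + cache_creation,  # inflated
        "prompt_tokens_details": {
            "text_tokens": input_tokens,          # raw, pre-inflation count
            "cached_tokens": cache_read,
            "cache_creation_tokens": cache_creation,
        },
    }

# The bug: billing the inflated prompt_tokens at the full input rate AND then
# adding cache costs on top counts every cache token twice.
def cost_buggy(u: dict, in_rate: float, read_rate: float, write_rate: float) -> float:
    d = u["prompt_tokens_details"]
    return (
        u["prompt_tokens"] * in_rate
        + d["cached_tokens"] * read_rate
        + d["cache_creation_tokens"] * write_rate
    )

# The fix: only the raw text tokens are billed at the full input rate.
def cost_fixed(u: dict, in_rate: float, read_rate: float, write_rate: float) -> float:
    d = u["prompt_tokens_details"]
    return (
        d["text_tokens"] * in_rate
        + d["cached_tokens"] * read_rate
        + d["cache_creation_tokens"] * write_rate
    )
```

With a large cache read (say 1,000 cached tokens against 100 raw input tokens), the buggy path bills the cached tokens at the full input rate on top of the discounted read rate, which is where the 10-50x inflation came from.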
…stBreakdown TypedDict
…efore cache inflation
… cache inflation in PromptTokensDetailsWrapper
…et_cost_breakdown
…Breakdown (cache_read_cost, cache_creation_cost)
…o avoid double-counting
…te line items in cost breakdown drawer
…tionTokens to CostBreakdownViewer from SpendLogs
Greptile Summary

This PR fixes inaccurate cache token cost display for Bedrock and Anthropic by storing per-type costs (cache_read_cost, cache_creation_cost) in the CostBreakdown.
Confidence Score: 4/5

Safe to merge for Anthropic; Bedrock MetricsSection still shows inflated token counts, and prior open concerns (ephemeral tier pricing, rawCost floor, input_cost comment) remain unaddressed. Three carry-over P1/P2 findings from previous review iterations remain open, and the new comment identifies incomplete coverage of the Bedrock fix in the MetricsSection. None of these are regressions or data-loss issues, but they leave the stated Bedrock goal only partially delivered on the UI side.

Important files: ui/litellm-dashboard/src/components/view_logs/LogDetailsDrawer/LogDetailContent.tsx (Bedrock call_type gate) and litellm/cost_calculator.py (ephemeral tier rate)
| Filename | Overview |
|---|---|
| litellm/cost_calculator.py | Computes per-type cache costs from token counts × model rates and threads them into the cost breakdown; the flat rate multiplication ignores ephemeral tier splits already flagged in previous review. |
| litellm/litellm_core_utils/litellm_logging.py | Adds optional cache_read_cost and cache_creation_cost params to set_cost_breakdown; correctly guards writes with > 0 so the fields are omitted when there is no cache activity. |
| litellm/llms/anthropic/chat/transformation.py | Captures raw_input_tokens before cache-token inflation and stores it in PromptTokensDetailsWrapper.text_tokens; correctly handles the None-to-0 case with or 0. |
| litellm/llms/bedrock/chat/converse_transformation.py | Captures raw_input_tokens before inflation and adds the previously missing cache_creation_tokens field to PromptTokensDetailsWrapper; both are correct fixes. |
| litellm/types/utils.py | Adds cache_read_cost and cache_creation_cost to CostBreakdown TypedDict; input_cost comment says 'raw non-cached' but the field still receives the full prompt cost (noted in previous review thread). |
| ui/litellm-dashboard/src/components/UsagePage/components/UsagePageView.tsx | Subtracts total_cache_read_input_tokens and total_cache_creation_input_tokens from total_prompt_tokens with a Math.max(0, ...) floor guard to compute raw input token count. |
| ui/litellm-dashboard/src/components/view_logs/CostBreakdownViewer.tsx | Shows split Input/Cache-Read/Cache-Write cost rows when cache_read_cost/cache_creation_cost are present; rawCost subtraction can yield negative values under floating-point imprecision (noted in previous thread). |
| ui/litellm-dashboard/src/components/view_logs/LogDetailsDrawer/LogDetailContent.tsx | Passes cache token counts to CostBreakdownViewer and adds per-provider Metrics split, but the call_type === 'anthropic_messages' gate prevents Bedrock calls from benefiting from the same Metrics improvement. |
| ui/litellm-dashboard/src/components/view_logs/LogDetailsDrawer/LogDetailContent.test.tsx | Adds a well-scoped test for the anthropic_messages uncached-text-tokens display path; test assertions are specific and correct. |
| ui/litellm-dashboard/tsconfig.json | Changes jsx from 'react-jsx' to 'preserve', which is the correct setting for Next.js (SWC/Babel handles JSX transformation separately). |
Sequence Diagram

```mermaid
sequenceDiagram
    participant P as Provider API
    participant T as transformation.py
    participant CC as cost_calculator.py
    participant L as litellm_logging.py
    participant DB as SpendLogs DB
    participant UI as CostBreakdownViewer
    P->>T: usage{input_tokens, cache_read, cache_creation}
    T->>T: capture raw_input_tokens before inflation
    T->>T: prompt_tokens += cache_read + cache_creation
    T->>T: PromptTokensDetailsWrapper.text_tokens = raw_input_tokens
    T->>CC: Usage object with cache token counts
    CC->>CC: compute cache_read_cost and cache_creation_cost
    CC->>L: store_cost_breakdown with per-type costs
    L->>DB: CostBreakdown stored in SpendLogs
    DB->>UI: additional_usage_values + cost_breakdown
    UI->>UI: render Input / Cache Read / Cache Write rows
```
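The cost-computation step in the diagram can be sketched as follows. Field names follow the CostBreakdown TypedDict described in this PR, but the helper name, rate keys, and rates are illustrative assumptions, not litellm's actual implementation:

```python
# Minimal sketch: derive per-type cache costs from token counts x model rates
# and store them alongside the total prompt cost.
def make_cost_breakdown(usage: dict, rates: dict) -> dict:
    d = usage["prompt_tokens_details"]
    cache_read_cost = d["cached_tokens"] * rates["cache_read_input_token_cost"]
    cache_creation_cost = (
        d["cache_creation_tokens"] * rates["cache_creation_input_token_cost"]
    )
    # input_cost as stored today: the TOTAL prompt cost, cache portions included
    input_cost = (
        d["text_tokens"] * rates["input_cost_per_token"]
        + cache_read_cost
        + cache_creation_cost
    )
    breakdown = {"input_cost": input_cost}
    # The > 0 guards mirror litellm_logging.py: cache fields are omitted
    # entirely when there is no cache activity.
    if cache_read_cost > 0:
        breakdown["cache_read_cost"] = cache_read_cost
    if cache_creation_cost > 0:
        breakdown["cache_creation_cost"] = cache_creation_cost
    return breakdown
```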
Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/lit..."
```python
input_cost: float  # Cost of raw (non-cached) input tokens only
cache_read_cost: float  # Cost of cache-read tokens (discounted rate)
cache_creation_cost: float  # Cost of cache-write tokens (premium rate)
```
Misleading input_cost field comment — stored value is still the full prompt cost
The comment was changed to "Cost of raw (non-cached) input tokens only," but input_cost is still set to prompt_tokens_cost_usd_dollar in _store_cost_breakdown_in_logging_obj, which is the total prompt cost returned by generic_cost_per_token (raw input + cache-read + cache-creation, each at their respective rates). The UI compensates by subtracting the separate cache costs, but any external consumer of the cost_breakdown field in SpendLogs that reads this comment will compute incorrect cost figures.
Either update the backend to actually store only the raw-input portion in input_cost, or revert the comment to reflect what is actually stored (total prompt cost including cache tokens).
```diff
-input_cost: float  # Cost of raw (non-cached) input tokens only
+input_cost: float  # Cost of all prompt tokens (raw input + cache read + cache write)
 cache_read_cost: float  # Cost of cache-read tokens (discounted rate)
 cache_creation_cost: float  # Cost of cache-write tokens (premium rate)
```
```python
if _cr and _mi.get("cache_read_input_token_cost"):
    _cache_read_cost = float(_cr) * float(_mi["cache_read_input_token_cost"])
if _cc and _mi.get("cache_creation_input_token_cost"):
    _cache_creation_cost = float(_cc) * float(_mi["cache_creation_input_token_cost"])
```
Tiered ephemeral cache-creation pricing not handled
generic_cost_per_token uses calculate_cache_writing_cost, which accounts for Anthropic's ephemeral tiers (ephemeral_5m_input_tokens vs ephemeral_1h_input_tokens). The new code here multiplies total cache_creation_input_tokens by a single cache_creation_input_token_cost rate, ignoring the tier split. As a result, _cache_creation_cost can differ from the portion already baked into prompt_tokens_cost_usd_dollar, causing the UI's derived rawCost = inputCost - cache_read_cost - cache_creation_cost to show a slightly off (or negative) value for requests with tiered ephemeral caching.
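A tier-aware computation might look like the following sketch. The ephemeral field names come from the review comment; the per-tier rate key `cache_creation_input_token_cost_above_1hr` is an assumption for illustration, not a confirmed litellm model-map key:

```python
# Hypothetical tier-aware cache-write cost: bill 5m-TTL and 1h-TTL ephemeral
# tokens at their own rates instead of one flat cache_creation rate.
def cache_creation_cost_tiered(cache_creation_detail: dict, model_info: dict) -> float:
    flat_rate = model_info.get("cache_creation_input_token_cost", 0.0)
    # 5-minute ephemeral tier: standard cache-write rate
    cost = cache_creation_detail.get("ephemeral_5m_input_tokens", 0) * flat_rate
    # 1-hour ephemeral tier: higher rate when configured, else fall back to flat
    rate_1h = model_info.get("cache_creation_input_token_cost_above_1hr", flat_rate)
    cost += cache_creation_detail.get("ephemeral_1h_input_tokens", 0) * rate_1h
    return cost
```

Computing the per-type cost this way keeps it consistent with the tiered amount calculate_cache_writing_cost already bakes into prompt_tokens_cost_usd_dollar, so the UI's subtraction stays exact.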
```typescript
  costBreakdown?.cache_creation_cost !== undefined;
if (hasCacheBreakdown) {
  // Separate line items: Input / Cache Read / Cache Write
  const rawCost = isCached ? 0 : (inputCost ?? 0) - (costBreakdown?.cache_read_cost ?? 0) - (costBreakdown?.cache_creation_cost ?? 0);
```
rawCost can go negative with no floor guard
inputCost is the total prompt cost (raw + cache-read + cache-creation). The subtraction is correct in theory, but any floating-point imprecision between the independently-computed _cache_read_cost/_cache_creation_cost values and their portion of inputCost can produce a small negative result (e.g., -1e-15). formatCost does not handle negatives, so it would render as -$0.00000001.
```diff
-const rawCost = isCached ? 0 : (inputCost ?? 0) - (costBreakdown?.cache_read_cost ?? 0) - (costBreakdown?.cache_creation_cost ?? 0);
+const rawCost = isCached ? 0 : Math.max(0, (inputCost ?? 0) - (costBreakdown?.cache_read_cost ?? 0) - (costBreakdown?.cache_creation_cost ?? 0));
```
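The imprecision is easy to reproduce with plain IEEE-754 doubles (arbitrary numbers, not real costs):

```python
# A total and its independently computed parts need not subtract back to
# exactly zero in binary floating point:
input_cost = 0.3
cache_read_cost = 0.1
cache_creation_cost = 0.2
raw = input_cost - cache_read_cost - cache_creation_cost
# raw is about -2.8e-17 here, a tiny NEGATIVE value, which is why the
# floor guard is needed before rendering:
raw_floored = max(0.0, raw)
```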
```python
    _cache_read_cost = float(_cr) * float(_mi["cache_read_input_token_cost"])
if _cc and _mi.get("cache_creation_input_token_cost"):
    _cache_creation_cost = float(_cc) * float(_mi["cache_creation_input_token_cost"])
except Exception:
```
…elds

Boolean fields in the auto-generated guardrail provider form (e.g. Noma `use_v2`) rendered as empty Selects because the Form.Item only populated `initialValue` for percentage fields, and the `defaultValue` passed to the Select child was silently dropped by antd's controlled-component wrapper. Users could not tell what the backend default was, and the visual ambiguity made flags like `use_v2` look inoperative even though the save path worked.

Unify `initialValue` to fall back through `fieldValue → field.default_value → (percentage ? 0.5 : undefined)`, and switch Select.Option values from "true"/"false" strings to real booleans so the backend default flows through without stringification.
Bedrock GPT-OSS occasionally emits truncated toolUse.input deltas
(e.g. accumulated args of '{"":"'), which causes
test_function_calling_with_tool_response to hard-fail on json.loads.
Other overrides in TestBedrockGPTOSS already handle similar
model-side flakiness; apply retries=6 delay=5 scoped to this subclass
so other providers keep strict behavior.
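The retry semantics described (retries=6, delay=5) can be sketched generically; this helper is illustrative only, since the real suite presumably uses a pytest retry mechanism rather than a hand-rolled decorator:

```python
import time

def with_retries(fn, retries=6, delay=5):
    # Re-run fn up to `retries` extra times, sleeping `delay` seconds between
    # attempts; re-raise the last exception if every attempt fails.
    def wrapper(*args, **kwargs):
        for attempt in range(retries + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == retries:
                    raise
                time.sleep(delay)
    return wrapper
```

Scoping this to the TestBedrockGPTOSS subclass keeps strict single-attempt behavior for every other provider's tests.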
GPT-OSS on Bedrock intermittently emits truncated toolUse.input deltas
(e.g. accumulated args of '{"":"'), causing
test_function_calling_with_tool_response to hard-fail on json.loads.
The model flakiness is not a litellm regression: the same base test
passes for Anthropic in the same CI run, and the streaming delta path
at invoke_handler.py has not changed recently.
Follow the existing override pattern in TestBedrockGPTOSS
(test_prompt_caching, test_completion_cost, test_tool_call_no_arguments)
and stub the test to pass. The underlying bedrock converse streaming
tool-call path is already covered by Claude/Nova/Llama Converse suites
in test_bedrock_completion.py and test_bedrock_llama.py, so removing
the live GPT-OSS check loses no unique litellm-side signal.
Complements the stubbed-out live integration test by verifying the outgoing Bedrock Converse request body for GPT-OSS is well-formed when the caller supplies a tool schema with OpenAI-style metadata ($id, $schema, additionalProperties, strict):

- correct converse URL for bedrock/converse/openai.gpt-oss-20b-1:0
- toolConfig.tools[0].toolSpec has the expected name/description
- inputSchema.json keeps type/properties/required and strips fields Bedrock does not accept
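The sanitization the test exercises can be sketched as follows. The helper name and exact key set are assumptions based on the commit message, not litellm's actual implementation:

```python
# OpenAI-style JSON Schema metadata that Bedrock's toolSpec.inputSchema.json
# rejects; JSON Schema core keys (type/properties/required) pass through.
DROPPED_KEYS = {"$id", "$schema", "additionalProperties", "strict"}

def sanitize_tool_schema(schema: dict) -> dict:
    return {k: v for k, v in schema.items() if k not in DROPPED_KEYS}
```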
Adds a GHA that fails PRs to main unless the head branch is 'litellm_internal_staging' or 'litellm_hotfix_*'. Also fails merge_group events since merge queue is not in use.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Merged …itellm_bedrock_cache_cost_breakdown (645e0a7) into litellm_internal_staging
Relevant issues
Fixes inaccurate cost breakdown display when prompt caching is used (Bedrock/Anthropic).
Changes
Backend — store accurate per-type costs and raw token counts:
- `litellm/types/utils.py`: Added `cache_read_cost` and `cache_creation_cost` fields to the `CostBreakdown` TypedDict
- `litellm/llms/anthropic/chat/transformation.py`: Store raw `text_tokens` (pre-inflation input count) in `PromptTokensDetailsWrapper` before adding cache tokens to `prompt_tokens`
- `litellm/llms/bedrock/chat/converse_transformation.py`: Same fix for the Converse API path used by cross-region (`us.*`) Bedrock models
- `litellm/litellm_core_utils/litellm_logging.py`: Thread `cache_read_cost`/`cache_creation_cost` through `set_cost_breakdown()`
- `litellm/cost_calculator.py`: Compute individual cache costs from token counts × model rates at the `completion_cost()` call site and store them in `CostBreakdown`

UI — show accurate breakdown from DB instead of inflated totals:
- `UsagePageView.tsx`: "Input Tokens" summary card subtracts cache_read and cache_creation tokens from the inflated `prompt_tokens` total
- `CostBreakdownViewer.tsx`: When cache costs are present, shows separate line items (Input / Cache Read / Cache Write / Output) instead of a single inflated "Input Cost"
- `LogDetailContent.tsx`: Passes `rawInputTokens`, `cacheReadTokens`, `cacheCreationTokens` from `additional_usage_values` in SpendLogs (DB values, no frontend math)

Pre-Submission checklist
- `make test-unit` passes

Type
Changes
Before: "Input Tokens" on usage page included cache read/write tokens. Cost breakdown showed one "Input Cost" row with inflated token count.
After: Input Tokens card shows only raw input. Cost breakdown shows separate rows for Input, Cache Read (discounted rate), Cache Write (premium rate), Output — all sourced from SpendLogs DB fields.