
fix(bedrock/anthropic): accurate cache token cost breakdown in UI and SpendLogs #25735

Merged
ishaan-berri merged 36 commits into litellm_internal_staging from litellm_bedrock_cache_cost_breakdown
Apr 15, 2026
Conversation

@ishaan-berri
Contributor

Relevant issues

Fixes inaccurate cost breakdown display when prompt caching is used (Bedrock/Anthropic).

Changes

Backend — store accurate per-type costs and raw token counts:

  • litellm/types/utils.py: Added cache_read_cost and cache_creation_cost fields to CostBreakdown TypedDict
  • litellm/llms/anthropic/chat/transformation.py: Store raw text_tokens (pre-inflation input count) in PromptTokensDetailsWrapper before adding cache tokens to prompt_tokens
  • litellm/llms/bedrock/chat/converse_transformation.py: Same fix for the Converse API path used by cross-region (us.*) Bedrock models
  • litellm/litellm_core_utils/litellm_logging.py: Thread cache_read_cost/cache_creation_cost through set_cost_breakdown()
  • litellm/cost_calculator.py: Compute individual cache costs from token counts × model rates at completion_cost() call site and store in CostBreakdown
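A minimal sketch of the capture-before-inflation pattern the transformation changes describe, using plain dicts rather than the real Usage / PromptTokensDetailsWrapper classes (the function name is hypothetical):

```python
def build_usage(input_tokens: int, cache_read: int, cache_creation: int) -> dict:
    """Capture the raw input count BEFORE cache tokens are folded into prompt_tokens."""
    text_tokens = input_tokens  # raw, pre-inflation count
    prompt_tokens = input_tokens + cache_read + cache_creation  # inflated by design
    return {
        "prompt_tokens": prompt_tokens,
        "prompt_tokens_details": {
            "text_tokens": text_tokens,
            "cached_tokens": cache_read,
            "cache_creation_tokens": cache_creation,
        },
    }
```

With this shape, downstream consumers can always recover the uncached input count from `prompt_tokens_details.text_tokens` instead of re-deriving it.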

UI — show accurate breakdown from DB instead of inflated totals:

  • UsagePageView.tsx: "Input Tokens" summary card subtracts cache_read and cache_creation tokens from the inflated prompt_tokens total
  • CostBreakdownViewer.tsx: When cache costs are present, shows separate line items (Input / Cache Read / Cache Write / Output) instead of a single inflated "Input Cost"
  • LogDetailContent.tsx: Passes rawInputTokens, cacheReadTokens, cacheCreationTokens from additional_usage_values in SpendLogs (DB values, no frontend math)
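The "Input Tokens" card arithmetic mirrored in Python (a sketch of the UsagePageView.tsx logic, not the TSX itself); the floor at 0 guards rows written before cache tokens were folded into `prompt_tokens`:

```python
def raw_input_tokens(prompt_tokens: int, cache_read: int, cache_creation: int) -> int:
    # Subtract cache tokens from the inflated total; never go negative.
    return max(0, prompt_tokens - cache_read - cache_creation)
```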

Pre-Submission checklist

  • make test-unit passes

Type

  • Bug Fix
  • UI improvement

Changes

Before: "Input Tokens" on usage page included cache read/write tokens. Cost breakdown showed one "Input Cost" row with inflated token count.

After: Input Tokens card shows only raw input. Cost breakdown shows separate rows for Input, Cache Read (discounted rate), Cache Write (premium rate), Output — all sourced from SpendLogs DB fields.
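A worked example of the four-row breakdown, with purely illustrative per-token rates (cache reads at 0.1x the base input rate and cache writes at 1.25x, in the style of Anthropic's pricing — not any real model's numbers):

```python
BASE_IN, OUT_RATE = 3e-6, 15e-6          # $/token, illustrative only
READ_MULT, WRITE_MULT = 0.1, 1.25        # read discount, write premium

raw_in, cache_read, cache_write, out = 100, 2000, 300, 50
rows = {
    "Input": raw_in * BASE_IN,
    "Cache Read": cache_read * BASE_IN * READ_MULT,
    "Cache Write": cache_write * BASE_IN * WRITE_MULT,
    "Output": out * OUT_RATE,
}
```

Note that a single "Input Cost" row over the inflated 2,400-token total at the base rate would misstate the spend, since 2,000 of those tokens were billed at the discounted read rate.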

…hing

Adds TestBedrockInvokeCacheTokenBilling covering the Bedrock InvokeModel path:
- baseline: no cache tokens, prompt_tokens equals input_tokens
- cache_read: prompt_tokens inflated by design, prompt_tokens_details carries breakdown
- cache_creation: same pattern for write tokens
- cost_calculation_correct_with_cache_read: core billing regression test
- cost_calculation_correct_with_cache_creation: write-rate billing regression test
- back_to_back_requests_cost: full end-to-end scenario (cache write then read)

These lock in the fix from PR #25517 - cache tokens were being double-counted
in AnthropicConfig.calculate_usage causing 10-50x inflated cost on cache reads.
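The magnitude of that double-counting bug can be seen with illustrative rates (base rate with a 10x read discount; these numbers are not real pricing):

```python
BASE, READ = 3e-6, 3e-7
input_tokens, cache_read = 100, 5000

# Bug pattern: cache-read tokens billed at BOTH the base and the read rate.
buggy = (input_tokens + cache_read) * BASE + cache_read * READ
# Fix: raw input at the base rate, cache reads only at the read rate.
fixed = input_tokens * BASE + cache_read * READ
```

With a cache-heavy prompt like this, the buggy total comes out roughly 9x the correct one, consistent with the 10-50x inflation described above.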
… cache inflation in PromptTokensDetailsWrapper
…Breakdown (cache_read_cost, cache_creation_cost)
…tionTokens to CostBreakdownViewer from SpendLogs
@vercel

vercel Bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Apr 15, 2026 5:40pm


@CLAassistant

CLAassistant commented Apr 15, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 6 committers have signed the CLA.

✅ Sameerlite
✅ shivamrawat1
✅ yuneng-berri
✅ ryan-crabbe-berri
✅ joereyna
❌ ishaan-berri
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq Bot commented Apr 15, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_bedrock_cache_cost_breakdown (ff5bd43) with main (72a461b)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

This PR fixes inaccurate cache token cost display for Bedrock and Anthropic by storing per-type costs (cache_read_cost, cache_creation_cost) in CostBreakdown, capturing raw text_tokens before cache inflation in both transformation paths, and updating the UI to render separate Input / Cache Read / Cache Write line items sourced from SpendLogs DB values.

  • The Metrics section split (Input Tokens / Output Tokens) is gated on call_type === "anthropic_messages" in LogDetailContent.tsx, so Bedrock converse calls still show the inflated prompt_tokens in TokenFlow despite the backend now correctly storing text_tokens for them — CostBreakdownViewer works for all providers, but MetricsSection does not.
  • Prior review concerns remain open: input_cost comment/value mismatch, tiered ephemeral cache-creation rate not handled, and rawCost negative floor.

Confidence Score: 4/5

Safe to merge for Anthropic; Bedrock MetricsSection still shows inflated token counts, and prior open concerns (ephemeral tier pricing, rawCost floor, input_cost comment) remain unaddressed.

Three carry-over P1/P2 findings from previous review iterations remain open, and the new comment identifies incomplete coverage of the Bedrock fix in MetricsSection. None of these are regressions or data-loss issues, but they leave the stated Bedrock goal only partially delivered on the UI side.

ui/litellm-dashboard/src/components/view_logs/LogDetailsDrawer/LogDetailContent.tsx (Bedrock call_type gate) and litellm/cost_calculator.py (ephemeral tier rate)

Important Files Changed

Filename Overview
litellm/cost_calculator.py Computes per-type cache costs from token counts × model rates and threads them into the cost breakdown; the flat rate multiplication ignores ephemeral tier splits already flagged in previous review.
litellm/litellm_core_utils/litellm_logging.py Adds optional cache_read_cost and cache_creation_cost params to set_cost_breakdown; correctly guards writes with > 0 so the fields are omitted when there is no cache activity.
litellm/llms/anthropic/chat/transformation.py Captures raw_input_tokens before cache-token inflation and stores it in PromptTokensDetailsWrapper.text_tokens; correctly handles the None-to-0 case with or 0.
litellm/llms/bedrock/chat/converse_transformation.py Captures raw_input_tokens before inflation and adds the previously missing cache_creation_tokens field to PromptTokensDetailsWrapper; both are correct fixes.
litellm/types/utils.py Adds cache_read_cost and cache_creation_cost to CostBreakdown TypedDict; input_cost comment says 'raw non-cached' but the field still receives the full prompt cost (noted in previous review thread).
ui/litellm-dashboard/src/components/UsagePage/components/UsagePageView.tsx Subtracts total_cache_read_input_tokens and total_cache_creation_input_tokens from total_prompt_tokens with a Math.max(0, ...) floor guard to compute raw input token count.
ui/litellm-dashboard/src/components/view_logs/CostBreakdownViewer.tsx Shows split Input/Cache-Read/Cache-Write cost rows when cache_read_cost/cache_creation_cost are present; rawCost subtraction can yield negative values under floating-point imprecision (noted in previous thread).
ui/litellm-dashboard/src/components/view_logs/LogDetailsDrawer/LogDetailContent.tsx Passes cache token counts to CostBreakdownViewer and adds per-provider Metrics split, but the call_type === 'anthropic_messages' gate prevents Bedrock calls from benefiting from the same Metrics improvement.
ui/litellm-dashboard/src/components/view_logs/LogDetailsDrawer/LogDetailContent.test.tsx Adds a well-scoped test for the anthropic_messages uncached-text-tokens display path; test assertions are specific and correct.
ui/litellm-dashboard/tsconfig.json Changes jsx from 'react-jsx' to 'preserve', which is the correct setting for Next.js (SWC/Babel handles JSX transformation separately).

Sequence Diagram

sequenceDiagram
    participant P as Provider API
    participant T as transformation.py
    participant CC as cost_calculator.py
    participant L as litellm_logging.py
    participant DB as SpendLogs DB
    participant UI as CostBreakdownViewer

    P->>T: usage{input_tokens, cache_read, cache_creation}
    T->>T: capture raw_input_tokens before inflation
    T->>T: prompt_tokens += cache_read + cache_creation
    T->>T: PromptTokensDetailsWrapper.text_tokens = raw_input_tokens
    T->>CC: Usage object with cache token counts
    CC->>CC: compute cache_read_cost and cache_creation_cost
    CC->>L: store_cost_breakdown with per-type costs
    L->>DB: CostBreakdown stored in SpendLogs
    DB->>UI: additional_usage_values + cost_breakdown
    UI->>UI: render Input / Cache Read / Cache Write rows


Comment thread litellm/types/utils.py
Comment on lines +2801 to +2803
input_cost: float # Cost of raw (non-cached) input tokens only
cache_read_cost: float # Cost of cache-read tokens (discounted rate)
cache_creation_cost: float # Cost of cache-write tokens (premium rate)
Contributor


P1 Misleading input_cost field comment — stored value is still the full prompt cost

The comment was changed to "Cost of raw (non-cached) input tokens only," but input_cost is still set to prompt_tokens_cost_usd_dollar in _store_cost_breakdown_in_logging_obj, which is the total prompt cost returned by generic_cost_per_token (raw input + cache-read + cache-creation, each at their respective rates). The UI compensates by subtracting the separate cache costs, but any external consumer of the cost_breakdown field in SpendLogs that reads this comment will compute incorrect cost figures.

Either update the backend to actually store only the raw-input portion in input_cost, or revert the comment to reflect what is actually stored (total prompt cost including cache tokens).

Suggested change
input_cost: float # Cost of raw (non-cached) input tokens only
cache_read_cost: float # Cost of cache-read tokens (discounted rate)
cache_creation_cost: float # Cost of cache-write tokens (premium rate)
input_cost: float # Cost of all prompt tokens (raw input + cache read + cache write)
cache_read_cost: float # Cost of cache-read tokens (discounted rate)
cache_creation_cost: float # Cost of cache-write tokens (premium rate)

Comment thread litellm/cost_calculator.py Outdated
Comment on lines +1614 to +1617
if _cr and _mi.get("cache_read_input_token_cost"):
_cache_read_cost = float(_cr) * float(_mi["cache_read_input_token_cost"])
if _cc and _mi.get("cache_creation_input_token_cost"):
_cache_creation_cost = float(_cc) * float(_mi["cache_creation_input_token_cost"])
Contributor


P2 Tiered ephemeral cache-creation pricing not handled

generic_cost_per_token uses calculate_cache_writing_cost, which accounts for Anthropic's ephemeral tiers (ephemeral_5m_input_tokens vs ephemeral_1h_input_tokens). The new code here multiplies total cache_creation_input_tokens by a single cache_creation_input_token_cost rate, ignoring the tier split. As a result, _cache_creation_cost can differ from the portion already baked into prompt_tokens_cost_usd_dollar, causing the UI's derived rawCost = inputCost - cache_read_cost - cache_creation_cost to show a slightly off (or negative) value for requests with tiered ephemeral caching.
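A tier-aware sketch of what the reviewer is asking for. The ephemeral token field names come from the comment above; the key for the 1h rate is hypothetical and the fallback to the flat rate is an assumption, not LiteLLM's actual implementation:

```python
def tiered_cache_write_cost(detail: dict, mi: dict) -> float:
    # Price 5m and 1h ephemeral cache writes at their own rates instead of
    # multiplying the combined total by one flat rate.
    base_rate = mi["cache_creation_input_token_cost"]
    hour_rate = mi.get("cache_creation_input_token_cost_above_1hr", base_rate)
    return (
        detail.get("ephemeral_5m_input_tokens", 0) * base_rate
        + detail.get("ephemeral_1h_input_tokens", 0) * hour_rate
    )
```

Matching the tier split used inside generic_cost_per_token keeps the stored cache_creation_cost consistent with its portion of prompt_tokens_cost_usd_dollar.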

Comment thread ui/litellm-dashboard/src/components/view_logs/CostBreakdownViewer.tsx
costBreakdown?.cache_creation_cost !== undefined;
if (hasCacheBreakdown) {
// Separate line items: Input / Cache Read / Cache Write
const rawCost = isCached ? 0 : (inputCost ?? 0) - (costBreakdown?.cache_read_cost ?? 0) - (costBreakdown?.cache_creation_cost ?? 0);
Contributor


P2 rawCost can go negative with no floor guard

inputCost is the total prompt cost (raw + cache-read + cache-creation). The subtraction is correct in theory, but any floating-point imprecision between the independently-computed _cache_read_cost/_cache_creation_cost values and their portion of inputCost can produce a small negative result (e.g., -1e-15). formatCost does not handle negatives, so it would render as -$0.00000001.

Suggested change
const rawCost = isCached ? 0 : (inputCost ?? 0) - (costBreakdown?.cache_read_cost ?? 0) - (costBreakdown?.cache_creation_cost ?? 0);
const rawCost = isCached ? 0 : Math.max(0, (inputCost ?? 0) - (costBreakdown?.cache_read_cost ?? 0) - (costBreakdown?.cache_creation_cost ?? 0));
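The floating-point hazard behind this suggestion, demonstrated in Python (same IEEE 754 doubles as JavaScript): re-deriving one addend by subtracting the others from a rounded sum can drift by an ulp in either direction, so the floor guarantees a non-negative rendered cost.

```python
total = 0.1 + 0.2 + 0.3        # stored input_cost: all prompt tokens, one rounded sum
derived = total - 0.2 - 0.3    # UI-side "raw input" cost, re-derived by subtraction
guarded = max(0.0, derived)    # the suggested floor guard
```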

…elds

Boolean fields in the auto-generated guardrail provider form (e.g. Noma
`use_v2`) rendered as empty Selects because the Form.Item only populated
`initialValue` for percentage fields, and the `defaultValue` passed to the
Select child was silently dropped by antd's controlled-component wrapper.
Users could not tell what the backend default was, and the visual ambiguity
made flags like `use_v2` look inoperative even though the save path worked.

Unify `initialValue` to fall back through `fieldValue → field.default_value →
(percentage ? 0.5 : undefined)`, and switch Select.Option values from
"true"/"false" strings to real booleans so the backend default flows through
without stringification.
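The fallback chain expressed as a Python sketch (the TSX logic mirrored with hypothetical names); note that a real boolean False passes the first branch because only None triggers the fallback:

```python
def initial_value(field_value, default_value, is_percentage: bool):
    # fieldValue -> field.default_value -> (0.5 if percentage else None)
    if field_value is not None:
        return field_value
    if default_value is not None:
        return default_value
    return 0.5 if is_percentage else None
```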
shivamrawat1 and others added 12 commits April 15, 2026 09:52
Bedrock GPT-OSS occasionally emits truncated toolUse.input deltas
(e.g. accumulated args of '{"":"'), which causes
test_function_calling_with_tool_response to hard-fail on json.loads.
Other overrides in TestBedrockGPTOSS already handle similar
model-side flakiness; apply retries=6 delay=5 scoped to this subclass
so other providers keep strict behavior.
GPT-OSS on Bedrock intermittently emits truncated toolUse.input deltas
(e.g. accumulated args of '{"":"'), causing
test_function_calling_with_tool_response to hard-fail on json.loads.
The model flakiness is not a litellm regression: the same base test
passes for Anthropic in the same CI run, and the streaming delta path
at invoke_handler.py has not changed recently.

Follow the existing override pattern in TestBedrockGPTOSS
(test_prompt_caching, test_completion_cost, test_tool_call_no_arguments)
and stub the test to pass. The underlying bedrock converse streaming
tool-call path is already covered by Claude/Nova/Llama Converse suites
in test_bedrock_completion.py and test_bedrock_llama.py, so removing
the live GPT-OSS check loses no unique litellm-side signal.
Complements the stubbed-out live integration test by verifying the
outgoing Bedrock Converse request body for GPT-OSS is well-formed when
the caller supplies a tool schema with OpenAI-style metadata
($id, $schema, additionalProperties, strict):
- correct converse URL for bedrock/converse/openai.gpt-oss-20b-1:0
- toolConfig.tools[0].toolSpec has the expected name/description
- inputSchema.json keeps type/properties/required and strips fields
  Bedrock does not accept
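The stripping behavior the test asserts, as a top-level sketch (constant and function names are hypothetical; nested schemas are not handled here):

```python
OPENAI_ONLY_KEYS = {"$id", "$schema", "additionalProperties", "strict"}

def strip_for_bedrock(schema: dict) -> dict:
    # Keep type/properties/required; drop OpenAI-style metadata Bedrock rejects.
    return {k: v for k, v in schema.items() if k not in OPENAI_ONLY_KEYS}
```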
Adds a GHA that fails PRs to main unless the head branch is
'litellm_internal_staging' or 'litellm_hotfix_*'. Also fails merge_group
events since merge queue is not in use.
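The branch rule reduces to a one-line predicate (a sketch of the policy described above, not the workflow itself):

```python
def head_branch_allowed(branch: str) -> bool:
    # PRs to main must come from staging or a hotfix branch.
    return branch == "litellm_internal_staging" or branch.startswith("litellm_hotfix_")
```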
@codecov

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@yuneng-berri yuneng-berri changed the base branch from main to litellm_internal_staging April 15, 2026 17:26
@yuneng-berri yuneng-berri requested a review from a team April 15, 2026 17:26
@yuneng-berri yuneng-berri temporarily deployed to integration-postgres April 15, 2026 17:38 — with GitHub Actions Inactive
@yuneng-berri yuneng-berri temporarily deployed to integration-postgres April 15, 2026 17:38 — with GitHub Actions Inactive
@yuneng-berri yuneng-berri temporarily deployed to integration-postgres April 15, 2026 17:38 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri merged commit 645e0a7 into litellm_internal_staging Apr 15, 2026
53 of 61 checks passed
@ishaan-berri ishaan-berri deleted the litellm_bedrock_cache_cost_breakdown branch April 15, 2026 17:44
