
fix(bedrock): avoid double-counting cache tokens in Anthropic Messages streaming usage#25517

Merged
yuneng-berri merged 1 commit into main from litellm_bedrock-messages-cache-prompt-double-count
Apr 10, 2026
Conversation

@Sameerlite
Collaborator

Problem

For Bedrock Invoke Anthropic Messages streaming (bedrock_sse_wrapper → _promote_message_stop_usage), cached requests could report roughly 2× the real prompt token total and inflated spend.

What went wrong

  1. On message_stop, Bedrock sends uncached input_tokens (e.g. 3) plus cache breakdown on the merged message_delta (cache_creation_input_tokens, cache_read_input_tokens).
  2. _promote_message_stop_usage was rewriting message_delta.usage.input_tokens as uncached + cache_creation + cache_read (e.g. 3 + 17 + 32651 = 32671).
  3. Downstream, AnthropicConfig.calculate_usage treats input_tokens as uncached-only and adds cache read/write again to prompt_tokens.
  4. Result: cache tokens were included in input_tokens and added again → double counting (e.g. prompt_tokens 65339 instead of 32671 for the same request).

Fix

Keep message_delta.usage.input_tokens as the uncached-only value from message_stop (raw_input). Still promote cache_creation_input_tokens and cache_read_input_tokens onto message_delta for clients that ignore message_stop. calculate_usage then adds cache to prompt_tokens exactly once.

Tests

  • Assert promoted message_delta keeps uncached input_tokens == 3 with cache fields present.
  • End-to-end: SSE through bedrock_sse_wrapper → passthrough logging rebuild → prompt_tokens and completion_cost for us.anthropic.claude-sonnet-4-6.
  • Updated test_bedrock_sse_wrapper_keeps_usage_in_message_start_and_message_delta expectations for input_tokens after promotion.

@Sameerlite Sameerlite temporarily deployed to integration-postgres April 10, 2026 18:34 — with GitHub Actions Inactive
@vercel

vercel bot commented Apr 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Apr 10, 2026 6:35pm


@codspeed-hq
Contributor

codspeed-hq bot commented Apr 10, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_bedrock-messages-cache-prompt-double-count (f0d2d26) with main (d0e347a)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR fixes a double-counting bug in _promote_message_stop_usage for Bedrock Invoke Anthropic Messages streaming: the old code summed uncached + cache_creation + cache_read into input_tokens, but calculate_usage then added the cache tokens again to prompt_tokens, causing ~2× inflation. The fix keeps input_tokens as the uncached-only count from message_stop, relying on calculate_usage to add cache fields once. The logic change is minimal and correct, and the three new tests (unit, preservation, and end-to-end cost) provide solid coverage.

Confidence Score: 5/5

Safe to merge — the fix is correct, minimal, and well-tested; the only remaining finding is a style nit.

The bug fix is logically correct (remove summation that caused double-counting), the updated test reflects real corrected behavior rather than masking a regression, and three new tests give good coverage including an end-to-end cost check. The sole remaining finding is a P2 import style issue that does not affect correctness or CI.

No files require special attention beyond the minor import style cleanup in the test file.

Important Files Changed

  • litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py — Removes the summation of cache tokens into input_tokens; now passes the uncached count through unchanged so calculate_usage adds cache once.
  • tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py — Updates the existing assertion to reflect the fixed behavior; adds two new tests, one of which imports proxy code inside the function body (violates module-level import style).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Bedrock Stream] -->|message_delta with cache fields| B[_promote_message_stop_usage]
    B -->|buffer pending_delta| C{message_stop arrives?}
    C -->|No| D[yield pending_delta as-is]
    C -->|Yes| E[Copy cache_creation and cache_read from stop into delta_usage]
    E --> F[Set delta input_tokens to uncached count from message_stop]
    F --> G[Yield merged message_delta]
    G --> H[calculate_usage]
    H --> I[prompt_tokens equals uncached plus cache_creation plus cache_read]
    I --> J[No double-counting]

Reviews (1): Last reviewed commit: "fix(bedrock): avoid double-counting cach..."

@yuneng-berri yuneng-berri self-requested a review April 10, 2026 19:43
@yuneng-berri yuneng-berri merged commit 576e6a0 into main Apr 10, 2026
104 of 108 checks passed
@yuneng-berri yuneng-berri deleted the litellm_bedrock-messages-cache-prompt-double-count branch April 10, 2026 19:55
ishaan-berri added a commit that referenced this pull request Apr 14, 2026
…hing

Adds TestBedrockInvokeCacheTokenBilling covering the Bedrock InvokeModel path:
- baseline: no cache tokens, prompt_tokens equals input_tokens
- cache_read: prompt_tokens inflated by design, prompt_tokens_details carries breakdown
- cache_creation: same pattern for write tokens
- cost_calculation_correct_with_cache_read: core billing regression test
- cost_calculation_correct_with_cache_creation: write-rate billing regression test
- back_to_back_requests_cost: full end-to-end scenario (cache write then read)

These lock in the fix from PR #25517 - cache tokens were being double-counted
in AnthropicConfig.calculate_usage causing 10-50x inflated cost on cache reads.
