Skip to content

fix(bedrock): prevent negative streaming costs for start-only cache usage#25846

Merged
ishaan-berri merged 2 commits intolitellm_internal_stagingfrom
litellm_bedrock_cache_start_negative_cost
Apr 18, 2026
Merged

fix(bedrock): prevent negative streaming costs for start-only cache usage#25846
ishaan-berri merged 2 commits intolitellm_internal_stagingfrom
litellm_bedrock_cache_start_negative_cost

Conversation

@Sameerlite
Copy link
Copy Markdown
Collaborator

@Sameerlite Sameerlite commented Apr 16, 2026

Summary

  • merge cache usage from message_start onto buffered message_delta when message_stop omits cache fields in Bedrock Anthropic /v1/messages streams
  • add a defensive clamp in generic cost calc so derived text_tokens never goes negative on inconsistent usage payloads
  • add regression coverage asserting start-only cache streams reconstruct consistent usage and produce positive, exact cost

Before
image

After
image
fixes LIT-2144

…usage

Bedrock /v1/messages streams can report cache tokens only on message_start while message_delta carries only uncached input tokens. Merge cache fields onto the final delta usage and clamp negative text-token remainders in cost calc to keep usage/cost consistent.

Made-with: Cursor
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Apr 16, 2026 7:40am

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR fixes negative streaming costs on Bedrock Anthropic /v1/messages streams by (1) buffering message_delta, then merging cache usage fields from message_stop—and, when absent, from message_start—before yielding the delta, and (2) adding a defensive zero-clamp in the generic cost-calc path so derived text_tokens can never go negative. Regression tests cover both the "cache-on-start-only" and the normal "cache-on-stop" paths end-to-end.

Confidence Score: 5/5

Safe to merge; the fix is well-scoped and all new tests are mock-based with no real network calls.

All three changed files look correct. The streaming state machine properly handles both the cache-on-stop and cache-on-start-only Bedrock variants, the zero-clamp is a purely defensive guard, and the new tests provide solid regression coverage. The two open P2 concerns (missing warning log at the clamp site, overly strict cost assertion) were already raised in prior review threads and do not block merge.

No files require special attention beyond the already-discussed clamp warning log and exact cost assertion.

Important Files Changed

Filename Overview
litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py Adds _merge_message_start_cache_into_delta_usage and _promote_message_stop_usage to correctly merge cache-usage fields from message_start/message_stop onto message_delta before yielding; logic is sound for all described scenarios.
litellm/litellm_core_utils/llm_cost_calc/utils.py Adds a defensive zero-clamp on derived text_tokens after the double-counting correction; prevents negative costs from inconsistent usage payloads.
tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py Adds mock-based regression tests covering start-only cache, stop-promoted cache, and end-to-end cost; all tests avoid real network calls as required. Exact-equality cost assertion (abs=1e-9) was flagged in a prior review thread.

Sequence Diagram

sequenceDiagram
    participant B as Bedrock Stream
    participant P as _promote_message_stop_usage
    participant H as bedrock_sse_wrapper / logging

    B->>P: message_start with cache usage fields
    Note over P: snapshot start_usage
    P->>H: message_start (pass-through)

    B->>P: message_delta with input and output tokens
    Note over P: buffer as pending_delta, do not yield yet

    B->>P: message_stop with optional cache fields
    Note over P: merge stop cache fields onto delta_usage
    Note over P: if still absent, merge from start_usage snapshot
    P->>H: message_delta enriched with cache usage
    P->>H: message_stop (pass-through)

    Note over H: cost calc sees consistent usage, text_tokens clamped to zero or above
Loading

Reviews (2): Last reviewed commit: "fix(mypy): cast msg and pending_delta in..." | Re-trigger Greptile

custom_llm_provider="bedrock",
)
assert cost > 0
assert cost == pytest.approx(0.0093951, rel=0, abs=1e-9)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Overly strict cost assertion will break on pricing updates

rel=0, abs=1e-9 is effectively exact equality — any change to the bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0 entry in model_prices_and_context_window.json will fail this assertion regardless of whether the fix under test is still correct. The regression guard really only needs to confirm the cost is positive and in the right ballpark.

Suggested change
assert cost == pytest.approx(0.0093951, rel=0, abs=1e-9)
assert cost == pytest.approx(0.0093951, rel=1e-5)

Comment on lines +687 to 690
# Clamp to zero: inconsistent streaming usage
if text_tokens < 0:
text_tokens = 0
prompt_tokens_details["text_tokens"] = text_tokens
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent clamp swallows inconsistencies without observability

Clamping to 0 is the right outcome (negative cost is worse), but when cache_hit > prompt_tokens the root cause is a signal of upstream usage-data inconsistency that is now silently discarded. A verbose_logger.warning(...) at the clamp site would make future occurrences diagnosable without changing behavior.

@ishaan-berri ishaan-berri temporarily deployed to integration-postgres April 18, 2026 17:37 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri temporarily deployed to integration-postgres April 18, 2026 17:37 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri temporarily deployed to integration-postgres April 18, 2026 17:37 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri merged commit f476447 into litellm_internal_staging Apr 18, 2026
93 of 98 checks passed
@ishaan-berri ishaan-berri deleted the litellm_bedrock_cache_start_negative_cost branch April 18, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants