fix(bedrock): prevent negative streaming costs for start-only cache usage#25846
Conversation
…usage Bedrock /v1/messages streams can report cache tokens only on message_start while message_delta carries only uncached input tokens. Merge cache fields onto the final delta usage and clamp negative text-token remainders in cost calc to keep usage/cost consistent. Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR fixes negative streaming costs on Bedrock Anthropic Confidence Score: 5/5Safe to merge; the fix is well-scoped and all new tests are mock-based with no real network calls. All three changed files look correct. The streaming state machine properly handles both the cache-on-stop and cache-on-start-only Bedrock variants, the zero-clamp is a purely defensive guard, and the new tests provide solid regression coverage. The two open P2 concerns (missing warning log at the clamp site, overly strict cost assertion) were already raised in prior review threads and do not block merge. No files require special attention beyond the already-discussed clamp warning log and exact cost assertion.
|
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Adds _merge_message_start_cache_into_delta_usage and _promote_message_stop_usage to correctly merge cache-usage fields from message_start/message_stop onto message_delta before yielding; logic is sound for all described scenarios. |
| litellm/litellm_core_utils/llm_cost_calc/utils.py | Adds a defensive zero-clamp on derived text_tokens after the double-counting correction; prevents negative costs from inconsistent usage payloads. |
| tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py | Adds mock-based regression tests covering start-only cache, stop-promoted cache, and end-to-end cost; all tests avoid real network calls as required. Exact-equality cost assertion (abs=1e-9) was flagged in a prior review thread. |
Sequence Diagram
sequenceDiagram
participant B as Bedrock Stream
participant P as _promote_message_stop_usage
participant H as bedrock_sse_wrapper / logging
B->>P: message_start with cache usage fields
Note over P: snapshot start_usage
P->>H: message_start (pass-through)
B->>P: message_delta with input and output tokens
Note over P: buffer as pending_delta, do not yield yet
B->>P: message_stop with optional cache fields
Note over P: merge stop cache fields onto delta_usage
Note over P: if still absent, merge from start_usage snapshot
P->>H: message_delta enriched with cache usage
P->>H: message_stop (pass-through)
Note over H: cost calc sees consistent usage, text_tokens clamped to zero or above
Reviews (2): Last reviewed commit: "fix(mypy): cast msg and pending_delta in..." | Re-trigger Greptile
| custom_llm_provider="bedrock", | ||
| ) | ||
| assert cost > 0 | ||
| assert cost == pytest.approx(0.0093951, rel=0, abs=1e-9) |
There was a problem hiding this comment.
Overly strict cost assertion will break on pricing updates
rel=0, abs=1e-9 is effectively exact equality — any change to the bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0 entry in model_prices_and_context_window.json will fail this assertion regardless of whether the fix under test is still correct. The regression guard really only needs to confirm the cost is positive and in the right ballpark.
| assert cost == pytest.approx(0.0093951, rel=0, abs=1e-9) | |
| assert cost == pytest.approx(0.0093951, rel=1e-5) |
| # Clamp to zero: inconsistent streaming usage | ||
| if text_tokens < 0: | ||
| text_tokens = 0 | ||
| prompt_tokens_details["text_tokens"] = text_tokens |
There was a problem hiding this comment.
Silent clamp swallows inconsistencies without observability
Clamping to 0 is the right outcome (negative cost is worse), but when cache_hit > prompt_tokens the root cause is a signal of upstream usage-data inconsistency that is now silently discarded. A verbose_logger.warning(...) at the clamp site would make future occurrences diagnosable without changing behavior.
…to resolve union-attr and assignment errors
f476447
into
litellm_internal_staging
Summary
message_startonto bufferedmessage_deltawhenmessage_stopomits cache fields in Bedrock Anthropic/v1/messagesstreamstext_tokensnever goes negative on inconsistent usage payloadsBefore

After

fixes LIT-2144