fix(gemini): resolve image token undercounting in usage metadata (#22608)
Conversation
Greptile Summary

This PR fixes image token undercounting in Gemini/Vertex AI usage metadata by changing token assignment from overwriting (`=`) to accumulating (`+=`).
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/gemini/image_generation/transformation.py | Changed token assignment from = to += for accumulation, added token_count key fallback and .upper() normalization. Clean and correct fix. |
| litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py | Added _get_token_count helper for dual-key lookup, updated all four token detail loops to accumulate instead of overwrite. Logic is correct and consistent. |
| tests/llm_translation/test_gemini_image_usage.py | Added regression test for image token accumulation in GoogleImageGenConfig. No network calls, uses local model cost map. Covers the core fix well. |
| tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py | Added test for VertexGeminiConfig._calculate_usage with duplicate modality entries. Tests both tokenCount and token_count keys, and verifies cache subtraction logic. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[API Response with usageMetadata] --> B{Parse token details loops}
    B --> C[responseTokensDetails]
    B --> D[candidatesTokensDetails]
    B --> E[promptTokensDetails]
    B --> F[cacheTokensDetails]
    C --> G[_get_token_count: try tokenCount then token_count]
    D --> G
    E --> G
    F --> G
    G --> H[Normalize modality to UPPERCASE]
    H --> I{Accumulate tokens by modality}
    I --> |TEXT| J["field = (field or 0) + token_count"]
    I --> |IMAGE| J
    I --> |AUDIO| J
    I --> |VIDEO| J
    E --> K[prompt_*_tokens accumulated]
    F --> L[cached_*_tokens accumulated]
    K --> M[Subtract cached from prompt per modality]
    L --> M
    M --> N[Final Usage object]
    C --> N
    D --> N
```
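The final "subtract cached from prompt per modality" step in the flowchart can be sketched as below. The dictionary shape is illustrative, not the exact litellm `Usage` attributes.

```python
def subtract_cached_per_modality(prompt: dict, cached: dict) -> dict:
    """Return net (non-cached) prompt tokens per modality, treating
    missing or None entries as 0 so the arithmetic never sees None."""
    return {
        modality: (prompt.get(modality) or 0) - (cached.get(modality) or 0)
        for modality in prompt
    }


prompt_totals = {"TEXT": 10, "IMAGE": 190}
cached_totals = {"IMAGE": 40}
print(subtract_cached_per_modality(prompt_totals, cached_totals))
# {'TEXT': 10, 'IMAGE': 150}
```

Subtracting the cached portion keeps cached tokens from being billed twice: they are reported separately rather than folded into the regular prompt total.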
Last reviewed commit: de63dd8
```python
# Imports inferred from the files changed in this PR (not shown in the diff).
import os

import pytest

import litellm
from litellm.llms.gemini.image_generation.transformation import GoogleImageGenConfig
from litellm.types.utils import ImageObject, ImageResponse


def test_gemini_image_generation_accumulates_multiple_image_prompt_token_details():
    """
    Regression test: promptTokensDetails can include multiple IMAGE entries.
    These must be accumulated instead of overwritten.
    """
    previous_local_model_cost_map = os.environ.get("LITELLM_LOCAL_MODEL_COST_MAP")
    previous_model_cost = litellm.model_cost
    try:
        os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
        litellm.model_cost = litellm.get_model_cost_map(url="")

        model = "gemini/gemini-3-pro-image-preview"
        config = GoogleImageGenConfig()

        usage_metadata = {
            "promptTokenCount": 200,
            "candidatesTokenCount": 0,
            "totalTokenCount": 200,
            "promptTokensDetails": [
                {"modality": "TEXT", "tokenCount": 10},
                {"modality": "IMAGE", "tokenCount": 90},
                {"modality": "IMAGE", "tokenCount": 100},
            ],
        }

        parsed_usage = config._transform_image_usage(usage_metadata)
        image_response = ImageResponse(
            data=[ImageObject(b64_json="fake_image_data")],
            usage=parsed_usage,
        )

        observed_cost = litellm.completion_cost(
            completion_response=image_response,
            model=model,
            custom_llm_provider="gemini",
        )

        model_info = litellm.get_model_info(model=model, custom_llm_provider="gemini")
        expected_image_tokens = 190
        expected_total_prompt_tokens = 200
        expected_prompt_cost = (
            expected_total_prompt_tokens * model_info["input_cost_per_token"]
        )

        assert parsed_usage.input_tokens_details.image_tokens == expected_image_tokens
        assert parsed_usage.input_tokens_details.text_tokens == 10
        assert observed_cost == pytest.approx(expected_prompt_cost, rel=1e-12)
    finally:
        if previous_local_model_cost_map is None:
            os.environ.pop("LITELLM_LOCAL_MODEL_COST_MAP", None)
        else:
            os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = previous_local_model_cost_map
        litellm.model_cost = previous_model_cost
```
Missing test for vertex _calculate_usage accumulation
This test covers the GoogleImageGenConfig._transform_image_usage path (image generation), but the PR also changed token accumulation logic in VertexGeminiConfig._calculate_usage across four loops (responseTokensDetails, candidatesTokensDetails, promptTokensDetails, cacheTokensDetails). None of those accumulation changes are covered by a test with duplicate modality entries.
Consider adding a test in tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py that passes promptTokensDetails (or candidatesTokensDetails) with multiple entries of the same modality to verify the accumulation works there too.
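Since the exact signature of `VertexGeminiConfig._calculate_usage` lives in the litellm source, here is only a self-contained sketch of the payload shape such a test would feed in, together with a local reference computation of the totals it should assert. The helper `expected_image_prompt_tokens` is hypothetical, defined here purely for illustration.

```python
# usageMetadata with duplicate IMAGE entries across both key spellings,
# plus cache details, matching the scenario the review comment describes.
usage_metadata = {
    "promptTokenCount": 200,
    "candidatesTokenCount": 120,
    "totalTokenCount": 320,
    "promptTokensDetails": [
        {"modality": "IMAGE", "tokenCount": 90},
        {"modality": "IMAGE", "token_count": 100},  # snake_case duplicate
        {"modality": "TEXT", "tokenCount": 10},
    ],
    "cacheTokensDetails": [
        {"modality": "IMAGE", "tokenCount": 40},
    ],
}


def expected_image_prompt_tokens(meta: dict) -> int:
    """Reference computation: sum IMAGE entries across both key spellings."""
    total = 0
    for detail in meta.get("promptTokensDetails", []):
        if str(detail.get("modality", "")).upper() == "IMAGE":
            total += int(detail.get("tokenCount", detail.get("token_count", 0)) or 0)
    return total


# With accumulation, both IMAGE entries count: 90 + 100 = 190.
print(expected_image_prompt_tokens(usage_metadata))  # 190
```

A real test would pass a response carrying this `usageMetadata` through `_calculate_usage` and assert the resulting per-modality fields match these reference totals.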
Fixed an issue where image tokens were being overwritten instead of accumulated in Gemini responses. Added support for both camelCase and snake_case token count keys. Fixes BerriAI#22082.
Parse tokenCount/token_count as int-safe values to satisfy mypy and avoid None/object arithmetic. Add regression test for duplicate modality accumulation in Vertex _calculate_usage.
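The "int-safe" parsing mentioned in this commit message can be sketched as below; the function name is illustrative. The point is to coerce whatever the API returned (int, numeric string, None, or a missing key) to an `int` before arithmetic, so the type checker sees `int` and runtime never adds `None`.

```python
def as_int_token_count(detail: dict) -> int:
    """Coerce a tokenCount/token_count value to int, defaulting to 0
    for missing, None, or non-numeric values."""
    raw = detail.get("tokenCount", detail.get("token_count"))
    if raw is None:
        return 0
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 0


print(as_int_token_count({"tokenCount": 90}))      # 90
print(as_int_token_count({"token_count": "100"}))  # 100
print(as_int_token_count({}))                      # 0
```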
Force-pushed fc14141 to de63dd8.
Merged b3a1759 into BerriAI:litellm_oss_staging_03_05_2026.
This PR fixes the issue where image_tokens were being overwritten instead of accumulated in the usage metadata for Gemini/Vertex AI models.
Changes:
- Accumulate per-modality token counts with `+=` instead of overwriting with `=` across the token detail loops.
- Support both camelCase (`tokenCount`) and snake_case (`token_count`) keys, and normalize modality names to uppercase.
- Add regression tests covering duplicate modality entries in both `GoogleImageGenConfig` and `VertexGeminiConfig._calculate_usage`.
Verified with a standalone reproduction script (190 tokens counted vs 100 previously).
Fixes #22082