
fix(gemini): resolve image token undercounting in usage metadata#22608

Merged
4 commits merged into BerriAI:litellm_oss_staging_03_05_2026 from gustipardo:fix/gemini-image-token-accumulation
Mar 5, 2026

Conversation

@gustipardo
Contributor

This PR fixes the issue where image_tokens were being overwritten instead of accumulated in the usage metadata for Gemini/Vertex AI models.

Changes:

  • Implemented token accumulation (+=) instead of overwriting.
  • Added support for both 'tokenCount' and 'token_count' keys.
  • Normalized modality strings to uppercase for consistency.

Verified with a standalone reproduction script (190 tokens counted vs 100 previously).

Fixes #22082
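
The gist of the fix can be sketched in isolation. This is an illustrative reduction of the three bullet points above (accumulate with `+=`, accept both key spellings, normalize modality casing); the function name and shape are for demonstration only and do not mirror the exact litellm internals.

```python
def sum_image_tokens(prompt_tokens_details: list[dict]) -> int:
    """Accumulate IMAGE tokens across duplicate modality entries."""
    image_tokens = 0
    for detail in prompt_tokens_details:
        # Normalize modality casing and accept both key spellings.
        modality = str(detail.get("modality", "")).upper()
        token_count = detail.get("tokenCount", detail.get("token_count", 0)) or 0
        if modality == "IMAGE":
            image_tokens += int(token_count)  # accumulate, don't overwrite
    return image_tokens

details = [
    {"modality": "IMAGE", "tokenCount": 90},
    {"modality": "image", "token_count": 100},  # duplicate modality, snake_case key
]
print(sum_image_tokens(details))  # 190, matching the repro script
```

With the old overwrite behavior, the second entry would have replaced the first, yielding 100 instead of 190.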

@vercel

vercel bot commented Mar 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Updated (UTC)
litellm | Error | Mar 3, 2026 3:23pm


@CLAassistant

CLAassistant commented Mar 3, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR fixes image token undercounting in Gemini/Vertex AI usage metadata by changing token assignment from overwriting (=) to accumulation (+=). When the API returns multiple entries for the same modality (e.g., two IMAGE entries in promptTokensDetails), tokens are now correctly summed. The fix also adds defensive handling for both tokenCount and token_count API key formats, and normalizes modality strings to uppercase for consistency.

  • Token accumulation fix: All four token detail loops in VertexGeminiConfig._calculate_usage (responseTokensDetails, candidatesTokensDetails, promptTokensDetails, cacheTokensDetails) and the GoogleImageGenConfig._transform_image_usage method now accumulate instead of overwrite
  • Dual-key support: A _get_token_count helper tries tokenCount first, then falls back to token_count for API compatibility
  • Test coverage: Two new tests verify the accumulation behavior — one for the image generation path and one for the vertex _calculate_usage path, both using mock data only
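
The dual-key helper described above can be sketched as follows. The PR's actual `_get_token_count` body is not shown on this page, so this is a plausible reconstruction of the behavior the summary describes, not the verbatim implementation:

```python
from typing import Optional

def get_token_count(detail: dict) -> Optional[int]:
    """Try the camelCase API key first, then the snake_case fallback."""
    value = detail.get("tokenCount")
    if value is None:
        value = detail.get("token_count")
    # Coerce to int so downstream arithmetic never mixes None/object types.
    return int(value) if value is not None else None

print(get_token_count({"tokenCount": 90}))    # 90
print(get_token_count({"token_count": 100}))  # 100
print(get_token_count({}))                    # None
```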

Confidence Score: 4/5

  • This PR is safe to merge — it fixes a clear bug with a well-scoped change and includes regression tests.
  • Score of 4 reflects a focused, correct bug fix with good test coverage. The changes are straightforward accumulation fixes that don't alter control flow. Both the image generation and vertex calculation paths are covered by tests. Slight deduction because the similar code in vertex_gemini_transformation.py was not updated with the same defensive improvements (token_count fallback, .upper() normalization), though that's existing code not broken by this PR.
  • No files require special attention. The main logic file litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py has the most changes but they are consistent and well-tested.

Important Files Changed

Filename Overview
litellm/llms/gemini/image_generation/transformation.py Changed token assignment from = to += for accumulation, added token_count key fallback and .upper() normalization. Clean and correct fix.
litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py Added _get_token_count helper for dual-key lookup, updated all four token detail loops to accumulate instead of overwrite. Logic is correct and consistent.
tests/llm_translation/test_gemini_image_usage.py Added regression test for image token accumulation in GoogleImageGenConfig. No network calls, uses local model cost map. Covers the core fix well.
tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py Added test for VertexGeminiConfig._calculate_usage with duplicate modality entries. Tests both tokenCount and token_count keys, and verifies cache subtraction logic.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[API Response with usageMetadata] --> B{Parse token details loops}
    B --> C[responseTokensDetails]
    B --> D[candidatesTokensDetails]
    B --> E[promptTokensDetails]
    B --> F[cacheTokensDetails]
    
    C --> G[_get_token_count: try tokenCount then token_count]
    D --> G
    E --> G
    F --> G
    
    G --> H[Normalize modality to UPPERCASE]
    H --> I{Accumulate tokens by modality}
    I --> |TEXT| J["field = (field or 0) + token_count"]
    I --> |IMAGE| J
    I --> |AUDIO| J
    I --> |VIDEO| J
    
    E --> K[prompt_*_tokens accumulated]
    F --> L[cached_*_tokens accumulated]
    K --> M[Subtract cached from prompt per modality]
    L --> M
    M --> N[Final Usage object]
    C --> N
    D --> N

Last reviewed commit: de63dd8
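
The "subtract cached from prompt per modality" step in the flowchart can be sketched standalone. This is a minimal illustration of the arithmetic, assuming per-modality totals have already been accumulated; the function and dict shapes are hypothetical, not litellm's actual data structures:

```python
def net_prompt_tokens(prompt_by_modality: dict, cached_by_modality: dict) -> dict:
    """Subtract cached tokens from accumulated prompt tokens, per modality."""
    return {
        modality: prompt_by_modality[modality] - cached_by_modality.get(modality, 0)
        for modality in prompt_by_modality
    }

prompt = {"TEXT": 10, "IMAGE": 190}
cached = {"IMAGE": 50}
print(net_prompt_tokens(prompt, cached))  # {'TEXT': 10, 'IMAGE': 140}
```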


@greptile-apps greptile-apps bot left a comment


5 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment thread excalidraw.log Outdated
Comment thread package-lock.json
Comment thread scripts/repro_gemini_image_cost.py Outdated
Comment thread litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py Outdated
Comment on lines +218 to +268
def test_gemini_image_generation_accumulates_multiple_image_prompt_token_details():
    """
    Regression test: promptTokensDetails can include multiple IMAGE entries.
    These must be accumulated instead of overwritten.
    """
    previous_local_model_cost_map = os.environ.get("LITELLM_LOCAL_MODEL_COST_MAP")
    previous_model_cost = litellm.model_cost
    try:
        os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
        litellm.model_cost = litellm.get_model_cost_map(url="")

        model = "gemini/gemini-3-pro-image-preview"
        config = GoogleImageGenConfig()

        usage_metadata = {
            "promptTokenCount": 200,
            "candidatesTokenCount": 0,
            "totalTokenCount": 200,
            "promptTokensDetails": [
                {"modality": "TEXT", "tokenCount": 10},
                {"modality": "IMAGE", "tokenCount": 90},
                {"modality": "IMAGE", "tokenCount": 100},
            ],
        }

        parsed_usage = config._transform_image_usage(usage_metadata)
        image_response = ImageResponse(
            data=[ImageObject(b64_json="fake_image_data")],
            usage=parsed_usage,
        )

        observed_cost = litellm.completion_cost(
            completion_response=image_response,
            model=model,
            custom_llm_provider="gemini",
        )

        model_info = litellm.get_model_info(model=model, custom_llm_provider="gemini")
        expected_image_tokens = 190
        expected_total_prompt_tokens = 200
        expected_prompt_cost = expected_total_prompt_tokens * model_info["input_cost_per_token"]

        assert parsed_usage.input_tokens_details.image_tokens == expected_image_tokens
        assert parsed_usage.input_tokens_details.text_tokens == 10
        assert observed_cost == pytest.approx(expected_prompt_cost, rel=1e-12)
    finally:
        if previous_local_model_cost_map is None:
            os.environ.pop("LITELLM_LOCAL_MODEL_COST_MAP", None)
        else:
            os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = previous_local_model_cost_map
        litellm.model_cost = previous_model_cost

Missing test for vertex _calculate_usage accumulation

This test covers the GoogleImageGenConfig._transform_image_usage path (image generation), but the PR also changed token accumulation logic in VertexGeminiConfig._calculate_usage across four loops (responseTokensDetails, candidatesTokensDetails, promptTokensDetails, cacheTokensDetails). None of those accumulation changes are covered by a test with duplicate modality entries.

Consider adding a test in tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py that passes promptTokensDetails (or candidatesTokensDetails) with multiple entries of the same modality to verify the accumulation works there too.
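
The core assertion such a test would make can be sketched against a self-contained reference accumulator. This does not call the real `VertexGeminiConfig._calculate_usage` (its signature is not shown on this page); the helper below simply mirrors the accumulation behavior the reviewer wants verified:

```python
def accumulate_by_modality(details: list[dict]) -> dict:
    """Reference accumulator mirroring the fixed token-detail loops."""
    totals: dict[str, int] = {}
    for detail in details:
        modality = str(detail.get("modality", "")).upper()
        count = detail.get("tokenCount", detail.get("token_count")) or 0
        totals[modality] = totals.get(modality, 0) + int(count)
    return totals

def test_duplicate_modality_entries_are_summed():
    details = [
        {"modality": "IMAGE", "tokenCount": 90},
        {"modality": "IMAGE", "token_count": 100},  # duplicate modality
    ]
    assert accumulate_by_modality(details)["IMAGE"] == 190

test_duplicate_modality_entries_are_summed()
```

(The commit de63dd8 later added exactly this kind of coverage in the vertex test file, per the commit messages below.)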

Fixed an issue where image tokens were being overwritten instead of accumulated in Gemini responses. Added support for both camelCase and snake_case token count keys. Fixes BerriAI#22082.
Parse tokenCount/token_count as int-safe values to satisfy mypy and avoid None/object arithmetic. Add regression test for duplicate modality accumulation in Vertex _calculate_usage.
@gustipardo gustipardo force-pushed the fix/gemini-image-token-accumulation branch from fc14141 to de63dd8 Compare March 3, 2026 15:22
@ghost ghost changed the base branch from main to litellm_oss_staging_03_05_2026 March 5, 2026 02:52
@ghost ghost merged commit b3a1759 into BerriAI:litellm_oss_staging_03_05_2026 Mar 5, 2026
3 of 36 checks passed
@greptile-apps greptile-apps bot mentioned this pull request Mar 5, 2026
7 tasks


Development

Successfully merging this pull request may close these issues.

[Bug]: Gemini 3 Pro Image Preview - Image tokens cost calculation

2 participants