
feat(dashscope): preserve cache_control for explicit prompt caching #25331

Merged
krrish-berri-2 merged 1 commit into BerriAI:litellm_oss_staging_04_08_2026 from silencedoctor:feat/dashscope-preserve-cache-control
Apr 9, 2026

Conversation

@silencedoctor
Contributor

@silencedoctor silencedoctor commented Apr 8, 2026

Relevant issues

Fixes #25330

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

The DashScope provider inherits OpenAIGPTConfig, which strips all cache_control fields from messages and tools by default via remove_cache_control_flag_from_messages_and_tools(). This prevents users from using explicit prompt caching with DashScope-hosted models that support it.

This PR overrides remove_cache_control_flag_from_messages_and_tools() in DashScopeChatConfig to preserve cache_control fields, following the exact same pattern already used by:

  • ZAI (litellm/llms/zai/chat/transformation.py)
  • MiniMax (litellm/llms/minimax/chat/transformation.py)
  • Databricks (litellm/llms/databricks/chat/transformation.py)

This change is safe for models that don't use cache_control — if no cache_control field is present, the behavior is identical to before.
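The override pattern described above can be sketched in isolation. This is a simplified stand-in, not the merged diff: the class names below are invented for illustration, the method signature is assumed, and the base-class stripping logic only approximates what OpenAIGPTConfig actually does.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]
Tool = Dict[str, Any]


class OpenAILikeConfig:
    """Stand-in for OpenAIGPTConfig's default behavior: strip cache_control."""

    def remove_cache_control_flag_from_messages_and_tools(
        self,
        model: str,
        messages: List[Message],
        tools: Optional[List[Tool]] = None,
    ) -> Tuple[List[Message], Optional[List[Tool]]]:
        # Default: drop cache_control from every message and tool.
        for message in messages:
            message.pop("cache_control", None)
        if tools is not None:
            for tool in tools:
                tool.pop("cache_control", None)
        return messages, tools


class DashScopeLikeConfig(OpenAILikeConfig):
    """DashScope-style override: return inputs unchanged so cache_control survives."""

    def remove_cache_control_flag_from_messages_and_tools(
        self,
        model: str,
        messages: List[Message],
        tools: Optional[List[Tool]] = None,
    ) -> Tuple[List[Message], Optional[List[Tool]]]:
        return messages, tools


if __name__ == "__main__":
    msgs = [{"role": "user", "content": "hi", "cache_control": {"type": "ephemeral"}}]
    kept, _ = DashScopeLikeConfig().remove_cache_control_flag_from_messages_and_tools("m", msgs)
    print("cache_control" in kept[0])  # True: the marker is preserved
```

Because the override returns its inputs untouched, requests without any cache_control field are byte-for-byte identical to the previous behavior, which is why the change is safe for non-caching models.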

Files changed

  • litellm/llms/dashscope/chat/transformation.py — Added override method
  • tests/test_litellm/llms/dashscope/test_dashscope_chat_transformation.py — Added 2 tests for cache_control preservation in messages and tools

Verification

Beyond the unit tests, this change was validated with live 10-round multi-turn conversation tests against the DashScope API:

Explicit caching works correctly:

  • With cache_control markers on user messages, cached_tokens grows each round as conversation history accumulates, and cache_creation_input_tokens is reported on initial cache build.
  • Cache hits begin from the first round after prompt tokens exceed the 1024-token threshold.
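A minimal sketch of the message shape such a test round might send. The `{"type": "ephemeral"}` marker follows the Anthropic-style cache_control convention litellm uses for other providers; whether DashScope expects exactly this shape is an assumption here, and the text content is a placeholder.

```python
# Hypothetical request message carrying an explicit cache_control marker.
message_with_cache_marker = {
    "role": "user",
    "content": [
        {
            "type": "text",
            # Placeholder: a shared prefix long enough to cross the
            # 1024-token caching threshold mentioned above.
            "text": "Long shared context prefix (>1024 tokens)...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
}
```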

Implicit caching is not affected:

  • Models relying on implicit prefix-matching caching produce identical cached_tokens values with and without this change, confirmed by running the same conversation against both the reverted codebase and the patched codebase.
  • Results were further cross-validated by comparing litellm output against direct API calls (bypassing litellm entirely via raw HTTP requests to the same /compatible-mode/v1/chat/completions endpoint). The cached_tokens matched on every round.

No regressions on non-caching models:

  • Models that do not support explicit caching were tested with cache_control present in the request. The DashScope API silently ignores the unrecognized field — no errors or behavioral changes observed.

@vercel

vercel bot commented Apr 8, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Apr 8, 2026 0:27am |


@greptile-apps
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds support for explicit prompt caching in the DashScope provider by overriding remove_cache_control_flag_from_messages_and_tools() in DashScopeChatConfig to preserve cache_control fields instead of stripping them. The fix follows the exact same pattern already established by ZAI, MiniMax, and Databricks providers, making it a minimal, targeted, and low-risk change.

Key changes:

  • Added remove_cache_control_flag_from_messages_and_tools() override in DashScopeChatConfig that returns messages and tools unchanged, preserving cache_control fields for DashScope-hosted models that support prompt caching.
  • Added 2 pure unit tests covering cache_control preservation in both messages and tools, with no real network calls (compliant with the tests/test_litellm/ mock-only policy).

Confidence Score: 5/5

Safe to merge — minimal, targeted override following an established pattern with appropriate unit test coverage.

The change is a one-method override that mirrors the exact same implementation already used by ZAI, MiniMax, and Databricks providers. It is backward-compatible (no cache_control = identical behavior to before), the tests are pure unit tests with no network calls, and no custom rules are violated. All remaining findings are P2 or already captured in prior review threads.

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/dashscope/chat/transformation.py | Adds remove_cache_control_flag_from_messages_and_tools override to preserve cache_control fields; two imports from the same module on separate lines (already flagged in a prior review thread). |
| tests/test_litellm/llms/dashscope/test_dashscope_chat_transformation.py | Adds two pure unit tests that directly call remove_cache_control_flag_from_messages_and_tools to verify cache_control is preserved in messages and tools; no network calls, no regressions. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Request with cache_control fields] --> B{Provider?}
    B -->|OpenAIGPTConfig default| C[remove_cache_control_flag_from_messages_and_tools]
    C --> D[Strip cache_control from messages & tools]
    B -->|DashScopeChatConfig override| E[remove_cache_control_flag_from_messages_and_tools]
    E --> F[Return messages & tools unchanged]
    D --> G[Request sent without cache_control]
    F --> H[Request sent with cache_control preserved]
```


Comment on lines +7 to 10
```python
from litellm.types.llms.openai import ChatCompletionToolParam

from litellm.secret_managers.main import get_secret_str
from litellm.types.llms.openai import AllMessageValues
```
Contributor


P2 Duplicate import from the same module

ChatCompletionToolParam and AllMessageValues are both imported from litellm.types.llms.openai in separate statements. Per the project's style guide, these should be merged into a single import.

Suggested change:

```diff
-from litellm.types.llms.openai import ChatCompletionToolParam
-from litellm.secret_managers.main import get_secret_str
-from litellm.types.llms.openai import AllMessageValues
+from litellm.types.llms.openai import AllMessageValues, ChatCompletionToolParam
+from litellm.secret_managers.main import get_secret_str
```

Context Used: CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 8, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing silencedoctor:feat/dashscope-preserve-cache-control (45f155f) with main (62757ff)

Open in CodSpeed

@codecov

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


DashScope inherits OpenAIGPTConfig which strips cache_control from
messages and tools by default. Override remove_cache_control_flag_from_messages_and_tools()
to preserve cache_control, following the same pattern used by ZAI, MiniMax, and Databricks.

Verified through 10-round multi-turn conversation tests:
- Explicit caching works correctly: cached_tokens grows each round from R4 onwards,
  with cache_creation_tokens reported on first cache build.
- Implicit caching is not affected: models that rely on implicit prefix-matching caching
  produce identical cached_tokens with and without this change, confirmed by comparing
  results against both the reverted codebase and direct API calls bypassing litellm.
- No errors or regressions observed on any model, including those that do not support
  explicit caching — the DashScope API silently ignores unrecognized cache_control fields.

Fixes BerriAI#25330

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@silencedoctor silencedoctor force-pushed the feat/dashscope-preserve-cache-control branch from b6319f2 to 45f155f on April 8, 2026 12:21

@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_oss_staging_04_08_2026 April 9, 2026 04:31
@krrish-berri-2 krrish-berri-2 merged commit 4e32479 into BerriAI:litellm_oss_staging_04_08_2026 Apr 9, 2026
49 of 51 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature]: DashScope provider should preserve cache_control for explicit prompt caching
