fix(caching): add Responses API params to cache key allow-list #25673
Conversation
Greptile Summary: This PR fixes a silent correctness bug in cache-key generation for `/v1/responses` requests. The fix adds a Responses-API kwargs helper and unions its output into the cache-key allow-list. Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/model_param_helper.py | Adds _get_litellm_supported_responses_api_kwargs() sourced from OpenAI SDK types and unions it into _get_all_llm_api_params(); strictly additive, consistent with existing chat/text/embedding/transcription/rerank pattern. |
| tests/local_testing/test_unit_test_caching.py | Adds test_get_cache_key_responses_api covering the instructions regression plus parametric spot-checks for 5 other Responses-only params; no real network calls made — uses only Cache() and cache.get_cache_key(). |
| tests/test_litellm/test_model_param_helper.py | Adds test_get_all_llm_api_params_includes_responses_api asserting 13 Responses-API-only kwargs are present in the allow-list with a clear failure message; serves as a regression guard for future SDK changes. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Cache.get_cache_key(kwargs)"] --> B["ModelParamHelper._get_all_llm_api_params()"]
B --> C["_get_litellm_supported_chat_completion_kwargs()"]
B --> D["_get_litellm_supported_text_completion_kwargs()"]
B --> E["_get_litellm_supported_embedding_kwargs()"]
B --> F["_get_litellm_supported_transcription_kwargs()"]
B --> G["_get_litellm_supported_rerank_kwargs()"]
B --> H["_get_litellm_supported_responses_api_kwargs() ✨ NEW"]
H --> I["ResponseCreateParamsNonStreaming.__annotations__"]
H --> J["ResponseCreateParamsStreaming.__annotations__"]
C & D & E & F & G & H --> K["union of all param sets"]
K --> L["minus _get_exclude_kwargs() = {'metadata'}"]
L --> M["allowed_params set"]
M --> N["Filter kwargs → build key string"]
N --> O["SHA-256 hash → cache key"]
```
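The pipeline in the flowchart can be sketched in a few lines. This is a hypothetical stand-in, not litellm's real API: the param-set names, `EXCLUDE`, and `get_cache_key` here are illustrative, and the real allow-list is much larger.

```python
import hashlib

# Illustrative stand-ins for the per-call-type param sets in the flowchart.
CHAT_PARAMS = {"model", "messages", "temperature", "top_p", "stream"}
RESPONSES_PARAMS = {"model", "input", "instructions", "previous_response_id", "metadata"}
EXCLUDE = {"metadata"}  # mirrors _get_exclude_kwargs()

# Union of all param sets, minus the exclude set = the allow-list.
ALLOWED = (CHAT_PARAMS | RESPONSES_PARAMS) - EXCLUDE

def get_cache_key(**kwargs) -> str:
    # Keep only allow-listed kwargs, sort for determinism, then SHA-256 the string.
    parts = [f"{k}: {kwargs[k]}" for k in sorted(kwargs) if k in ALLOWED]
    return hashlib.sha256(",".join(parts).encode()).hexdigest()

# With Responses params in the allow-list, changing `instructions` changes the key.
key_a = get_cache_key(model="gpt-4.1", input="hi", instructions="summarize 10th May")
key_b = get_cache_key(model="gpt-4.1", input="hi", instructions="summarize 7th May")
assert key_a != key_b
```

The key point the flowchart encodes: any kwarg absent from `ALLOWED` is invisible to the hash, which is exactly why a missing Responses param set caused collisions.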
Reviews (1): Last reviewed commit: "fix(caching): add Responses API params t..."
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Merged commit f6058bd into BerriAI:litellm_ishaan_april14
Relevant issues
No linked GitHub issue. Reported by a customer via Slack with a full Postman reproduction; I verified the bug against current `main` before sending this PR.

Pre-Submission checklist
- Testing added in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement): `tests/test_litellm/test_model_param_helper.py::test_get_all_llm_api_params_includes_responses_api`. A second cache-key behavioral test is also added in `tests/local_testing/test_unit_test_caching.py::test_get_cache_key_responses_api`, mirroring the existing chat / embedding / text-completion cache-key tests in that file.
- `make test-unit` — targeted and adjacent suites pass locally (`tests/test_litellm/test_model_param_helper.py`, `tests/local_testing/test_unit_test_caching.py::test_get_cache_key_*`, `tests/test_litellm/caching/test_caching_handler.py`; ruff-clean on `litellm/litellm_core_utils/model_param_helper.py`).
- Production change confined to `litellm/litellm_core_utils/model_param_helper.py`, a strictly additive union into the cache-key allow-list. No chat / text / embedding / transcription / rerank behavior changes.
- I have run `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review — will do immediately after this PR is open.

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Reproduced against current `main` on a local proxy (in-memory cache, `gpt-4.1`, native OpenAI `/v1/responses` path, no DB). Two `POST /v1/responses` calls with identical `model`/`input`/`temperature`, differing only in `instructions`:

- Call A: `instructions: "summarize the weather on 10th May"`
- Call B: `instructions: "summarize the weather on 7th May"`

Before fix

Both calls return the 10 May body; call B is served from cache in 27 ms. The proxy debug log shows the same pre-hash key and the same SHA-256 on both requests: `instructions` is absent from the pre-hash key string on both requests — that is the bug.

After fix

Both calls are now real upstream round trips (>1.6 s) and each returns the body for the correct `instructions`. Three independent signals of the fix: (1) call-B latency goes from 27 ms to 1.6 s, (2) call-B body flips from the 10 May content to the 7 May content, (3) the normalized-body diff verdict flips from IDENTICAL to DIFFERENT. No regression on call A.

Added test run (local):
Type
🐛 Bug Fix
Changes
The bug
`Cache.get_cache_key()` (`litellm/caching/caching.py:294`) builds the key from `ModelParamHelper._get_all_llm_api_params()`, which today unions the supported kwargs for chat / text / embedding / transcription / rerank — and nothing else. Native-OpenAI `/v1/responses` requests pass through the caching handler unchanged, so every Responses-API-only top-level kwarg is silently dropped from the key.

Under `openai==2.30.0` (the hard-pinned version in `pyproject.toml` and `requirements.txt`), Responses-only top-level kwargs such as `instructions`, `previous_response_id`, `reasoning`, `include`, `max_output_tokens`, and `background` are currently being dropped. `model`, `input`, `temperature`, `top_p`, `stream`, `tools`, `tool_choice`, `user`, `metadata`, `top_logprobs`, `truncation`, and `stream_options` happen to survive because they collide with names in the chat / text / embedding TypedDicts — pure name coincidence. That is exactly the pattern the Pfizer customer noticed: changing `input` invalidates their cache, but changing `instructions` does not.

The user-facing effect is a silent correctness bug on `/v1/responses`: two requests that differ only in (e.g.) `instructions` collapse to the same cache entry and the second is served a stale 200. No error, no warning — just the wrong body.
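The collision mechanism can be shown in isolation. This is a sketch of the pre-fix failure mode with an illustrative stand-in allow-list (`OLD_ALLOWED` and `old_cache_key` are not litellm names): because `instructions` is not allow-listed, it never reaches the hash.

```python
import hashlib

# Illustrative pre-fix allow-list: no Responses-only kwargs present.
OLD_ALLOWED = {"model", "input", "temperature"}

def old_cache_key(**kwargs) -> str:
    # Same filter-sort-hash scheme; `instructions` is filtered out before hashing.
    parts = [f"{k}: {kwargs[k]}" for k in sorted(kwargs) if k in OLD_ALLOWED]
    return hashlib.sha256(",".join(parts).encode()).hexdigest()

k1 = old_cache_key(model="gpt-4.1", input="weather", instructions="10th May")
k2 = old_cache_key(model="gpt-4.1", input="weather", instructions="7th May")
assert k1 == k2  # silent collision: the second call is served the stale body
```

Note that changing `input` still produces a different key in this sketch, matching the "name coincidence" survival described above.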
Add a sixth helper, `_get_litellm_supported_responses_api_kwargs()`, on `ModelParamHelper` that sources Responses-API kwargs from `openai.types.responses.response_create_params.ResponseCreateParamsNonStreaming`/`Streaming` — a one-to-one mirror of the existing `_get_litellm_supported_chat_completion_kwargs()` helper — and union the returned set into `_get_all_llm_api_params()`.
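A self-contained sketch of the helper's mechanism. The stand-in TypedDicts below play the role of the OpenAI SDK's `ResponseCreateParamsNonStreaming`/`Streaming` (the real helper imports those from `openai.types.responses.response_create_params`), and the field lists here are abbreviated, not the SDK's full set:

```python
from typing import TypedDict

# Abbreviated stand-ins for the SDK's request TypedDicts.
class _NonStreaming(TypedDict, total=False):
    model: str
    input: str
    instructions: str
    previous_response_id: str

class _Streaming(TypedDict, total=False):
    model: str
    input: str
    instructions: str
    stream: bool

def supported_responses_api_kwargs() -> set:
    # Union the annotation keys of both variants, as the new helper does.
    return set(_NonStreaming.__annotations__) | set(_Streaming.__annotations__)

assert {"instructions", "previous_response_id", "stream"} <= supported_responses_api_kwargs()
```

Sourcing the set from `__annotations__` rather than a hand-written list is what makes future SDK additions flow through without further code changes.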
Why this shape

- Only `/v1/responses` requests see a change, and that change is the intended correctness fix. One-time cache-miss surge on first run after upgrade as previously-collapsed entries split into real per-kwarg entries; steady state recovers immediately.
- Future SDK params added to `ResponseCreateParamsNonStreaming`/`Streaming` flow through automatically.
- No `try/except` import wrapper. `openai = "2.30.0"` is hard-pinned in `pyproject.toml` and `requirements.txt` (no `^`/`~`/`>=`), so there is no resolver path under which `ResponseCreateParamsNonStreaming` could be missing. Top-level import matches chat / text / embedding. (The transcription helper uses a `try/except` only because typed transcription params landed later in the openai SDK timeline — not relevant here.)
- `metadata` is still excluded. The existing `_get_exclude_kwargs() == {"metadata"}` step still runs, so `metadata` behavior is unchanged for every call type including Responses. Whether `metadata` should be part of the Responses cache key is a separate discussion.
- `litellm/litellm_core_utils/model_param_helper.py` — add the import, add the helper, union the helper's output into `_get_all_llm_api_params()`.
- `tests/test_litellm/test_model_param_helper.py` — new test `test_get_all_llm_api_params_includes_responses_api`: asserts 13 Responses-API-only kwargs are present in the allow-list. Failure message names exactly what regressed.
- `tests/local_testing/test_unit_test_caching.py` — new test `test_get_cache_key_responses_api`: mirrors the existing chat / embedding / text-completion cache-key tests in this file. Asserts `instructions` yields a distinct cache key from a baseline payload, plus a parametric loop over `previous_response_id`, `reasoning`, `include`, `max_output_tokens`, `background`, plus a sanity check that identical payloads still collide (so cache hits still work).

Three files, 103 net additions, zero deletions.
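The parametric cache-key check can be sketched without litellm installed. The real test calls litellm's `Cache().get_cache_key`; here `key` and `ALLOWED` are self-contained stand-ins, and the per-param values are illustrative:

```python
import hashlib

# Stand-in allow-list containing the Responses-only kwargs exercised by the test.
ALLOWED = {"model", "input", "instructions", "previous_response_id",
           "reasoning", "include", "max_output_tokens", "background"}

def key(payload: dict) -> str:
    # Same filter-sort-hash scheme as the cache-key path under test.
    parts = [f"{k}: {payload[k]}" for k in sorted(payload) if k in ALLOWED]
    return hashlib.sha256(",".join(parts).encode()).hexdigest()

baseline = {"model": "gpt-4.1", "input": "hello"}
for param, value in [
    ("instructions", "summarize"),
    ("previous_response_id", "resp_123"),  # illustrative value
    ("reasoning", {"effort": "low"}),      # illustrative value
    ("include", ["output.logprobs"]),      # illustrative value
    ("max_output_tokens", 64),
    ("background", True),
]:
    # Each Responses-only kwarg must perturb the key relative to the baseline.
    assert key({**baseline, param: value}) != key(baseline)

# Sanity check: identical payloads must still collide so cache hits keep working.
assert key(dict(baseline)) == key(dict(baseline))
```

This mirrors the structure described for `test_get_cache_key_responses_api`: one distinctness assertion per param plus a positive cache-hit check.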