What happened?
When running multiple consecutive tasks in a single session, the prompt cache drops to system-prompt level (~18k tokens) on the first request of a new user task, even though the message prefix is identical to the previous request.
Cache works well within an agentic tool-call loop (90–99% hit rate), and recovers on the subsequent user task (99.9%). Only the first request after a task boundary is affected.
Prerequisite: tested on #2897, which preserves reasoning blocks across turns. Without that branch, follow-up requests always break cache because reasoning blocks are stripped from history.
Reproduction:

1. Run a task that involves several turns and tool calls (e.g., "read package.json and summarize the scripts").
2. Once the task completes, send: `Hi`
3. Once that responds, send: `Hi again`
4. Check the OpenAI logs in `~/.qwen/logs/` — the `Hi` request will show a cache drop to ~18k, while `Hi again` recovers to ~99%.
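For reference, the hit rates quoted below were derived from the usage block of each logged response. A minimal sketch of that calculation, assuming one JSON response per log file with an OpenAI-shaped `usage` object (the exact layout of the files in `~/.qwen/logs/` may differ):

```python
import json
from pathlib import Path


def cache_hit_rate(cached_tokens: int, prompt_tokens: int) -> float:
    """Cached share of the prompt, as a percentage."""
    return 100.0 * cached_tokens / prompt_tokens if prompt_tokens else 0.0


def summarize(log_dir: str) -> list[tuple[int, int, float]]:
    """Report (input, cached, hit%) per logged request.

    Assumes each file holds one JSON response whose usage block follows
    the OpenAI shape (usage.prompt_tokens and
    usage.prompt_tokens_details.cached_tokens); adjust the field access
    if the local log format differs.
    """
    rows = []
    for path in sorted(Path(log_dir).glob("*.json")):
        usage = json.loads(path.read_text())["usage"]
        prompt = usage["prompt_tokens"]
        cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
        rows.append((prompt, cached, cache_hit_rate(cached, prompt)))
    return rows
```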
Example output:

| Req | Input | Cached | Cache% | Notes |
|-----|-------|--------|--------|-------|
| 0 | 17,971 | 0 | 0.0% | Task 1 — cold cache |
| 1 | 18,360 | 17,965 | 97.8% | Task 1 — agentic loop |
| 2 | 18,976 | 18,354 | 96.7% | Task 1 — agentic loop |
| 3 | 20,380 | 18,970 | 93.1% | Task 1 — agentic loop |
| 4 | 20,582 | 20,374 | 99.0% | Task 1 — agentic loop |
| 5 | 22,650 | 20,576 | 90.8% | Task 1 — last request |
| 6 | 23,455 | 17,965 | 76.6% | "Hi" — cache breaks |
| 7 | 23,476 | 23,449 | 99.9% | "Hi again" — cache recovers |
Req 6 caches only ~18k tokens (system prompt) instead of the expected ~22k (req 5's full prefix).
What did you expect to happen?
Req 6 should cache ~22,650 tokens (the full prefix from req 5), since the message content is unchanged. Expected cache hit rate would be ~90%+ instead of the observed 76.6%.
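The arithmetic behind those two figures, using the req 6 numbers from the table above:

```python
req6_input = 23_455       # req 6 prompt tokens
observed_cached = 17_965  # what the provider actually reused (system prompt only)
expected_cached = 22_650  # req 5's full prefix, unchanged in req 6

observed_rate = 100 * observed_cached / req6_input  # ~76.6%
expected_rate = 100 * expected_cached / req6_input  # ~96.6%
```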
Investigation so far
- Per-message MD5 hash comparison of the actual OpenAI request payloads (captured at the pipeline level before sending) confirms all shared messages are byte-for-byte identical between req 5 and req 6
- Non-message request fields (model, tools, stream options) are also identical
- The only differences are:
  - `cache_control` annotation placement (non-semantic, used as a cache hint)
  - `metadata.promptId` (changes every request, outside message content)
- Thinking/reasoning blocks are retained in history (no stripping occurs)
- The pattern is reproducible across sessions
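The per-message comparison was along these lines. A minimal sketch, where the two payload dicts stand in for the captured req 5 and req 6 request bodies (the capture mechanism itself is not shown here):

```python
import hashlib
import json


def message_hashes(payload: dict) -> list[str]:
    """MD5 of each message, serialized deterministically (sorted keys)
    so byte-identical messages always produce the same digest."""
    return [
        hashlib.md5(
            json.dumps(msg, sort_keys=True, ensure_ascii=False).encode()
        ).hexdigest()
        for msg in payload["messages"]
    ]


def shared_prefix_identical(a: dict, b: dict) -> bool:
    """True if every message of the shorter request matches the
    corresponding message of the longer one."""
    ha, hb = message_hashes(a), message_hashes(b)
    n = min(len(ha), len(hb))
    return ha[:n] == hb[:n]
```

In practice, non-semantic annotations such as `cache_control` would be stripped from each message before hashing, since their placement is expected to move between requests.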
This appears to be a provider-side cache behavior rather than a client-side prefix mismatch.