Describe the bug
Every message in a Claude Code session re-sends the full instruction set (CLAUDE.md files, system prompts, conversation history) as cached context. Cache read tokens count against the usage quota. As CLAUDE.md files grow, cache read token consumption scales linearly with both file size and message count, causing quota to deplete far faster than actual productive I/O would suggest.
Data
I parsed 30 days of Claude Code session transcripts (JSONL files) and extracted token usage from every API response.
30-day totals (Jan 9 - Feb 8, 2026):
I/O tokens (actual work): 3,887,759
Cache read tokens: 5,092,500,074
Cache creation tokens: 176,498,498
Ratio: 1,310 cache reads per 1 I/O token
Cache reads as % of total: 99.93%
Weekly breakdown showing cache reads scaling with CLAUDE.md growth, not workload:
Week of Jan 11: 276,151,498 cache reads
Week of Jan 18: 967,624,068 cache reads (3.5x increase)
Week of Jan 25: 1,192,316,036 cache reads
Week of Feb 1: 1,474,919,498 cache reads (peak)
Week of Feb 8: 1,181,488,974 cache reads (ongoing)
Single-day comparison showing non-linear scaling in longer sessions:
Feb 7: 78,312,699 cache reads | 70,533 I/O tokens
Feb 8: 218,548,562 cache reads | 118,663 I/O tokens
Cache reads increased 2.8x while I/O only increased 1.7x
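A quick sanity check of those two ratios, using only the numbers quoted above:

```python
# Single-day comparison figures copied from the report above.
feb7_reads, feb7_io = 78_312_699, 70_533
feb8_reads, feb8_io = 218_548_562, 118_663

read_growth = feb8_reads / feb7_reads
io_growth = feb8_io / feb7_io

print(f"cache reads grew {read_growth:.1f}x")  # → 2.8x
print(f"I/O tokens grew {io_growth:.1f}x")     # → 1.7x
```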
Environment
- Claude Code version: 2.1.37
- Models tested: Opus 4.5, Opus 4.6 (identical patterns on both)
- OS: macOS Darwin 25.2.0
- CLAUDE.md total size: ~57KB (~15,000 tokens) across global + project files
- Typical session length: 50-150 messages
To reproduce
- Create CLAUDE.md files with detailed project instructions (any size; larger files make the effect more visible)
- Run a Claude Code session with 50+ messages
- Parse the session transcript JSONL for cache_read_input_tokens in the usage object of each assistant message
- Compare cache read total to input + output token total
Token usage is available in each assistant message entry in the JSONL transcript at:
~/.claude/projects/<project>/<session-id>.jsonl
Each entry contains:
```json
"usage": {
  "input_tokens": ...,
  "output_tokens": ...,
  "cache_creation_input_tokens": ...,
  "cache_read_input_tokens": ...
}
```
Expected behavior
Cache read tokens should either:
- Not count against usage quota (since they represent re-reading the same context the user already provided), or
- Count at a significantly reduced weight, or
- Be minimized architecturally (e.g., don't re-send unchanged CLAUDE.md content every message, use deltas, or load instruction files on-demand)
Actual behavior
Cache read tokens count fully against quota. Every message re-sends the complete instruction set regardless of whether it changed. This means:
- A 15k-token CLAUDE.md costs ~15k cache-read tokens per message
- A 100-message session costs ~1.5M cache-read tokens from instructions alone
- Multiple sessions per day compound this to hundreds of millions
- Users have no control over the re-send behavior
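The non-linear scaling observed in longer sessions is what this model predicts: each message re-reads the fixed instruction block plus the entire conversation so far, so the history term grows quadratically with message count. The 15k instruction size is from the report above; the per-message history size is an assumed round number for illustration:

```python
# Illustrative cost model (not measured): message i re-reads the fixed
# instruction block plus the history of the previous i-1 messages.
INSTRUCTIONS = 15_000  # tokens, ~57KB CLAUDE.md (from the report)
MSG_HISTORY = 500      # assumed average tokens added per message

def session_cache_reads(n_messages):
    return sum(INSTRUCTIONS + MSG_HISTORY * (i - 1)
               for i in range(1, n_messages + 1))

for n in (50, 100, 150):
    print(f"{n:>3} messages: {session_cache_reads(n):>9,} cache-read tokens")
# Doubling the session from 50 to 100 messages roughly triples
# cache reads under these assumptions.
```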
Why this matters
This explains the widespread "$100 feels like $20" feedback. Users are not consuming more productive tokens. Their quota is being consumed by the architectural overhead of re-reading cached context on every message. As users naturally grow their CLAUDE.md files (the intended workflow for tuning Claude Code), their quota depletion accelerates even with identical workloads.
Additional context
- This is model-agnostic. Opus 4.5 and 4.6 produce identical cache patterns.
- My CLAUDE.md setup (~57KB) is larger than average, but the architecture affects all users proportionally: a 5KB CLAUDE.md shows the same pattern at smaller scale.
- I have reported this separately to Anthropic support with the full dataset.
Disclaimer: I used my own AI tool to help parse the token data from session transcripts. The data is real, pulled directly from Claude Code JSONL session logs.