Describe the bug
Every message in a Claude Code session re-sends the full instruction set (CLAUDE.md files, system prompts, conversation history) as cached context. Cache read tokens count against the usage quota. As CLAUDE.md files grow, cache read token consumption scales linearly with both file size and message count, causing quota to deplete far faster than actual productive I/O would suggest.
Data
I parsed 30 days of Claude Code session transcripts (JSONL files) and extracted token usage from every API response.
30-day totals (Jan 9 - Feb 8, 2026):
I/O tokens (actual work): 3,887,759
Cache read tokens: 5,092,500,074
Cache creation tokens: 176,498,498
Ratio: 1,310 cache reads per 1 I/O token
Cache reads as % of total: 99.93%
Weekly breakdown showing cache reads scaling with CLAUDE.md growth, not workload:
Week of Jan 11: 276,151,498 cache reads
Week of Jan 18: 967,624,068 cache reads (3.5x increase)
Week of Jan 25: 1,192,316,036 cache reads
Week of Feb 1: 1,474,919,498 cache reads (peak)
Week of Feb 8: 1,181,488,974 cache reads (ongoing)
Single-day comparison showing non-linear scaling in longer sessions:
Feb 7: 78,312,699 cache reads | 70,533 I/O tokens
Feb 8: 218,548,562 cache reads | 118,663 I/O tokens
Cache reads increased 2.8x while I/O only increased 1.7x
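A quick sanity check of those two ratios, using only the numbers quoted above:

```python
# Single-day comparison figures copied from the report above.
feb7_reads, feb7_io = 78_312_699, 70_533
feb8_reads, feb8_io = 218_548_562, 118_663

read_growth = feb8_reads / feb7_reads
io_growth = feb8_io / feb7_io

print(f"cache reads grew {read_growth:.1f}x")  # → 2.8x
print(f"I/O tokens grew {io_growth:.1f}x")     # → 1.7x
```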
Environment
- Claude Code version: 2.1.37
- Models tested: Opus 4.5, Opus 4.6 (identical patterns on both)
- OS: macOS Darwin 25.2.0
- CLAUDE.md total size: ~57KB (~15,000 tokens) across global + project files
- Typical session length: 50-150 messages
To reproduce
- Create CLAUDE.md files with detailed project instructions (any size; larger files make the effect more visible)
- Run a Claude Code session with 50+ messages
- Parse the session transcript JSONL for cache_read_input_tokens in the usage object of each assistant message
- Compare cache read total to input + output token total
Token usage is available in each assistant message entry in the JSONL transcript at:
~/.claude/projects/<project>/<session-id>.jsonl
Each entry contains:
```json
"usage": {
  "input_tokens": ...,
  "output_tokens": ...,
  "cache_creation_input_tokens": ...,
  "cache_read_input_tokens": ...
}
```
Expected behavior
Cache read tokens should either:
- Not count against usage quota (since they represent re-reading the same context the user already provided), or
- Count at a significantly reduced weight, or
- Be minimized architecturally (e.g., don't re-send unchanged CLAUDE.md content every message, use deltas, or load instruction files on-demand)
Actual behavior
Cache read tokens count fully against quota. Every message re-sends the complete instruction set regardless of whether it changed. This means:
- A 15k-token CLAUDE.md costs ~15k cache-read tokens per message
- A 100-message session costs ~1.5M cache-read tokens from instructions alone
- Multiple sessions per day compound this to hundreds of millions
- Users have no control over the re-send behavior
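The non-linear scaling observed in longer sessions is what this model predicts: each message re-reads the fixed instruction block plus the entire conversation so far, so the history term grows quadratically with message count. The 15k instruction size is from the report above; the per-message history size is an assumed round number for illustration:

```python
# Illustrative cost model (not measured): message i re-reads the fixed
# instruction block plus the history of the previous i-1 messages.
INSTRUCTIONS = 15_000  # tokens, ~57KB CLAUDE.md (from the report)
MSG_HISTORY = 500      # assumed average tokens added per message

def session_cache_reads(n_messages):
    return sum(INSTRUCTIONS + MSG_HISTORY * (i - 1)
               for i in range(1, n_messages + 1))

for n in (50, 100, 150):
    print(f"{n:>3} messages: {session_cache_reads(n):>9,} cache-read tokens")
# Doubling the session from 50 to 100 messages roughly triples
# cache reads under these assumptions.
```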
Why this matters
This explains the widespread "$100 feels like $20" feedback. Users are not consuming more productive tokens. Their quota is being consumed by the architectural overhead of re-reading cached context on every message. As users naturally grow their CLAUDE.md files (the intended workflow for tuning Claude Code), their quota depletion accelerates even with identical workloads.
Additional context
- This is model-agnostic. Opus 4.5 and 4.6 produce identical cache patterns.
- My CLAUDE.md setup (~57KB) is larger than average, but the architecture affects all users proportionally: a 5KB CLAUDE.md shows the same pattern at smaller scale.
- I have reported this separately to Anthropic support with the full dataset.
Disclaimer: I used my own AI tool to help parse the token data from session transcripts. The data is real, pulled directly from Claude Code JSONL session logs.