Description
Since approximately March 23, my Max 20x ($200/mo) plan usage meter is climbing at roughly 5-10x the expected rate. Workloads that previously consumed negligible budget are now burning through the 5-hour window rapidly. This is a sudden change — I've been a heavy Claude Code user for months (multi-agent teams, 50+ concurrent sessions, 100M+ token days) with no usage issues until now.
Evidence
Test case: Simple troubleshooting chat
- Fresh session, no teams, no subagents — just back-and-forth text conversation
- Session total per status line: 37.5M tokens, $2.23 cost
- Per-call data from JSONL: 92.7% cache hit rate (caching is working correctly)
- Each API call: ~237K context, 99.9% cache reads on recent calls
- Despite normal-looking per-call metrics, the usage meter climbed from 11% → 18% in approximately 20 minutes of light conversation
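The meter readings above imply the arithmetic directly; a quick back-of-envelope (plain arithmetic, no assumptions beyond the figures quoted):

```python
# Burn rate implied by the meter moving 11% -> 18% in ~20 minutes.
meter_delta = 0.18 - 0.11              # 7 points of the 5-hour window
pct_per_hour = meter_delta / 20 * 60 * 100
hours_to_exhaust = 100 / pct_per_hour  # time to burn the whole window
print(f"{pct_per_hour:.0f}%/hour, empty in {hours_to_exhaust:.1f}h")
```

That is roughly 21% of the window per hour, meaning light text-only chat alone would exhaust the 5-hour window in under five hours of continuous use.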
Forensic audit of per-call data
Parsed the session JSONL (~/.claude/projects/.../<session>.jsonl). The last 10 calls all show:
- cache_read_input_tokens: ~236K (99.9% of context)
- cache_creation_input_tokens: 100-500 tokens
- input_tokens: 1-3 tokens
- output_tokens: 100-300 tokens
- service_tier: "standard"
Caching is working and the per-call token counts look normal, yet the usage meter is incrementing far faster than these numbers can explain.
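The audit above can be reproduced with a short script. A minimal sketch, assuming the schema observed in this session's file (per-call token counts under message.usage on assistant entries); the JSONL format is undocumented and may differ between Claude Code versions:

```python
import json
import os
import tempfile
from pathlib import Path

def audit_usage(jsonl_path, last_n=10):
    """Summarize per-call usage from a Claude Code session JSONL.

    Assumes each assistant entry carries token counts under
    message.usage, as observed in this session; the schema is an
    assumption and may change between versions.
    """
    calls = []
    for line in Path(jsonl_path).read_text().splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        usage = entry.get("message", {}).get("usage")
        if usage:
            calls.append(usage)
    recent = calls[-last_n:]
    reads = sum(c.get("cache_read_input_tokens", 0) for c in recent)
    context = reads + sum(
        c.get("cache_creation_input_tokens", 0) + c.get("input_tokens", 0)
        for c in recent
    )
    return {
        "calls": len(recent),
        "cache_hit_rate": reads / context if context else 0.0,
    }

# Demo on a synthetic entry shaped like the ones quoted above.
sample = {"message": {"usage": {
    "cache_read_input_tokens": 236_000,
    "cache_creation_input_tokens": 300,
    "input_tokens": 2,
    "output_tokens": 200,
    "service_tier": "standard",
}}}
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(json.dumps(sample) + "\n")
    path = f.name
report = audit_usage(path)
os.remove(path)
print(report)  # one call, ~99.9% cache hit rate
```

Running this over the real session file is what produced the 92.7% aggregate cache hit rate cited above.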
Before vs after
- Before March 23: Could run multi-agent teams (5-6 Opus agents in parallel), 50+ sessions/day, 100M+ token days without hitting limits
- After March 23: A single simple chat burns 2% of the 5-hour window per message exchange. Hit 100% of the window in ~2.5 hours of moderate use.
Environment
- Claude Code v2.1.81
- macOS (Apple Silicon, Mac Studio)
- Model: Opus 4.6 (1M context)
- Plan: Max 20x ($200/mo)
- No OTEL telemetry enabled
- No proxy or custom API endpoint
Suspicion
Something changed server-side around March 20-23 in how Max plan quota consumption is calculated. The local per-call token counts look normal and caching is functioning, but the rate limit budget is being decremented at a much higher rate than the actual token usage would justify.
This may be related to:
- A change in how cache_read_input_tokens are weighted against the quota (previously they may have been discounted, now possibly counted at full rate)
- A change in how thinking/reasoning tokens are counted against the quota
- A backend quota calculation regression
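The first hypothesis is easy to sanity-check numerically. A sketch with hypothetical weights: the 0.1x cache-read discount below mirrors API pricing (cache reads billed at roughly 10% of fresh input) and is purely an assumption, since Max-plan quota weights are not publicly documented:

```python
def effective_tokens(usage, cache_read_weight):
    # Hypothetical quota formula: cache reads weighted, everything else full.
    return (usage["input_tokens"]
            + usage["cache_creation_input_tokens"]
            + usage["output_tokens"]
            + cache_read_weight * usage["cache_read_input_tokens"])

# A typical call from the forensic audit above.
call = {"input_tokens": 2, "cache_creation_input_tokens": 300,
        "cache_read_input_tokens": 236_000, "output_tokens": 200}

discounted = effective_tokens(call, 0.1)  # cache reads discounted (before?)
full_rate = effective_tokens(call, 1.0)   # cache reads at full weight (after?)
print(full_rate / discounted)
```

The ratio comes out to roughly 9.8x, squarely inside the observed 5-10x jump, which is consistent with (but not proof of) the cache-read discount having been dropped from the quota calculation.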