Description
Since approximately March 23, my Max 20x ($200/mo) plan usage meter is climbing at roughly 5-10x the expected rate. Workloads that previously consumed negligible budget are now burning through the 5-hour window rapidly. This is a sudden change — I've been a heavy Claude Code user for months (multi-agent teams, 50+ concurrent sessions, 100M+ token days) with no usage issues until now.
Evidence
Test case: Simple troubleshooting chat
- Fresh session, no teams, no subagents — just back-and-forth text conversation
- Session total per status line: 37.5M tokens, $2.23 cost
- Per-call data from JSONL: 92.7% cache hit rate (caching is working correctly)
- Each API call: ~237K context, 99.9% cache reads on recent calls
- Despite normal-looking per-call metrics, the usage meter climbed from 11% → 18% in approximately 20 minutes of light conversation
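The meter readings above imply the arithmetic directly; a quick back-of-envelope (plain arithmetic, no assumptions beyond the figures quoted):

```python
# Burn rate implied by the meter moving 11% -> 18% in ~20 minutes.
meter_delta = 0.18 - 0.11              # 7 points of the 5-hour window
pct_per_hour = meter_delta / 20 * 60 * 100
hours_to_exhaust = 100 / pct_per_hour  # time to burn the whole window
print(f"{pct_per_hour:.0f}%/hour, empty in {hours_to_exhaust:.1f}h")
```

That is roughly 21% of the window per hour, meaning light text-only chat alone would exhaust the 5-hour window in under five hours of continuous use.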
Forensic audit of per-call data
Parsed the session JSONL (~/.claude/projects/.../<session>.jsonl). The last 10 calls all show:
- cache_read_input_tokens: ~236K (99.9% of context)
- cache_creation_input_tokens: 100-500 tokens
- input_tokens: 1-3 tokens
- output_tokens: 100-300 tokens
- service_tier: "standard"
Caching is working and the per-call token counts look normal, yet the usage meter is incrementing far faster than these numbers can explain.
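The audit above can be reproduced with a short script. A minimal sketch, assuming the schema observed in this session's file (per-call token counts under message.usage on assistant entries); the JSONL format is undocumented and may differ between Claude Code versions:

```python
import json
import os
import tempfile
from pathlib import Path

def audit_usage(jsonl_path, last_n=10):
    """Summarize per-call usage from a Claude Code session JSONL.

    Assumes each assistant entry carries token counts under
    message.usage, as observed in this session; the schema is an
    assumption and may change between versions.
    """
    calls = []
    for line in Path(jsonl_path).read_text().splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        usage = entry.get("message", {}).get("usage")
        if usage:
            calls.append(usage)
    recent = calls[-last_n:]
    reads = sum(c.get("cache_read_input_tokens", 0) for c in recent)
    context = reads + sum(
        c.get("cache_creation_input_tokens", 0) + c.get("input_tokens", 0)
        for c in recent
    )
    return {
        "calls": len(recent),
        "cache_hit_rate": reads / context if context else 0.0,
    }

# Demo on a synthetic entry shaped like the ones quoted above.
sample = {"message": {"usage": {
    "cache_read_input_tokens": 236_000,
    "cache_creation_input_tokens": 300,
    "input_tokens": 2,
    "output_tokens": 200,
    "service_tier": "standard",
}}}
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(json.dumps(sample) + "\n")
    path = f.name
report = audit_usage(path)
os.remove(path)
print(report)  # one call, ~99.9% cache hit rate
```

Running this over the real session file is what produced the 92.7% aggregate cache hit rate cited above.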
Before vs after
- Before March 23: Could run multi-agent teams (5-6 Opus agents in parallel), 50+ sessions/day, 100M+ token days without hitting limits
- After March 23: A single simple chat burns 2% of the 5-hour window per message exchange. Hit 100% of the window in ~2.5 hours of moderate use.
Environment
- Claude Code v2.1.81
- macOS (Apple Silicon, Mac Studio)
- Model: Opus 4.6 (1M context)
- Plan: Max 20x ($200/mo)
- No OTEL telemetry enabled
- No proxy or custom API endpoint
Suspicion
Something changed server-side around March 20-23 in how Max plan quota consumption is calculated. The local per-call token counts look normal and caching is functioning, but the rate limit budget is being decremented at a much higher rate than the actual token usage would justify.
This may be related to:
- A change in how cache_read_input_tokens are weighted against the quota (previously they may have been discounted, now possibly counted at full rate)
- A change in how thinking/reasoning tokens are counted against the quota
- A backend quota calculation regression
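The first hypothesis is easy to sanity-check numerically. A sketch with hypothetical weights: the 0.1x cache-read discount below mirrors API pricing (cache reads billed at roughly 10% of fresh input) and is purely an assumption, since Max-plan quota weights are not publicly documented:

```python
def effective_tokens(usage, cache_read_weight):
    # Hypothetical quota formula: cache reads weighted, everything else full.
    return (usage["input_tokens"]
            + usage["cache_creation_input_tokens"]
            + usage["output_tokens"]
            + cache_read_weight * usage["cache_read_input_tokens"])

# A typical call from the forensic audit above.
call = {"input_tokens": 2, "cache_creation_input_tokens": 300,
        "cache_read_input_tokens": 236_000, "output_tokens": 200}

discounted = effective_tokens(call, 0.1)  # cache reads discounted (before?)
full_rate = effective_tokens(call, 1.0)   # cache reads at full weight (after?)
print(full_rate / discounted)
```

The ratio comes out to roughly 9.8x, squarely inside the observed 5-10x jump, which is consistent with (but not proof of) the cache-read discount having been dropped from the quota calculation.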