
[BUG] Max 20x: Usage meter climbing abnormally fast since ~March 23 — 1-2% per simple message exchange #38357

@hgreene624

Description


Since approximately March 23, my Max 20x ($200/mo) plan usage meter is climbing at roughly 5-10x the expected rate. Workloads that previously consumed negligible budget are now burning through the 5-hour window rapidly. This is a sudden change — I've been a heavy Claude Code user for months (multi-agent teams, 50+ concurrent sessions, 100M+ token days) with no usage issues until now.

Evidence

Test case: Simple troubleshooting chat

  • Fresh session, no teams, no subagents — just back-and-forth text conversation
  • Session total per status line: 37.5M tokens, $2.23 cost
  • Per-call data from JSONL: 92.7% cache hit rate (caching is working correctly)
  • Each API call: ~237K context, 99.9% cache reads on recent calls
  • Despite normal-looking per-call metrics, the usage meter climbed from 11% → 18% in approximately 20 minutes of light conversation
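The meter readings above can be sanity-checked with back-of-envelope arithmetic (all figures taken from this report, assuming the climb rate stays roughly constant):

```python
# Cross-check the reported meter climb: 11% -> 18% over ~20 minutes
# of light conversation. Extrapolating that rate gives the time to
# exhaust the 5-hour window.
climb_pct = 18 - 11          # observed meter increase, percentage points
minutes = 20                 # elapsed time for that increase
rate_per_hour = climb_pct / minutes * 60
hours_to_cap = 100 / rate_per_hour
print(f"{rate_per_hour:.0f}%/hr -> cap exhausted in {hours_to_cap:.1f} h")
# -> 21%/hr -> cap exhausted in 4.8 h
```

At 21%/hr even this "light" conversation would exhaust the window in under 5 hours, which is consistent with the ~2.5-hour exhaustion reported below for moderate use.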

Forensic audit of per-call data

Parsed the session JSONL (~/.claude/projects/.../<session>.jsonl). Last 10 calls all show:

  • cache_read_input_tokens: ~236K (99.9% of context)
  • cache_creation_input_tokens: 100-500 tokens
  • input_tokens: 1-3 tokens
  • output_tokens: 100-300 tokens
  • service_tier: "standard"

Caching is working and the per-call token counts look normal. Yet the usage meter is incrementing far faster than these per-call numbers can account for.
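The per-call audit above can be reproduced with a short script. The JSONL schema assumed here (one JSON object per line with a `message.usage` object carrying the four token fields quoted above) is inferred from the fields named in this report; the records below are synthetic stand-ins for real transcript lines.

```python
import json
import os
import tempfile

# Synthetic records mimicking the per-call usage reported above
# (~236K cache reads, a few fresh input tokens, modest output).
records = [
    {"message": {"usage": {
        "input_tokens": 2,
        "output_tokens": 150,
        "cache_read_input_tokens": 236_000,
        "cache_creation_input_tokens": 300,
    }}}
    for _ in range(10)
]

path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
with open(path, "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

def audit(jsonl_path):
    """Per call: total context size and cache-hit share."""
    rows = []
    with open(jsonl_path) as f:
        for line in f:
            usage = json.loads(line).get("message", {}).get("usage")
            if not usage:
                continue
            ctx = (usage["input_tokens"]
                   + usage["cache_read_input_tokens"]
                   + usage["cache_creation_input_tokens"])
            rows.append({
                "context": ctx,
                "cache_hit": usage["cache_read_input_tokens"] / ctx,
                "output": usage["output_tokens"],
            })
    return rows

rows = audit(path)
print(f"{len(rows)} calls, avg cache hit "
      f"{sum(r['cache_hit'] for r in rows) / len(rows):.1%}")
# -> 10 calls, avg cache hit 99.9%
```

Pointing `audit()` at the real `~/.claude/projects/.../<session>.jsonl` should reproduce the numbers quoted above if the schema assumption holds.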

Before vs after

  • Before March 23: Could run multi-agent teams (5-6 Opus agents in parallel), 50+ sessions/day, 100M+ token days without hitting limits
  • After March 23: A single simple chat burns 2% of the 5-hour window per message exchange. Hit 100% of the window in ~2.5 hours of moderate use.

Environment

  • Claude Code v2.1.81
  • macOS (Apple Silicon, Mac Studio)
  • Model: Opus 4.6 (1M context)
  • Plan: Max 20x ($200/mo)
  • No OTEL telemetry enabled
  • No proxy or custom API endpoint

Suspicion

Something changed server-side around March 20-23 in how Max plan quota consumption is calculated. The local per-call token counts look normal and caching is functioning, but the rate limit budget is being decremented at a much higher rate than the actual token usage would justify.

This may be related to:

  • A change in how cache_read_input_tokens are weighted against the quota (previously may have been discounted, now possibly counted at full rate)
  • A change in how thinking/reasoning tokens are counted against the quota
  • A backend quota calculation regression
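The first hypothesis is enough to explain the observed jump on its own. A toy quota model (the weights are hypothetical, chosen only for illustration; this is not Anthropic's actual accounting) shows how re-weighting cache reads moves the needle for a cache-heavy call like those in this session:

```python
# Hypothetical quota model: quota units consumed per API call, with a
# tunable weight on cache-read tokens. Weights are assumptions for
# illustration, not confirmed server-side behavior.
def quota_units(usage, cache_read_weight):
    return (usage["input_tokens"]
            + usage["cache_creation_input_tokens"]
            + cache_read_weight * usage["cache_read_input_tokens"]
            + usage["output_tokens"])

# A typical call from this session (figures from the audit above).
call = {
    "input_tokens": 2,
    "output_tokens": 150,
    "cache_read_input_tokens": 236_000,
    "cache_creation_input_tokens": 300,
}

discounted = quota_units(call, cache_read_weight=0.1)  # hypothetical pre-change
full_rate  = quota_units(call, cache_read_weight=1.0)  # hypothetical post-change
print(f"{full_rate / discounted:.1f}x")  # -> 9.8x
```

Moving cache reads from a 10x discount to full rate would multiply the quota cost of these calls by ~10x without changing a single number in the JSONL, which matches both the "5-10x the expected rate" observation and the normal-looking per-call metrics.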

Metadata
Labels

  • area:cost
  • bug (Something isn't working)
  • duplicate (This issue or pull request already exists)
  • platform:macos (Issue specifically occurs on macOS)
