fix: correct context usage calculation to include cacheRead tokens and handle accumulated values #280

The context formula was excluding cacheReadInputTokens, causing the gauge to drastically underestimate usage (e.g., 3% when reality was 23%). During multi-tool turns, accumulated token totals could exceed the context window, producing false 100% readings and premature compact warnings.

- Include cacheReadInputTokens in the formula (input + cacheRead + cacheCreation)
- Detect accumulated values (total > window) and return null to preserve last valid %
- Skip context updates during accumulated turns instead of displaying inflated values
- Fix MainPanel tooltip deriving from raw tab stats instead of preserved session percentage
- Handle group chat participant/moderator accumulated values with -1 sentinel
## Pull Request Review: Context Usage Calculation Fix

### Summary

This PR fixes critical bugs in the context window usage gauge that were causing both underestimation and false 100% readings. The solution is well-designed and thoroughly tested.

### Code Quality: ✅ Excellent

**Strengths**
### Potential Issues: Minor Observations

#### 1. Group Chat Sentinel Value (-1)

Location: […]

The use of a `-1` sentinel:

```typescript
// Current approach:
contextUsage: -1 // Magic number
```

Suggestion: Consider using a discriminated union or explicit type:

```typescript
type UsageUpdate =
  | { contextUsage: number; tokenCount: number; totalCost: number }
  | { totalCost: number }; // Omit context fields when accumulated
```

However, this is a minor point and the current approach works fine with clear comments.

#### 2. MainPanel Fallback Calculation

Location: the fallback logic derives tokens from the preserved percentage:

```typescript
const preservedPercentage = activeSession?.contextUsage ?? 0;
return Math.round((preservedPercentage / 100) * activeTabContextWindow);
```

This is clever but introduces a circular dependency: you're deriving tokens from a percentage that was originally derived from tokens. While this works for display purposes, consider whether this could cause any edge cases where the tooltip shows a value that doesn't quite match reality.

Minor risk: If […]

Suggestion: Could add a check to see if […]

### Performance: ✅ Excellent
### Security: ✅ No Concerns

### Architecture: ✅ Well-Designed

### Testing: ✅ Comprehensive

### Best Practices: ✅ Followed

- ✅ DRY principle - shared utility functions

### Recommendations

**Must-Have (None)**

All critical issues are resolved. This PR is ready to merge.

**Nice-to-Have (Optional)**
### Conclusion

Recommendation: ✅ APPROVE

This is an excellent PR. The fix properly addresses both the underestimation bug (missing `cacheReadInputTokens` in the formula) and the false 100% readings caused by accumulated values.

Great work on the thorough problem analysis and implementation! 🎉 Minor suggestions above are optional improvements and should not block merging.
## Problem
The context window usage gauge had several bugs causing inaccurate readings:
1. **Underestimation** — `cacheReadInputTokens` was excluded from the formula. For a single Anthropic API call, the total input context is `inputTokens + cacheReadInputTokens + cacheCreationInputTokens` (three partitions of the same input: uncached, cache-hit, newly-cached). Excluding cache-read meant the gauge showed ~3% when reality was ~23%.
2. **False 100%** — During complex multi-tool turns, Claude Code accumulates token values across all internal API calls in a single `result` JSONL message. With cache-read included, these accumulated totals easily exceed the 200K context window (e.g., 2.5M cache-read from 40 internal calls each reading ~62K). The old code capped at 100%, triggering false compact warnings.
3. **Tooltip inconsistency** — `MainPanel.tsx` recalculated context tokens independently from raw tab `usageStats`, bypassing the fix in `App.tsx`. The hover tooltip still showed 100% even when the gauge was correct.
4. **Group chat affected** — The usage-listener for group chat participants and moderators had the same formula bug, passing through inflated percentages.
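To make the first bug concrete, here is a minimal sketch of the corrected token sum. The function and field names are modeled on the PR's description, not copied from its source:

```typescript
// Shape of a single Anthropic API call's usage report (assumed field names).
interface UsageStats {
  inputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

// All three input partitions occupy the context window together.
function calculateContextTokens(u: UsageStats): number {
  return u.inputTokens + u.cacheReadInputTokens + u.cacheCreationInputTokens;
}

// Example: mostly cache-hit input against a 200K window. Counting only
// inputTokens reports ~3%; the full sum reports ~23%.
const usage: UsageStats = {
  inputTokens: 6_000,
  cacheReadInputTokens: 39_000,
  cacheCreationInputTokens: 1_000,
};
const contextWindow = 200_000;

console.log(Math.round((usage.inputTokens / contextWindow) * 100));             // 3
console.log(Math.round((calculateContextTokens(usage) / contextWindow) * 100)); // 23
```

The illustrative numbers mirror the 3%-vs-23% discrepancy reported above.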
## Root cause
Claude Code reports a single `result` JSONL event per turn with token values summed across all internal API calls. For a turn with N tool uses, each internal call reads the full conversation from cache, so:

- `cacheReadInputTokens` ≈ context_size × N (e.g., 62K × 40 = 2.5M)
- `inputTokens` ≈ uncached_portion × N
- `outputTokens` ≈ per_call_output × N

These sums are useful for billing but meaningless for measuring context window fill. A single API call's total input can never exceed the context window, so `total > window` definitively indicates accumulated values.

## Fix
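The core of the fix is this accumulated-value detection. A minimal sketch, with the exact signature assumed rather than taken from the PR:

```typescript
// Returns the context usage percentage, or null when the reported total
// can only be an accumulation over multiple internal API calls.
function estimateContextUsage(
  totalInputTokens: number, // input + cacheRead + cacheCreation
  contextWindow: number,
): number | null {
  // A single API call's input can never exceed the context window,
  // so total > window definitively indicates accumulated values.
  if (totalInputTokens > contextWindow) return null;
  return (totalInputTokens / contextWindow) * 100;
}

console.log(estimateContextUsage(46_000, 200_000));    // 23 (normal turn)
console.log(estimateContextUsage(2_500_000, 200_000)); // null (accumulated turn)
```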
- `calculateContextTokens` now includes all three input token types: `input + cacheRead + cacheCreation`.
- `estimateContextUsage` returns `null` when `total > contextWindow` (impossible for a single call → must be accumulated).
- When the estimate is `null`, the `App.tsx` onUsage handler skips the update entirely — the last valid percentage is preserved. No estimates, no high-water mark, just real measurements.
- The `MainPanel.tsx` tooltip now derives from `session.contextUsage` (the preserved percentage).
- Group chat passes a `-1` sentinel so the UI handler preserves previous values.
- `Math.min(100, ...)`: no longer needed — the `null` return handles overflow, and valid measurements are mathematically bounded to 0-100%.

## Behavior
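The resulting update behavior — the last valid reading survives an accumulated turn — can be sketched as follows. The handler and state names here are illustrative, not the PR's actual `App.tsx` code:

```typescript
function estimateContextUsage(total: number, window: number): number | null {
  return total > window ? null : (total / window) * 100;
}

let contextUsage: number | null = null; // last valid gauge reading

function onUsage(totalInputTokens: number, contextWindow = 200_000): void {
  const pct = estimateContextUsage(totalInputTokens, contextWindow);
  if (pct === null) return; // accumulated turn: keep the previous reading
  contextUsage = pct;
}

onUsage(46_000);    // normal turn  -> gauge reads 23%
onUsage(2_500_000); // accumulated  -> update skipped, still 23%
console.log(contextUsage); // 23
```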
## Files changed (11)
Source (5):
- `src/renderer/utils/contextUsage.ts` — Formula fix + accumulated detection
- `src/main/parsers/usage-aggregator.ts` — Same formula fix (main process copy)
- `src/renderer/App.tsx` — Simplified onUsage handler, skip update when null
- `src/renderer/components/MainPanel.tsx` — Derive from preserved percentage when accumulated
- `src/main/process-listeners/usage-listener.ts` — Group chat accumulated value handling

Tests (6):
- `contextUsage.test.ts` — Updated assertions for new formula, added accumulated detection tests
- `usage-aggregator.test.ts` — Same pattern, added accumulated detection tests
- `usage-listener.test.ts` — Updated fallback test for 200K default window
- `MainPanel.test.tsx` — Updated cap test to use preserved `session.contextUsage`
- `contextExtractor.test.ts` — Updated token totals (450→575, 350→475)
- `HistoryDetailModal.test.tsx` — Updated display percentage (10%→12%)

## Testing
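Assuming the detection behaves as described above, the accumulated-detection cases can be exercised in plain TypeScript. This is a test-style sketch with illustrative names and values, not the repository's actual test suite:

```typescript
function estimateContextUsage(total: number, window: number): number | null {
  return total > window ? null : (total / window) * 100;
}

// Tiny assertion helper so the sketch runs without a test framework.
function expect(label: string, cond: boolean): void {
  if (!cond) throw new Error(`FAIL: ${label}`);
  console.log(`ok - ${label}`);
}

expect("includes cache-read in a normal turn",
  estimateContextUsage(46_000, 200_000) === 23);
expect("returns null for accumulated totals",
  estimateContextUsage(2_500_000, 200_000) === null);
expect("boundary: exactly full window is a valid 100%",
  estimateContextUsage(200_000, 200_000) === 100);
```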