Memory leak: CLI process grows to 44GB+ RAM with GC thrashing and unresponsive SIGTERM #24644

@dollapak

Bug Report: Unbounded Memory Growth Leading to GC Thrashing and Unresponsive Process

Environment

Component    Value
Claude Code  2.1.38
OS           macOS 26.2 (Darwin 25.2.0), Apple Silicon (arm64)
RAM          128 GB
Runtime      Node.js (bundled)

Summary

A resumed (-r) Claude Code CLI session grew to 44.4 GB RSS (~35% of 128 GB RAM), consumed 76–152% CPU in a GC thrashing pattern, produced no output for 30+ minutes, ignored SIGTERM, and required SIGKILL to terminate. Root cause analysis points to unbounded toolUseResult.stdout accumulation in session history with no size cap or eviction.


Root Cause Analysis

Session Data Profile

The process loaded two chained sessions via -r (resume):

Session           File Size  Lines  Messages (user+assistant)  toolUseResult Data  Large Results (>50 KB)
Parent (resumed)  15 MB      4,617  696 + 1,166 = 1,862        0.9 MB              3
Child (active)    52 MB      341    68 + 109 = 177             47.5 MB             4
Combined          67 MB      4,958  2,039                      48.3 MB             7

The 4 Large Tool Results in Active Session

Line  toolUseResult.stdout Size
210   12.4 MB
263   13.7 MB
293   9.0 MB
329   12.4 MB

These are Bash tool outputs stored verbatim in session history — likely from commands that produced large stdout (e.g., git log, find, data dumps).
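
Entries like these can be located with a scan along the following lines. This is a sketch, not part of Claude Code: it assumes the session file is JSON Lines (one JSON object per line, as the line numbers above suggest) with the output stored at toolUseResult.stdout. The function name findLargeToolResults and the 50 KB default are illustrative.

```javascript
// Sketch: list session-file lines whose toolUseResult.stdout exceeds a threshold.
// Assumes a JSON Lines layout; findLargeToolResults is a hypothetical helper.
function findLargeToolResults(jsonlText, thresholdBytes = 50 * 1024) {
  const hits = [];
  jsonlText.split('\n').forEach((line, index) => {
    if (line.trim() === '') return; // ignore blank lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      return; // skip lines that are not valid JSON
    }
    const stdout = entry && entry.toolUseResult && entry.toolUseResult.stdout;
    if (typeof stdout !== 'string') return;
    const bytes = Buffer.byteLength(stdout, 'utf8');
    // Report 1-based line numbers so they match what an editor shows.
    if (bytes > thresholdBytes) hits.push({ line: index + 1, bytes });
  });
  return hits;
}
```

Passing the file contents read as a UTF-8 string returns one { line, bytes } record per oversized result, which can be compared against the table above.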

Memory Amplification

Metric                Value
On-disk session data  ~67 MB
In-memory RSS         ~44.4 GB
Amplification ratio   ~670x

This amplification is far beyond normal JSON parse overhead (~2-5x). Likely causes:

  1. Full conversation reconstruction per API call — Each API round-trip may rebuild the entire message array as a new object, while prior copies remain in heap awaiting GC
  2. String duplication — V8 may create separate copies of large toolUseResult strings during JSON serialization for API payloads
  3. No context pruning — Resumed sessions load the full parent + child history with no cap, even when the parent has 1,862 messages
  4. Progress messages — 2,780 progress type messages (2,624 in parent + 156 in child) add overhead without user-facing value in memory

GC Thrashing Cascade

Evidence of GC thrashing (sampled 3x at 2-second intervals):

Sample  %CPU   %MEM   RSS (KB)
  1     92.2   35.1   47,152,288
  2    122.6   35.9   48,222,064
  3    152.8   35.2   47,222,224

Pattern: CPU increasing monotonically while RSS oscillates → V8 GC running continuously but reclaiming minimal memory because most objects are still referenced. This creates a feedback loop:

Large heap → GC takes longer → Event loop blocked → No productive work
→ SIGTERM handler never executes → Process appears hung
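
The pattern above (CPU climbing while RSS stays in a narrow band) can be written as a simple check over ps samples. This is only a heuristic sketch: the function name, the three-sample minimum, and the 5% band are assumptions, and the RSS values are treated as kilobytes, which is the unit ps reports and the one the %MEM column implies.

```javascript
// Heuristic sketch: CPU rising sample over sample while RSS stays within a
// narrow band suggests the collector is busy but freeing little memory.
// The 5% band and 3-sample minimum are illustrative, not measured thresholds.
function looksLikeGcThrash(samples, rssBandPct = 5) {
  if (samples.length < 3) return false;
  const cpuRising = samples.every(
    (s, i) => i === 0 || s.cpu > samples[i - 1].cpu
  );
  const rssValues = samples.map((s) => s.rssKb);
  const min = Math.min(...rssValues);
  const max = Math.max(...rssValues);
  const rssFlat = ((max - min) / max) * 100 <= rssBandPct;
  return cpuRising && rssFlat;
}

// The three samples from the report.
const reportedSamples = [
  { cpu: 92.2, rssKb: 47152288 },
  { cpu: 122.6, rssKb: 48222064 },
  { cpu: 152.8, rssKb: 47222224 },
];
console.log(looksLikeGcThrash(reportedSamples)); // true
```

A check like this cannot distinguish GC thrashing from other CPU-bound loops on its own; it only flags the shape of the samples.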

Zombie Child Process

PID 27961 was in Z+ (zombie) state: a spawned Bash command that had completed but was never wait()-ed. This is consistent with the parent's event loop being blocked (by GC) when the child exited, because Node.js reaps child processes from its event loop.


Observed Behavior

Metric                      Value
RSS (physical memory)       44.4 GB
VSZ (virtual memory)        596 GB
CPU usage                   76–152% (increasing over time)
Accumulated CPU time        ~59 minutes
Process state               R+RN+ (always running, never sleeping)
SIGTERM response            Ignored; process did not terminate
Open file descriptors       68 (normal)
Child processes             1 zombie (Z+ defunct), never reaped
Network                     TCP ESTABLISHED to API endpoint
File writes in last 30 min  None (no productive work)

Steps to Reproduce

  1. Start a Claude Code session on a medium-to-large codebase:
    claude --dangerously-skip-permissions
    
  2. Conduct an extended session with many tool calls (target: 500+ user messages, 1000+ assistant messages)
    • Use Bash commands that produce large stdout (>1 MB each), e.g.:
      # In the Claude session, request operations like:
      "Show me all the git history"
      "Find all Python files and show their contents"
      "Run a comprehensive analysis of the codebase"
      
    • Accumulate 4+ tool results exceeding 10 MB each
  3. Exit the session
  4. Resume with:
    claude --dangerously-skip-permissions -r
    
  5. Continue working — issue multiple additional tool calls
  6. Monitor with: ps -p <PID> -o %cpu,%mem,rss
  7. Observe: RSS grows unboundedly, CPU spikes, process becomes unresponsive
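
For the monitoring step, the ps output can be turned into numbers for logging or plotting. The helper below is a sketch with a hypothetical name (parsePsOutput). It relies on ps reporting rss in 1024-byte units and skips the header row and any line that is not three numeric columns.

```javascript
// Sketch: parse the output of `ps -p <PID> -o %cpu,%mem,rss` into records.
// ps reports rss in 1024-byte units, so it is also converted to GiB here.
function parsePsOutput(output) {
  return output
    .split('\n')
    .map((line) => line.trim().split(/\s+/))
    .filter(
      (cols) =>
        cols.length === 3 &&
        cols.every((c) => c !== '' && !Number.isNaN(Number(c)))
    )
    .map(([cpu, mem, rss]) => ({
      cpu: Number(cpu),
      memPct: Number(mem),
      rssKb: Number(rss),
      rssGiB: Number(rss) / (1024 * 1024),
    }));
}
```

Repeated samples collected this way make it easier to see whether RSS is still growing or has plateaued while CPU keeps rising.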

Key conditions:

  • Session history must contain large toolUseResult.stdout entries (>10 MB)
  • Session must be resumed (-r) to load full parent history
  • Combined message count should exceed ~2,000 messages
  • Total toolUseResult data should exceed ~48 MB on disk

Expected Behavior

  1. Memory usage should be bounded regardless of session length
  2. Large tool outputs should be truncated or streamed, not stored verbatim in heap
  3. Resumed sessions should apply context pruning (e.g., summarize or drop old tool results)
  4. Process should respond to SIGTERM within a reasonable timeout (e.g., 5 seconds)
  5. Child processes should be reaped even under high load

Suggested Fixes

Critical (Memory)

  1. Cap toolUseResult.stdout storage — Truncate at a reasonable limit (e.g., 100 KB) with a [truncated] marker. The 13.7 MB Bash output stored in a single line serves no purpose in context replay.
  2. Context window pruning on resume — When loading a session with >N messages or >M bytes of tool results, prune old tool outputs (keep summaries or first/last N lines).
  3. Set --max-old-space-size — Add a V8 heap cap (e.g., 4 GB) to fail fast rather than consuming all system memory.
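
Fixes 1 and 2 could look roughly like the sketch below. It is not the Claude Code implementation: capToolOutput and pruneOnResume are hypothetical helpers, the limits are illustrative, and entries are assumed to be parsed session lines with the output at toolUseResult.stdout. Limits are counted in characters rather than bytes to keep the slicing simple.

```javascript
// Sketch of a size cap for stored tool output: keep the head and tail and
// replace the middle with a marker. The 100 KB default mirrors the suggestion
// above; the head/tail split is an assumption.
function capToolOutput(stdout, limitChars = 100 * 1024) {
  if (stdout.length <= limitChars) return stdout;
  const half = Math.floor(limitChars / 2);
  const omitted = stdout.length - 2 * half;
  const head = stdout.slice(0, half);
  const tail = stdout.slice(stdout.length - half);
  return `${head}\n[truncated ${omitted} chars]\n${tail}`;
}

// Sketch of pruning on resume: entries older than the last `keepRecent` get a
// tighter cap; recent entries are returned untouched. Input is not mutated.
function pruneOnResume(entries, keepRecent = 20, oldLimitChars = 4 * 1024) {
  const cutoff = Math.max(0, entries.length - keepRecent);
  return entries.map((entry, i) => {
    const stdout = entry && entry.toolUseResult && entry.toolUseResult.stdout;
    if (i >= cutoff || typeof stdout !== 'string') return entry;
    return {
      ...entry,
      toolUseResult: {
        ...entry.toolUseResult,
        stdout: capToolOutput(stdout, oldLimitChars),
      },
    };
  });
}
```

Keeping both the head and the tail preserves the command's first lines and its final status or error output, which tend to be the parts worth replaying.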

Important (Reliability)

  1. SIGTERM with force-exit fallback:
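    process.on('SIGTERM', () => {
      // Arm the fallback before cleanup so it is already scheduled if
      // cleanup() throws or stalls on async work; unref() lets a clean
      // shutdown exit without waiting for the timer.
      setTimeout(() => process.exit(1), 5000).unref();
      cleanup(); // application-specific shutdown hook, assumed to exist
    });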
  2. Child process reaping — Ensure wait() is called on all spawned processes, possibly via a watchdog independent of the event loop.

Nice to Have

  1. Memory monitoring — Log a warning when RSS exceeds a threshold (e.g., 2 GB) and suggest starting a new session.
  2. Progress message compaction — Don't persist 2,780 progress messages in the session file; compact or discard them on save.
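
Both items are small enough to sketch. The helpers below are hypothetical (startRssMonitor, dropProgressEntries). The 2 GB threshold mirrors the suggestion above, and the compaction assumes each session entry carries a type field with the value "progress" for progress messages.

```javascript
// Sketch: periodically check RSS and warn when it is above a threshold.
// The warning repeats on every check while RSS stays above the threshold.
function startRssMonitor(
  thresholdBytes = 2 * 1024 ** 3,
  intervalMs = 10000,
  warn = console.warn
) {
  const timer = setInterval(() => {
    const rss = process.memoryUsage().rss;
    if (rss > thresholdBytes) {
      const gib = (rss / 1024 ** 3).toFixed(1);
      warn(`RSS is ${gib} GiB; consider starting a new session`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}

// Sketch: drop progress entries before the session is written to disk.
// Assumes progress messages are marked with type === 'progress'.
function dropProgressEntries(entries) {
  return entries.filter((entry) => entry.type !== 'progress');
}
```

Because the monitor runs on the event loop, it is only useful as an early warning; once the loop is blocked by GC, the timer will not fire on schedule either.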


Labels: duplicate
