Repeated Bun v1.3.5 segfaults -- 78 crashes, root cause identified (Windows + WSL) #21875

@balandari

Repeated Bun Segfaults -- Comprehensive Report (Windows + WSL)

78 crashes documented | Jan 30 -- Feb 5, 2026 | Zero user-side mitigations effective

Claude Code's embedded Bun v1.3.5 segfaults repeatedly under normal usage on Windows x64. Crashes occur during active work, idle sessions, typing, right-clicking, and even on startup. No user-side configuration change prevents them.

This issue has been consolidated from 85 individual crash reports into a single structured document. Dump file analysis (ProcDump/WinDbg stack traces and variant classifications) will follow as a separate comment.

No further crash reports will be posted to this issue. We have documented 78 crashes across 7 days with full diagnostic data, identified root causes mapping to known Bun bugs, tested and eliminated every user-side mitigation, and provided 8 full memory dumps (7.5GB+). The evidence is comprehensive. The fix -- updating the embedded Bun version -- is on Anthropic's side.


Environment

| Component | Value |
| --- | --- |
| OS | Windows 11 x64 (Build 26200) |
| CPU | Intel i9-14900HX (32 logical, sse42/avx/avx2) |
| RAM | 192GB |
| Claude Code | v2.1.x (multiple versions across reporting period, currently v2.1.31) |
| Embedded Bun | v1.3.5 (1e86cebd) -- 3 versions behind latest 1.3.8 |
| Node.js | v22.22.0 (clean install) |
| WSL (cross-platform test) | WSL2, kernel v6.6.87, glibc 2.39 |

Installation methods tested (crashes persisted across all four):

  1. Windows native install
  2. Windows npm install (npm install -g @anthropic-ai/claude-code)
  3. WSL native install
  4. WSL npm install

The embedded Bun v1.3.5 was the constant across every configuration. Currently running Windows native install.


Crash Families

Three distinct crash families identified through signature analysis:

Family 1: Worker Thread Cleanup (Use-After-Free)

  • Thread: Worker (not main)
  • Timing: Early in session (seconds to minutes)
  • Stack signature: KERNEL32.DLL + ntdll.dll
  • Addresses: 0x108, 0x150, 0x12300000007, 0x20600000010
  • Spawn counts: Low (41-130)
  • Maps to: oven-sh/bun#18198 -- worker thread cleanup use-after-free (known since March 2025)

Family 2: Main Thread / JSC GC (Use-After-Free)

  • Thread: Main
  • Timing: Variable (seconds to 2+ hours)
  • Mechanism: JSC garbage collector use-after-free during MarkedBlock sweep after abort_signal frees WatchpointSets
  • Addresses: 0x0, 0x6, 0x7, 0xA, 0xFFFFFFFFFFFFFFFE, 0xFFFFFFFFFFFFFFFF, and other near-null values
  • Correlation: Higher abort_signal counts and page faults track with higher spawn volumes
  • Maps to: JSC GC use-after-free documented across 6+ issues in Bun's tracker. Other Bun-based AI tools (opencode) trigger identical crashes.

Family 3: Heap Address Crashes

  • Thread: Main
  • Addresses: Non-null, non-near-null heap addresses (0x28727DC0024, 0x292288EB1E0, 0x94E3800000, 0xD731E00000, 0x7EDD96561004, etc.)
  • Behavior: Pointer dereference into heap memory that's been freed or corrupted
  • Overlap: Likely the same underlying GC corruption as Family 2, manifesting at different points in the object graph
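
For quick triage, fault addresses can be bucketed roughly along the lines described above. The sketch below is illustrative only: the function name, the bucket labels, and the 0x1000 "near-null" cutoff are assumptions, not something taken from Bun or from the dump analysis. An address alone does not determine the family (Family 1 is also defined by the crashing thread).

```python
def classify_fault_address(addr: int) -> str:
    """Bucket a 64-bit fault address using rough, assumed thresholds."""
    NEAR_NULL_LIMIT = 0x1000  # assumption: anything in the first page is "near-null"
    SENTINELS = {0xFFFFFFFFFFFFFFFE, 0xFFFFFFFFFFFFFFFF}
    NAN_BOX_TAG = 0xFFFE      # top 16 bits, matching the 0xFFFE... prefix

    if addr == 0:
        return "null"
    if addr < NEAR_NULL_LIMIT:
        return "near-null"
    if addr in SENTINELS:
        return "sentinel"
    if (addr >> 48) == NAN_BOX_TAG:
        return "nan-boxed"
    return "heap-or-other"


# Addresses taken from this report.
samples = [0x0, 0x7, 0x108, 0xFFFFFFFFFFFFFFFE, 0xFFFE0000000003D2, 0x28727DC0024]
for a in samples:
    print(f"{a:#x}: {classify_fault_address(a)}")
```

The "heap-or-other" bucket is deliberately broad; separating genuine heap pointers from partially corrupted ones needs the memory dump, not just the address.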

Root Cause Analysis

Primary: N-API Async Cleanup Race Condition

Identified via ProcDump/WinDbg memory dump analysis:

  1. Main thread executes napi_remove_async_cleanup_hook (destroying async resources)
  2. At the same time, 5+ worker threads are inside napi_unref_threadsafe_function, using the same resources
  3. A worker thread runs a second uv_run event loop inside napi_unref_threadsafe_function
  4. Cleanup frees/corrupts function pointer while workers still hold references
  5. TTY input processing follows corrupted pointer -> ACCESS_VIOLATION
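
The ordering hazard in the steps above can be illustrated with a toy, single-threaded model. This is not Bun's or Node-API's code: `SharedResource`, `unsafe_cleanup`, `deferred_cleanup`, and `simulate` are hypothetical names, and the "deferred" variant is only one conventional way to avoid freeing state that other holders still reference.

```python
from dataclasses import dataclass


@dataclass
class SharedResource:
    """Toy stand-in for state shared between a cleanup path and workers."""
    refs: int = 0
    freed: bool = False
    pending_free: bool = False

    def acquire(self) -> None:
        self.refs += 1

    def release(self) -> None:
        self.refs -= 1
        if self.refs == 0 and self.pending_free:
            self.freed = True

    def unsafe_cleanup(self) -> None:
        # Frees immediately, even though workers may still hold references.
        self.freed = True

    def deferred_cleanup(self) -> None:
        # Frees only after the last holder has released.
        if self.refs == 0:
            self.freed = True
        else:
            self.pending_free = True

    def call(self) -> str:
        if self.freed:
            raise RuntimeError("use after free")
        return "ok"


def simulate(cleanup_name: str, workers: int = 5) -> str:
    res = SharedResource()
    for _ in range(workers):           # workers take references
        res.acquire()
    getattr(res, cleanup_name)()       # main thread runs cleanup
    try:
        return res.call()              # a worker uses the resource afterwards
    except RuntimeError as exc:
        return str(exc)


print(simulate("unsafe_cleanup"))
print(simulate("deferred_cleanup"))
```

In the real crash the equivalent of `call()` follows a dangling function pointer instead of raising, which is why it surfaces as an ACCESS_VIOLATION.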

Secondary: NaN-Boxing Value Corruption

All 5 analyzed memory dumps share one pattern: JSC engine using NaN-boxed JavaScript values as raw memory pointers.

  • 0xFFFE... prefix encodes type information in high bits (NaN-boxing)
  • When GC corrupts the tagged value, the engine dereferences type metadata as a memory address
  • Manifests in both native runtime code and JIT-generated code
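
As a worked example of the tagging, the sketch below decodes a 64-bit value under the assumption that a top-16-bit tag of 0xFFFE marks a boxed 32-bit integer held in the low 32 bits. The helper name and constants are illustrative and are not read from JSC headers. Applied to the crash address 0xFFFE0000000003D2 from this report, it recovers the integer 978.

```python
from typing import Optional

INT32_TAG = 0xFFFE_0000_0000_0000  # assumed tag for boxed 32-bit integers
TAG_MASK = 0xFFFF_0000_0000_0000


def decode_boxed_int32(bits: int) -> Optional[int]:
    """Return the int32 payload if `bits` carries the assumed tag, else None."""
    if (bits & TAG_MASK) != INT32_TAG:
        return None
    payload = bits & 0xFFFF_FFFF
    # Reinterpret the low 32 bits as a signed integer.
    return payload - (1 << 32) if payload & 0x8000_0000 else payload


print(decode_boxed_int32(0xFFFE0000000003D2))  # 978
print(decode_boxed_int32(0x28727DC0024))       # None (no integer tag)
```

A fault address that decodes cleanly like this suggests the engine dereferenced a JavaScript value, not a pointer that merely went stale.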

Contributing: TTY/SIMD Intersection

Both stack trace paths share uv_tty_get_vterm_state and simdutf::get_active_implementation. The interaction between terminal I/O handling and SIMD string processing creates an additional corruption vector. This explains why crashes can be triggered by typing or right-clicking.


Key Patterns Discovered

Idle Crashes

Terminals crash while completely idle -- no user activity needed.

Cascade Degradation

Repeated crashes shorten the time-to-crash on each restart. During the Jan 30-31 evening cascade, it fell from 9.7 minutes to 12 seconds.

Cross-Platform Confirmation

Crashes persist identically on WSL2/Linux (same Bun 1.3.5). This rules out Windows-specific causes. RSS reporting was corrupted on WSL (reported 0.02ZB).

Spawn Amplification

Our workflow uses multi-agent orchestration -- up to 6,165 JSC spawns per session (typical Claude Code usage: 10-50). High spawn volume widens the race condition window by orders of magnitude, but crashes also occur at low spawn counts (25 spawns in 5 seconds for Crash #52).

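UI Event Triggers

Right-clicking or typing in the terminal can trigger a crash (first observed Feb 3, after returning to Windows).
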
Silent Crashes (FAST_FAIL)

Bun detects corruption internally and calls __fastfail(FAST_FAIL_FATAL_APP_EXIT). There is no "oh no: Bun has crashed" message and no bun.report URL; the process simply disappears. An unknown number of additional silent crashes may have gone unrecorded.

ProcDump Bypass

Bun's internal crash handler intercepts segfaults before Windows' AeDebug mechanism fires. External crash dump tools cannot capture Bun crashes via normal means. Workaround: ProcDump live-attach (first-chance exception monitor).
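
A minimal sketch of the live-attach invocation, built as an argument list so it can be passed to a process launcher. The helper name, the PID, and the dump folder are hypothetical; the flags are standard ProcDump options (`-ma` for a full memory dump, `-e 1` to include first-chance exceptions). The exact command line used for this report was not recorded here.

```python
from typing import List


def procdump_attach_command(pid: int, dump_dir: str) -> List[str]:
    """Build a ProcDump command that attaches to a running process."""
    return [
        "procdump",
        "-accepteula",  # accept the Sysinternals license non-interactively
        "-ma",          # write a full memory dump
        "-e", "1",      # trigger on first-chance exceptions, not only unhandled ones
        str(pid),       # target process (hypothetical PID in the example below)
        dump_dir,       # output folder (hypothetical path in the example below)
    ]


print(" ".join(procdump_attach_command(1234, r"C:\dumps")))
```

With no exception filter, any first-chance exception in the target produces a dump, so the output folder needs room for several multi-gigabyte files.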

Thread Count Escalation

Across analyzed dumps, thread counts escalated: 22 -> 26 -> 36 -> 37 threads, with napi_unref_threadsafe_function instances growing from 5 to 10.


Unique Crash Addresses (~35 distinct)

| Address | Category | Notes |
| --- | --- | --- |
| 0x0 | Null deref | Most common on WSL |
| 0x6 | Near-null | Corrupted vtable |
| 0x7 | Near-null | Most common on Windows |
| 0xA | Near-null | Both platforms |
| 0x10 | Near-null | |
| 0x20 | Near-null | Worker thread |
| 0x48 | Low offset | |
| 0x8C | Low offset | |
| 0x108 | Family 1 | Worker, KERNEL32+ntdll |
| 0x150 | Family 1 | Worker, KERNEL32+ntdll |
| 0x1E2 | Low | |
| 0x1E3B5 | Low | Null + struct offset |
| 0xCEEE | Low | Null + struct offset |
| 0x24018 | Struct | Use-after-free struct |
| 0x3FFFFF01 | Mid-range | |
| 0xFFFFFFFFFFFFFFFE | Sentinel | Second most common overall |
| 0xFFFFFFFFFFFFFFFF | Sentinel | Third most common overall |
| 0x650063006D | Unknown | Original Crash #1 |
| 0x12300000007 | Family 1 | Worker |
| 0x20600000010 | Family 1 | Worker |
| 0x26C19AF3800 | Heap | Startup crash |
| 0x28727DC0024 | Heap | |
| 0x292288EB1E0 | Heap | |
| 0x94E3800000 | Heap | |
| 0x103010000 | Heap | Startup, 181ms |
| 0xD731E00000 | Heap | |
| 0x1AB0BCF9BB0 | Heap | ProcDump bypassed |
| 0x1DE0F6B53B1 | Heap | |
| 0x7EDD96561004 | Heap | WSL |
| 0x7DE1D7400000 | Heap | WSL |
| 0x7830E686D640 | Heap | WSL |
| 0x708DA96CCA64 | Heap | WSL |
| 0x000001FF00000003 | Variant E | Partial pointer corruption |
| 0xFFFE0000000003D2 | JIT | NaN-boxed integer 978 |

Mitigation Attempts -- All Failed

Every user-side mitigation was tested systematically. None prevented crashes.

| Mitigation | Result |
| --- | --- |
| Hook consolidation (9 -> 4 entries, 7 -> 2 spawns/op) | Crash still occurred within 45 min |
| Further hook stripping (4 -> 3, zero spawns on read ops) | Crash #52: 25 spawns in 5 seconds |
| npm install (bypass embedded Bun, use Node.js) | Different stability issues (crashes to command prompt) |
| Single instance, single terminal | Crash #55: crashed anyway |
| Chrome MCP disabled | Crash #56: crashed anyway |
| WSL migration | Same crashes, cross-platform confirmed |
| Clean Node.js install (v22.22.0) | No effect |
| Hook path format (backslash -> forward slash) | No effect |

Three-Factor Resolution -- Posted Then Immediately Retracted

A resolution was claimed combining: (1) reduced subprocess spawns, (2) hook path format fix, (3) clean Node.js. Evidence: 5+ crash-free sessions including a 124-second agent run.

Retracted within hours. Three crashes occurred immediately after posting. Quote: "Hook migration had zero effect on crash prevention. Three-factor resolution is fully retracted."

Theories Systematically Eliminated

| Theory | Evidence Against |
| --- | --- |
| Multi-instance resource contention | Crash #55: single instance, single terminal |
| VS Code terminal infrastructure | Same crashes in PowerShell and cmd |
| Hook/subprocess spawn volume | Crash #52: 25 spawns in 5 seconds |
| Memory accumulation | Crashes at 0.73-0.99GB RSS (192GB system) |
| Chrome MCP overhead | Crash #56: Chrome MCP disabled |
| Windows-only | WSL crashes identical |

Conclusion: Bun 1.3.5 on x64 segfaults under normal single-instance usage. No user-side configuration change mitigates it.


Investigation Timeline

| Date | Phase | Crashes | Key Events |
| --- | --- | --- | --- |
| Jan 30 | Discovery | ~13 | Initial 3 crashes, then 10 more. Crash families identified. |
| Jan 30-31 eve | Cascade | ~18 | 15 crashes in one evening. Idle crash discovery. Cascade degradation (9.7min -> 12s). |
| Jan 31 day | Escalation | ~18 | 18 crashes in single day. Root cause research published. npm workaround attempted. |
| Feb 1 | WSL migration | ~20 | Moved to WSL2. Crashes persisted (cross-platform confirmed). Hook consolidation applied. |
| Feb 1-2 | WSL intensive | ~10 | Heavy null-deref pattern. 0.02ZB RSS anomaly. |
| Feb 3 | Windows return | ~4 | UI event triggers discovered (right-click, typing). |
| Feb 3-4 | Resolution attempt | ~5 | Three-factor resolution posted, immediately retracted. |
| Feb 4 early | Elimination | ~15 | Hook stripping, single-instance testing. All user-side theories eliminated. |
| Feb 4 afternoon | ProcDump analysis | ~8 | Full memory dump captures. WinDbg analysis. N-API race condition and NaN-boxing corruption identified. |
| Feb 5 | Final analysis | 3 | New Variant E (partial pointer corruption). JIT crash sub-variant confirmed. |

Total: ~78 crash events across 7 days.


The Ask

  1. Update the embedded Bun version. v1.3.5 is 3 versions behind. v1.3.6 fixes an integer overflow crash path. v1.3.7 includes a JSC upgrade addressing the GC issues in Family 2.

  2. Acknowledge the scope. 78 crashes, 35 unique addresses, 8 full memory dumps (7.5GB), root cause identified via ProcDump/WinDbg to N-API async cleanup race conditions and NaN-boxing value corruption. Zero Anthropic responses across 86 comments over 7 days.

  3. No further reports. We consider this issue comprehensively documented. Every user-side mitigation has been tested and eliminated. The fix is a Bun version bump, which only Anthropic can apply.


Closing Statement

This issue has been consolidated from 85 individual crash report comments into this single structured document. The original per-crash comments have been removed to improve readability. Dump file analysis (27 full memory dumps, F1-F6 crash family classifications, unified root cause model) has been posted as a follow-up comment below.

We will not be posting additional Bun crash reports. After 78 documented crashes with full diagnostic data across Windows native and WSL, with root causes identified and mapped to known Bun bugs, we believe the evidence is sufficient. Continued reporting would be redundant. The ball is in Anthropic's court.


Note

This consolidated report was written by Claude (Opus 4.5) on behalf of @balandari as part of our AI-assisted development workflow.
