Repeated Bun v1.3.5 segfaults -- 78 crashes, root cause identified (Windows + WSL) #21875
Repeated Bun Segfaults -- Comprehensive Report (Windows + WSL)
78 crashes documented | Jan 30 -- Feb 5, 2026 | Zero user-side mitigations effective
Claude Code's embedded Bun v1.3.5 segfaults repeatedly under normal usage on Windows x64. Crashes occur during active work, idle sessions, typing, right-clicking, and even on startup. No user-side configuration change prevents them.
This issue has been consolidated from 85 individual crash reports into a single structured document. Dump file analysis (ProcDump/WinDbg stack traces and variant classifications) will follow as a separate comment.
No further crash reports will be posted to this issue. We have documented 78 crashes across 7 days with full diagnostic data, identified root causes mapping to known Bun bugs, tested and eliminated every user-side mitigation, and provided 8 full memory dumps (7.5GB+). The evidence is comprehensive. The fix -- updating the embedded Bun version -- is on Anthropic's side.
Environment
| Component | Value |
|---|---|
| OS | Windows 11 x64 (Build 26200) |
| CPU | Intel i9-14900HX (32 logical, sse42/avx/avx2) |
| RAM | 192GB |
| Claude Code | v2.1.x (multiple versions across reporting period, currently v2.1.31) |
| Embedded Bun | v1.3.5 (1e86cebd) -- 3 versions behind latest 1.3.8 |
| Node.js | v22.22.0 (clean install) |
| WSL (cross-platform test) | WSL2, kernel v6.6.87, glibc 2.39 |
Installation methods tested (crashes persisted across all four):
- Windows native install
- Windows npm install (`npm install -g @anthropic-ai/claude-code`)
- WSL native install
- WSL npm install
The embedded Bun v1.3.5 was the constant across every configuration. Currently running Windows native install.
Crash Families
Three distinct crash families identified through signature analysis:
Family 1: Worker Thread Cleanup (Use-After-Free)
- Thread: Worker (not main)
- Timing: Early in session (seconds to minutes)
- Stack signature: `KERNEL32.DLL` + `ntdll.dll`
- Addresses: `0x108`, `0x150`, `0x12300000007`, `0x20600000010`
- Spawn counts: Low (41-130)
- Maps to: oven-sh/bun#18198 -- worker thread cleanup use-after-free (known since March 2025)
Family 2: Main Thread / JSC GC (Use-After-Free)
- Thread: Main
- Timing: Variable (seconds to 2+ hours)
- Mechanism: JSC garbage collector use-after-free during `MarkedBlock` sweep after `abort_signal` frees `WatchpointSets`
- Addresses: `0x0`, `0x6`, `0x7`, `0xA`, `0xFFFFFFFFFFFFFFFE`, `0xFFFFFFFFFFFFFFFF`, and other near-null values
- Correlation: Higher `abort_signal` counts and page faults track with higher spawn volumes
- Maps to: JSC GC use-after-free documented across 6+ issues in Bun's tracker. Other Bun-based AI tools (opencode) trigger identical crashes.
Family 3: Heap Address Crashes
- Thread: Main
- Addresses: Non-null, non-near-null heap addresses (`0x28727DC0024`, `0x292288EB1E0`, `0x94E3800000`, `0xD731E00000`, `0x7EDD96561004`, etc.)
- Behavior: Pointer dereference into heap memory that's been freed or corrupted
- Overlap: Likely the same underlying GC corruption as Family 2, manifesting at different points in the object graph
Root Cause Analysis
Primary: N-API Async Cleanup Race Condition
Identified via ProcDump/WinDbg memory dump analysis:
- Main thread executes `napi_remove_async_cleanup_hook` (destroying async resources)
- 5+ worker threads simultaneously inside `napi_unref_threadsafe_function` (using same resources)
- A worker thread runs a second `uv_run` event loop inside `napi_unref_threadsafe_function`
- Cleanup frees/corrupts function pointer while workers still hold references
- TTY input processing follows corrupted pointer -> `ACCESS_VIOLATION`
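
The ordering hazard in the steps above can be illustrated with a toy model. This is a minimal sketch, not Bun's or N-API's actual code: `AsyncResource`, `UseAfterFree`, and `run` are hypothetical names, and the model replays a fixed interleaving rather than using real threads. In native code there is no check like `_check()`; touching the freed resource simply dereferences a dangling pointer.

```python
class UseAfterFree(Exception):
    """Raised by the toy model when a destroyed resource is touched."""


class AsyncResource:
    """Hypothetical stand-in for a native resource shared by main and worker threads."""

    def __init__(self):
        self.freed = False
        self.refs = 0

    def _check(self):
        # Native code has no such guard; this only makes the hazard visible.
        if self.freed:
            raise UseAfterFree("resource touched after destroy()")

    def ref(self):
        self._check()
        self.refs += 1

    def unref(self):
        # Models a worker still inside an unref-style call.
        self._check()
        self.refs -= 1

    def destroy(self):
        # Models the main thread's cleanup path freeing the resource.
        self.freed = True


def run(order):
    """Replay a fixed sequence of steps and report whether it is safe."""
    res = AsyncResource()
    res.ref()
    try:
        for step in order:
            getattr(res, step)()
    except UseAfterFree:
        return "use-after-free"
    return "ok"


racy = run(["destroy", "unref"])   # cleanup wins the race
safe = run(["unref", "destroy"])   # worker finishes before cleanup
print(racy, safe)
```

The only difference between the two runs is the order of cleanup and worker access, which is why the crash timing in this model depends on scheduling rather than on what the user is doing.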
Secondary: NaN-Boxing Value Corruption
All 5 analyzed memory dumps share one pattern: JSC engine using NaN-boxed JavaScript values as raw memory pointers.
- `0xFFFE...` prefix encodes type information in high bits (NaN-boxing)
- When GC corrupts the tagged value, the engine dereferences type metadata as a memory address
- Manifests in both native runtime code and JIT-generated code
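
As a rough illustration of why a `0xFFFE...` crash address points at a boxed value rather than a real pointer, the sketch below decodes one of the reported addresses. It assumes JSC's 64-bit value encoding, where an int32 carries `0xFFFE` in the top 16 bits and the integer in the low 32 bits. The constant and function names are hypothetical.

```python
INT32_TAG = 0xFFFE_0000_0000_0000   # assumed JSC int32 tag (top 16 bits)
TAG_MASK = 0xFFFF_0000_0000_0000


def decode_boxed_int32(bits):
    """Return the int32 payload if `bits` carries the int32 tag, else None."""
    if (bits & TAG_MASK) != INT32_TAG:
        return None
    payload = bits & 0xFFFF_FFFF
    # Reinterpret the low 32 bits as a signed integer.
    return payload - (1 << 32) if payload & 0x8000_0000 else payload


crash_address = 0xFFFE0000000003D2      # reported JIT crash address
value = decode_boxed_int32(crash_address)
heap_like = decode_boxed_int32(0x28727DC0024)  # untagged, so not a boxed int32
print(value, heap_like)
```

Under that assumption, the faulting "address" is a JavaScript integer that was dereferenced as if it were a pointer.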
Contributing: TTY/SIMD Intersection
Both stack trace paths share uv_tty_get_vterm_state and simdutf::get_active_implementation. The interaction between terminal I/O handling and SIMD string processing creates an additional corruption vector. This explains why crashes can be triggered by typing or right-clicking.
Key Patterns Discovered
Idle Crashes
Terminals crash while completely idle -- no user activity needed.
- 133-minute idle session (Crash #33, 3,783 spawns)
- 102-minute idle session (Crash #15, 437 spawns)
- 16-minute idle with lowest-activity profile (Crash #36, 275 spawns)
Cascade Degradation
Time-to-crash shrinks sharply across successive restarts:
- Crash #21: 9.7 min -> #22: 37s -> #23: 1.9 min -> #24: 52s -> #25: 32s -> #26: 12s
- Peak RSS stays above 1.4GB across restarts despite minimal work
- Conversation transcript accumulation appears to compound GC pressure
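
To make the Crash #21 through #26 series easier to compare, the sketch below converts the reported times to seconds. The figures are the ones reported in this issue; the variable names are only illustrative.

```python
# Time-to-crash as reported for Crashes #21 through #26.
reported = [
    ("#21", 9.7, "min"),
    ("#22", 37, "s"),
    ("#23", 1.9, "min"),
    ("#24", 52, "s"),
    ("#25", 32, "s"),
    ("#26", 12, "s"),
]

# Normalize everything to whole seconds.
seconds = [round(value * 60) if unit == "min" else value
           for _, value, unit in reported]

# How much shorter the last session was than the first.
shrink_factor = seconds[0] / seconds[-1]

for (crash, _, _), secs in zip(reported, seconds):
    print(f"Crash {crash}: {secs}s")
print(f"first/last ratio: {shrink_factor:.1f}x")
```

The series is not strictly monotonic (Crash #23 lasted longer than #22), but the last session survived roughly 1/48 as long as the first.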
Cross-Platform Confirmation
Crashes persist identically on WSL2/Linux (same Bun 1.3.5). This rules out Windows-specific causes. RSS reporting was corrupted on WSL (reported 0.02ZB).
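
One possible reading of the 0.02ZB figure, offered as an assumption and not something confirmed by the dumps: a 64-bit counter that wrapped around, or held -1 and was printed as unsigned, would read as roughly 2^64 bytes. The arithmetic below checks that this lands on the reported value with either decimal or binary units.

```python
# -1 reinterpreted as an unsigned 64-bit byte count (assumed failure mode).
wrapped = (-1) % 2**64

zb = wrapped / 10**21    # decimal zettabytes
zib = wrapped / 2**70    # binary zebibytes

print(f"{zb:.2f} ZB, {zib:.2f} ZiB")
```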
Spawn Amplification
Our workflow uses multi-agent orchestration -- up to 6,165 JSC spawns per session (typical Claude Code usage: 10-50). High spawn volume widens the race condition window by orders of magnitude, but crashes also occur at low spawn counts (25 spawns in 5 seconds for Crash #52).
UI Event Triggers
- Right-click to copy (Crash #40)
- Typing input (Crashes #41, #42, #59, #62)
- Explained by the TTY handling path identified in dump analysis
Silent Crashes (FAST_FAIL)
Bun detects corruption internally and calls __fastfail(FATAL_APP_EXIT). No "oh no: Bun has crashed" message, no bun.report URL. Process just disappears. Unknown number of additional silent crashes may have gone unreported.
ProcDump Bypass
Bun's internal crash handler intercepts segfaults before Windows' AeDebug mechanism fires. External crash dump tools cannot capture Bun crashes via normal means. Workaround: ProcDump live-attach (first-chance exception monitor).
Thread Count Escalation
Across analyzed dumps, thread counts escalated: 22 -> 26 -> 36 -> 37 threads, with napi_unref_threadsafe_function instances growing from 5 to 10.
Unique Crash Addresses (~35 distinct)
| Address | Category | Notes |
|---|---|---|
| `0x0` | Null deref | Most common on WSL |
| `0x6` | Near-null | Corrupted vtable |
| `0x7` | Near-null | Most common on Windows |
| `0xA` | Near-null | Both platforms |
| `0x10` | Near-null | |
| `0x20` | Near-null | Worker thread |
| `0x48` | Low offset | |
| `0x8C` | Low offset | |
| `0x108` | Family 1 | Worker, KERNEL32+ntdll |
| `0x150` | Family 1 | Worker, KERNEL32+ntdll |
| `0x1E2` | Low | |
| `0x1E3B5` | Low | Null + struct offset |
| `0xCEEE` | Low | Null + struct offset |
| `0x24018` | Struct | Use-after-free struct |
| `0x3FFFFF01` | Mid-range | |
| `0xFFFFFFFFFFFFFFFE` | Sentinel | Second most common overall |
| `0xFFFFFFFFFFFFFFFF` | Sentinel | Third most common overall |
| `0x650063006D` | Unknown | Original Crash #1 |
| `0x12300000007` | Family 1 | Worker |
| `0x20600000010` | Family 1 | Worker |
| `0x26C19AF3800` | Heap | Startup crash |
| `0x28727DC0024` | Heap | |
| `0x292288EB1E0` | Heap | |
| `0x94E3800000` | Heap | |
| `0x103010000` | Heap | Startup, 181ms |
| `0xD731E00000` | Heap | |
| `0x1AB0BCF9BB0` | Heap | ProcDump bypassed |
| `0x1DE0F6B53B1` | Heap | |
| `0x7EDD96561004` | Heap | WSL |
| `0x7DE1D7400000` | Heap | WSL |
| `0x7830E686D640` | Heap | WSL |
| `0x708DA96CCA64` | Heap | WSL |
| `0x000001FF00000003` | Variant E | Partial pointer corruption |
| `0xFFFE0000000003D2` | JIT | NaN-boxed integer 978 |
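
For anyone triaging similar crash addresses, a coarse first-pass bucketing can be sketched as follows. This is a simplified heuristic, not the classification method used for the table above: the bucket names, the 1 MiB "low" cutoff, and the `classify` function are all assumptions. Address alone cannot recover the Family 1 assignment, which also depends on the faulting thread and stack signature.

```python
LOW_CUTOFF = 0x10_0000            # assumed cutoff: below 1 MiB looks like null + offset
SENTINELS = {2**64 - 1, 2**64 - 2}  # -1 and -2 as unsigned 64-bit values
INT32_TAG = 0xFFFE                # assumed JSC int32 tag in the top 16 bits


def classify(addr):
    """Return a coarse bucket for a 64-bit faulting address."""
    if addr == 0:
        return "null"
    if addr in SENTINELS:
        return "sentinel"
    if (addr >> 48) == INT32_TAG:
        return "nan-boxed"
    if addr < LOW_CUTOFF:
        return "low"
    return "pointer-like"


samples = [0x0, 0x7, 0x24018, 0xFFFFFFFFFFFFFFFE,
           0xFFFE0000000003D2, 0x28727DC0024]
buckets = {hex(a): classify(a) for a in samples}
for address, bucket in buckets.items():
    print(address, bucket)
```

The sentinel check runs before the tag check so that `0xFFFFFFFFFFFFFFFE` is not mistaken for a tagged value.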
Mitigation Attempts -- All Failed
Every user-side mitigation was tested systematically. None prevented crashes.
| Mitigation | Result |
|---|---|
| Hook consolidation (9 -> 4 entries, 7 -> 2 spawns/op) | Crash still occurred within 45 min |
| Further hook stripping (4 -> 3, zero spawns on read ops) | Crash #52: 25 spawns in 5 seconds |
| npm install (bypass embedded Bun, use Node.js) | Different stability issues (crashes to command prompt) |
| Single instance, single terminal | Crash #55: crashed anyway |
| Chrome MCP disabled | Crash #56: crashed anyway |
| WSL migration | Same crashes, cross-platform confirmed |
| Clean Node.js install (v22.22.0) | No effect |
| Hook path format (backslash -> forward slash) | No effect |
Three-Factor Resolution -- Posted Then Immediately Retracted
A resolution was claimed combining: (1) reduced subprocess spawns, (2) hook path format fix, (3) clean Node.js. Evidence: 5+ crash-free sessions including a 124-second agent run.
Retracted within hours. Three crashes occurred immediately after posting. Quote: "Hook migration had zero effect on crash prevention. Three-factor resolution is fully retracted."
Theories Systematically Eliminated
| Theory | Evidence Against |
|---|---|
| Multi-instance resource contention | Crash #55: single instance, single terminal |
| VS Code terminal infrastructure | Same crashes in PowerShell and cmd |
| Hook/subprocess spawn volume | Crash #52: 25 spawns in 5 seconds |
| Memory accumulation | Crashes at 0.73-0.99GB RSS (192GB system) |
| Chrome MCP overhead | Crash #56: Chrome MCP disabled |
| Windows-only | WSL crashes identical |
Conclusion: Bun 1.3.5 on x64 segfaults under normal single-instance usage. No user-side configuration change mitigates it.
Investigation Timeline
| Date | Phase | Crashes | Key Events |
|---|---|---|---|
| Jan 30 | Discovery | ~13 | Initial 3 crashes, then 10 more. Crash families identified. |
| Jan 30-31 eve | Cascade | ~18 | 15 crashes in one evening. Idle crash discovery. Cascade degradation (9.7min -> 12s). |
| Jan 31 day | Escalation | ~18 | 18 crashes in single day. Root cause research published. npm workaround attempted. |
| Feb 1 | WSL migration | ~20 | Moved to WSL2. Crashes persisted (cross-platform confirmed). Hook consolidation applied. |
| Feb 1-2 | WSL intensive | ~10 | Heavy null-deref pattern. 0.02ZB RSS anomaly. |
| Feb 3 | Windows return | ~4 | UI event triggers discovered (right-click, typing). |
| Feb 3-4 | Resolution attempt | ~5 | Three-factor resolution posted, immediately retracted. |
| Feb 4 early | Elimination | ~15 | Hook stripping, single-instance testing. All user-side theories eliminated. |
| Feb 4 afternoon | ProcDump analysis | ~8 | Full memory dump captures. WinDbg analysis. N-API race condition and NaN-boxing corruption identified. |
| Feb 5 | Final analysis | 3 | New Variant E (partial pointer corruption). JIT crash sub-variant confirmed. |
Total: ~78 crash events across 7 days.
The Ask
- Update the embedded Bun version. v1.3.5 is 3 versions behind. v1.3.6 fixes an integer overflow crash path. v1.3.7 includes a JSC upgrade addressing the GC issues in Family 2.
- Acknowledge the scope. 78 crashes, 35 unique addresses, 8 full memory dumps (7.5GB), root cause identified via ProcDump/WinDbg to N-API async cleanup race conditions and NaN-boxing value corruption. Zero Anthropic responses across 86 comments over 7 days.
- No further reports. We consider this issue comprehensively documented. Every user-side mitigation has been tested and eliminated. The fix is a Bun version bump, which only Anthropic can apply.
Closing Statement
This issue has been consolidated from 85 individual crash report comments into this single structured document. The original per-crash comments have been removed to improve readability. Dump file analysis (27 full memory dumps, F1-F6 crash family classifications, unified root cause model) has been posted as a follow-up comment below.
We will not be posting additional Bun crash reports. After 78 documented crashes with full diagnostic data across Windows native and WSL, with root causes identified and mapped to known Bun bugs, we believe the evidence is sufficient. Continued reporting would be redundant. The ball is in Anthropic's court.
Note
This consolidated report was written by Claude (Opus 4.5) on behalf of @balandari as part of our AI-assisted development workflow.