Bug: watchdog non-streaming fallback is unreachable dead code (v2.1.84/v2.1.85)

## Bug: watchdog non-streaming fallback appears to be unreachable dead code (v2.1.84/v2.1.85)

### Environment
- Claude Code: v2.1.85 (also verified in v2.1.84)
- Analysis method: reverse-engineering minified `cli.js` via `npm pack`

### Disclaimer

**This analysis is based on reverse-engineering 12 MB of minified JavaScript.** Variable names are obfuscated, control flow is compressed into single lines of 10,000–25,000 characters, and scoping has to be traced by counting brace depth at character offsets. We've done our best to reconstruct the logic accurately, but without access to the original source code, there may be nuances we're missing. If any part of this analysis is incorrect, we'd welcome corrections — ideally with a pointer to the relevant source.

### Summary

The streaming idle watchdog (`CLAUDE_ENABLE_STREAM_WATCHDOG=1`) aborts hanging streams but **appears never to trigger the non-streaming fallback**. The fallback code exists and has telemetry (`fallback_cause: "watchdog"`), but based on our tracing it is unreachable for watchdog aborts: an earlier `throw` in the same `catch` block always fires first.

Users see a generic "Request timed out" error instead of a transparent retry via the non-streaming path.

### Root cause

In the inner `catch` block of the streaming event loop (v2.1.85, line 7682, char offset ~8979):

```javascript
catch(O6) {
    clearTimers();
    if (watchdogFired) { /* log telemetry */ }

    if (O6 instanceof AbortError) {         // ← watchdog calls AbortController.abort(),
        if (signal.aborted)                 //    which surfaces here as an AbortError
            throw O6;                       // user ESC → cancel (correct)
        else
            throw new TimeoutError("Request timed out");  // ← thrown to outer catch, watchdog aborts included
    }

    // ⚠️ UNREACHABLE for watchdog abort — both paths above throw

    if (DISABLE_NONSTREAMING_FALLBACK) throw ...;

    // Non-streaming fallback (DEAD CODE for watchdog):
    log("falling back to non-streaming mode");
    fallbackFlag = true;
    telemetry("tengu_streaming_fallback_to_non_streaming", {
        fallback_cause: watchdogFired ? "watchdog" : "other"  // ← never reached
    });
    yield* nonStreamingRequest(...);  // ← never called
}
```

The outer catch doesn't know about the watchdog — it treats `TimeoutError` as a generic API failure, yields an error message to the UI, and returns.
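
To make the control flow concrete, here is a minimal, runnable model of the behavior described above. It is a sketch under assumptions, not the real code: `AbortError` and `TimeoutError` are stand-in classes, `innerCatchCurrent` and `runCurrent` are hypothetical names, `userAborted` plays the role of `signal.aborted`, and synchronous generators replace whatever async machinery the real implementation uses.

```javascript
// Stand-ins for the minified error classes (hypothetical, for illustration only).
class AbortError extends Error {}
class TimeoutError extends Error {}

// Models the inner catch as reconstructed above.
function* innerCatchCurrent(err, { userAborted, watchdogFired }) {
  if (err instanceof AbortError) {
    if (userAborted) throw err;                    // user ESC
    throw new TimeoutError("Request timed out");   // every other abort, watchdog included
  }
  // Only non-abort errors get this far.
  yield { type: "fallback", cause: watchdogFired ? "watchdog" : "other" };
}

// Models the outer catch: a TimeoutError becomes a UI error message.
function runCurrent(err, flags) {
  try {
    return [...innerCatchCurrent(err, flags)];
  } catch (e) {
    if (e instanceof TimeoutError) return [{ type: "error", message: e.message }];
    throw e;
  }
}

const watchdogResult = runCurrent(new AbortError("aborted"), {
  userAborted: false,
  watchdogFired: true,
});
const otherResult = runCurrent(new Error("stream broke"), {
  userAborted: false,
  watchdogFired: false,
});

console.log(watchdogResult); // an error event, not a fallback
console.log(otherResult);    // a fallback event with cause "other"
```

In this model the `"watchdog"` value of `cause` can never be produced, which matches what we see in the telemetry branch of the minified code.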

### Expected behavior

Watchdog fires → abort stream → detect watchdog (not user ESC) → fall through to non-streaming fallback → user gets response transparently.

### Actual behavior

Watchdog fires → abort stream → `throw TimeoutError` → outer catch → "Request timed out" → done. No retry. No fallback.

### Suggested fix

Don't throw on watchdog-triggered AbortError — let it fall through to the existing fallback code:

```javascript
catch(O6) {
    clearTimers();
    if (O6 instanceof AbortError) {
        if (signal.aborted) throw O6;         // user ESC → cancel
        if (!watchdogFired) {                  // unknown SDK abort
            throw new TimeoutError("Request timed out");
        }
        // watchdog abort → fall through to non-streaming fallback below
    }

    if (DISABLE_NONSTREAMING_FALLBACK) throw ...;
    // ... existing fallback code works as intended ...
}
```
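
The same kind of toy model can be used to sanity-check the suggested ordering of checks. Again this is only a sketch: the class names, `innerCatchFixed`, and `classify` are hypothetical, `userAborted` stands in for `signal.aborted`, and the `DISABLE_NONSTREAMING_FALLBACK` branch is left out because we don't know what it throws.

```javascript
// Stand-ins for the minified error classes (hypothetical, for illustration only).
class AbortError extends Error {}
class TimeoutError extends Error {}

// Models the inner catch with the suggested fix applied.
function* innerCatchFixed(err, { userAborted, watchdogFired }) {
  if (err instanceof AbortError) {
    if (userAborted) throw err;                                       // user ESC → cancel
    if (!watchdogFired) throw new TimeoutError("Request timed out");  // unknown abort
    // watchdog abort → fall through to the fallback
  }
  yield { type: "fallback", cause: watchdogFired ? "watchdog" : "other" };
}

// Runs the model and reports either the first yielded event or what was thrown.
function classify(err, flags) {
  try {
    return [...innerCatchFixed(err, flags)][0];
  } catch (e) {
    return { type: "thrown", name: e.constructor.name };
  }
}

const abort = new AbortError("aborted");
const fixedWatchdog = classify(abort, { userAborted: false, watchdogFired: true });
const fixedUserEsc = classify(abort, { userAborted: true, watchdogFired: false });
const fixedUnknown = classify(abort, { userAborted: false, watchdogFired: false });

console.log(fixedWatchdog, fixedUserEsc, fixedUnknown);
```

With this ordering the watchdog case yields a fallback event with cause `"watchdog"`, while user cancellation and unexplained aborts still throw as before.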

### Evidence: the fallback code was intentionally written for watchdog

The telemetry in the unreachable fallback path explicitly checks the watchdog flag:
```javascript
fallback_cause: watchdogFired ? "watchdog" : "other"
```

The fallback appears to have been intended for watchdog scenarios, but the `AbortError` `instanceof` check above it was likely added (or refactored) later without accounting for this interaction. This is the kind of subtle control-flow regression that is easy to miss, especially when code is being generated or refactored at scale. We can only see the result in the 12 MB minified bundle, so how it arose is a guess on our part.

### Impact

- The watchdog feature (added ~v2.1.50, configurable since v2.1.84) **appears unable to recover**: it aborts hanging streams but, per our tracing, never reaches the non-streaming fallback
- Users who enable `CLAUDE_ENABLE_STREAM_WATCHDOG=1` get "Request timed out" errors instead of the intended transparent retry
- This may be the reason the watchdog is disabled by default — it appears non-functional in testing because the fallback doesn't work, but the root cause is this unreachable code path, not a design problem with the watchdog itself

### Request for source access

We've been reverse-engineering `cli.js` across 11 versions (v2.1.74–v2.1.85) by grepping through 12 MB of minified code and counting brace depth to trace scoping. We've found multiple issues this way — the streaming hang root cause (#33949), JSONL writer race conditions (#31328), and now this fallback bug — but the process is extremely slow. Tracing a single code path (like the one in this issue) takes hours of `node -e` scripts and manual character-offset arithmetic.
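
For readers unfamiliar with the technique, this is roughly what the brace-depth counting looks like. It is a deliberately naive sketch rather than our actual tooling: `braceDepthAt` is a hypothetical helper, it only skips single- and double-quoted strings, and it does not handle template literals, regex literals, or comments, all of which appear in real minified output and need extra care.

```javascript
// Returns the brace nesting depth just before `offset` within `line`.
// Naive: ignores braces inside '...' and "..." strings only.
function braceDepthAt(line, offset) {
  let depth = 0;
  let quote = null; // current string delimiter, or null when outside a string
  for (let i = 0; i < offset && i < line.length; i++) {
    const ch = line[i];
    if (quote) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
    }
  }
  return depth;
}

// Toy input: the "}" inside the string must not close the catch block.
const sample = 'try{a()}catch(e){if(x)throw e;log("}");b()}';
const depthInsideCatch = braceDepthAt(sample, sample.indexOf("b()"));
const depthAtEnd = braceDepthAt(sample, sample.length);

console.log(depthInsideCatch, depthAtEnd); // 1 0
```

Comparing the depth at two offsets is how we decide whether a statement such as the fallback call sits inside the same `catch` block as an earlier `throw`.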

With access to the original source code, we could verify findings like this in minutes instead of hours, and catch bugs we're currently missing because minification obscures the control flow. Given the complexity of issues the community is hitting (#6836: 150+ orphaned tool reports, #26224: agent hangs, #30137/#32870: system deadlocks), having even one community researcher with source access would meaningfully accelerate debugging.

**Our track record:**
- [github.com/kolkov](https://github.com/kolkov) — open source maintainer, 35+ public repos
- [dev.to/kolkov](https://dev.to/kolkov) — technical articles on developer tooling
- 11 versions of `cli.js` reverse-engineered with documented methodology
- Root cause analysis for streaming hangs (#33949, 👍12, 21 comments)
- Bun runtime crash analysis (#35171, #36132)

We're happy to work under NDA, read-only access, or whatever arrangement makes sense. The goal is the same — making Claude Code more reliable for everyone.

### Why open-sourcing Claude Code makes business sense in 2026

Keeping `cli.js` closed-source may have made sense in early 2025 when Claude Code launched and had first-mover advantage. But in 2026, with Cursor, Codex, Windsurf, Aider, and dozens of open-source alternatives — **the secrecy provides no competitive advantage while actively harming product quality**.

Consider the facts:
- **Anthropic's revenue comes from model API access, not from selling Claude Code as software.** The CLI is a funnel to the API — the more reliable it is, the more tokens users consume.
- **The "secret" is already out.** The entire architecture is recoverable from the minified source — we've mapped the streaming pipeline, error classes, retry logic, telemetry events, and env vars across 11 versions. Anyone with `npm pack` and a weekend can do the same. It's security through obscurity, and it's not working.
- **Bugs like this one sit undiscovered for months** because the community can't effectively review 12 MB of minified code. If our tracing is right, this dead-code bug means the watchdog's recovery path (5+ months in the codebase) may **never have worked as intended**. With readable source, someone would likely have caught this in a PR review.
- **The community is already doing the work.** #33949 has root cause analysis from reverse engineering. #31328 identified JSONL race conditions. @yichao-mt decompiled the watchdog timer. @VRDate submitted PR #35710 for tool mutex. We're all working blind — give us the source and we'll find bugs 10x faster.
- **Open source would accelerate, not threaten.** Recreating a CLI wrapper around the Anthropic API is straightforward — the hard part (the models) stays proprietary. What open source gives you is a community that catches regressions, proposes fixes, and builds trust. The current trajectory — 150+ unresolved bug reports, zero team responses, community threatening to leave for Codex — is far more dangerous to the business than open-sourcing a CLI tool.

We're not asking for model weights or internal infrastructure. Just the TypeScript source for a CLI tool that wraps your public API. The ROI is obvious: faster bug discovery, community PRs, and users who feel invested in the product rather than frustrated by it.

CC: @bcherny @ant-kurt @fvolcic @ashwin-ant @bogini @OctavianGuzu @hackyon-anthropic @chrislloyd @ThariqS @catherinewu @whyuan-cc @dhollman @rboyce-ant @dicksontsai @wolffiex @ddworken @km-anthropic — open to discussing any of this privately or publicly.


Bug: watchdog non-streaming fallback is unreachable dead code (v2.1.84/v2.1.85) #39755
