bug: uncaught StreamStateError when drain event fires on closing stream #3415
Description
A race condition in abstract-message-stream.ts causes an uncaught exception when a drain event fires on a stream whose underlying transport is already closing. The StreamStateError is thrown inside an event listener callback with no surrounding try/catch, so it propagates as an uncaught exception to the Node.js process level.
Environment
- @libp2p/utils@7.0.13 (latest)
- @chainsafe/libp2p-noise@17.0.0
- @libp2p/mplex@12.0.14
- Lodestar v1.41.0 (Ethereum consensus client)
- Node.js (TCP transport)
Stack Trace
```
[network] error: uncaughtException: Cannot write to a stream that is closing
StreamStateError: Cannot write to a stream that is closing
    at TCPSocketMultiaddrConnection.send (abstract-message-stream.js:112)
    at stream.send (@libp2p/prometheus-metrics/index.js:183)
    at EncryptedMessageStream.sendData (@chainsafe/libp2p-noise/utils.js:169)
    at EncryptedMessageStream.processSendQueue (abstract-message-stream.js:355)
    at EncryptedMessageStream.continueSendingOnDrain (abstract-message-stream.js:56)
```
Root Cause Analysis
The race:
- A TCP connection begins closing → the underlying stream's `writeStatus` transitions to `'closing'`
- Before the close completes, a `drain` event fires on the underlying TCP stream (the OS/Node.js TCP socket signals buffer space is available)
- The noise layer's `noiseOnDrain` handler (libp2p-noise/utils.ts) relays this as `safeDispatchEvent('drain')` on the encrypted stream
- The `continueSendingOnDrain` listener in `abstract-message-stream.ts` (lines 97-104) calls `processSendQueue()`
- `processSendQueue()` (line 428) has guards for `writableNeedsDrain`, empty buffer, and `sendingData`, but no guard for `writeStatus`
- It calls `sendData()` → noise's `sendData()` calls `this.stream.send()` on the underlying (now-closing) TCP stream
- `send()` (lines 169-171) throws `StreamStateError` because `writeStatus === 'closing'`
- The throw propagates through the event listener call stack with no try/catch → uncaught exception
Key observation: The `continueSendingOnDrain` callback at line 97 only checks `this.writableNeedsDrain` before calling `processSendQueue()`. Neither it nor `processSendQueue()` verifies that the stream is still writable.
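The unguarded path can be sketched with a minimal stand-in. The class below is illustrative, not the real `abstract-message-stream.ts`, but it mirrors the guard set described above: `continueSendingOnDrain` only checks `writableNeedsDrain`, so a late drain on a closing stream reaches `send()` and throws.

```typescript
type WriteStatus = 'writable' | 'closing' | 'closed'

class StreamStateError extends Error {}

// Illustrative stand-in for abstract-message-stream.ts (not the real class)
class MessageStream {
  writeStatus: WriteStatus = 'writable'
  writableNeedsDrain = false
  sendingData = false
  writeBuffer = new Uint8Array([1, 2, 3])

  // mirrors the unguarded listener: only checks writableNeedsDrain
  continueSendingOnDrain (): void {
    if (this.writableNeedsDrain) {
      return
    }
    this.processSendQueue()
  }

  processSendQueue (): void {
    if (this.writeBuffer.byteLength === 0 || this.sendingData) {
      return
    }
    this.send(this.writeBuffer)
  }

  send (data: Uint8Array): void {
    if (this.writeStatus !== 'writable') {
      throw new StreamStateError('Cannot write to a stream that is closing')
    }
    // ... actually write data to the transport ...
  }
}

// Reproduce the race: teardown begins, then a late drain arrives
const stream = new MessageStream()
stream.writeStatus = 'closing'

try {
  stream.continueSendingOnDrain() // the late drain event's callback
} catch (err) {
  // In the real code path there is no try/catch here, so the
  // StreamStateError escapes as an uncaughtException
  console.log((err as Error).message) // "Cannot write to a stream that is closing"
}
```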
Impact
- On Lodestar mainnet: 6 uncaught exceptions in 7 days (3 distinct timestamps, occurring in bursts of 2-3 when a connection tears down)
- The error is logged as `[network] error: uncaughtException`: it reaches the Node.js `process.on('uncaughtException')` handler
- Currently non-fatal (Lodestar catches and logs it), but any `uncaughtException` is a process stability risk
- Affects any js-libp2p consumer using noise over TCP when connections tear down while the send queue is draining
Proposed Fix
Add a `writeStatus` guard in `processSendQueue()`, which is the single chokepoint for all send paths:
```typescript
// In abstract-message-stream.ts, processSendQueue(): add after the existing bail-out checks
protected processSendQueue (): boolean {
  // bail if the underlying transport is full
  if (this.writableNeedsDrain) { /* ... existing ... */ }

  // bail if there is no data to send
  if (this.writeBuffer.byteLength === 0) { /* ... existing ... */ }

  // bail if we are already sending data
  if (this.sendingData) { /* ... existing ... */ }

  // bail if the stream is no longer writable: a drain event can
  // arrive after writeStatus transitions to 'closing'/'closed'
  if (this.writeStatus !== 'writable') {
    this.log.trace('not processing send queue as stream is %s', this.writeStatus)
    return false
  }

  // ... rest of method unchanged
}
```

This is safe because:
- It's a single guard at the beginning of the method, consistent with the existing bail-out pattern
- Data remaining in `writeBuffer` when the stream closes will be discarded as part of the normal close/abort cleanup
- The `send()` method already enforces this invariant by throwing; this just converts the throw into a graceful no-op at the right level
Alternative (less preferred): Add a try/catch around the `sendData()` call in `processSendQueue()` to swallow `StreamStateError` during close. This would be more defensive but masks the root cause.
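A minimal sketch of the guard's effect, again using an illustrative stand-in class rather than the real js-libp2p internals: with the guard in place, a late drain on a closing stream becomes a logged no-op instead of a throw.

```typescript
type WriteStatus = 'writable' | 'closing' | 'closed'

class StreamStateError extends Error {}

// Illustrative stand-in with the proposed guard applied
class GuardedMessageStream {
  writeStatus: WriteStatus = 'writable'
  writableNeedsDrain = false
  sendingData = false
  writeBuffer = new Uint8Array([1, 2, 3])

  processSendQueue (): boolean {
    if (this.writableNeedsDrain) { return false }
    if (this.writeBuffer.byteLength === 0) { return false }
    if (this.sendingData) { return false }

    // proposed guard: bail gracefully instead of letting send() throw
    if (this.writeStatus !== 'writable') {
      console.log('not processing send queue as stream is %s', this.writeStatus)
      return false
    }

    this.send(this.writeBuffer)
    return true
  }

  send (data: Uint8Array): void {
    if (this.writeStatus !== 'writable') {
      throw new StreamStateError('Cannot write to a stream that is closing')
    }
    // ... actually write data to the transport ...
  }
}

const stream = new GuardedMessageStream()
stream.writeStatus = 'closing' // teardown has begun

// late drain: with the guard this is a no-op, not an uncaughtException
console.log(stream.processSendQueue()) // false
```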
Reproduction
This is a timing-dependent race that occurs during TCP connection teardown. It's most likely to manifest under load, when many connections are being established and torn down simultaneously. In our case, that was a mainnet Ethereum beacon node with ~200 peers.
The simplest conceptual reproduction:
- Establish a noise-over-TCP connection with mplex
- Send enough data to fill the write buffer (trigger `writableNeedsDrain`)
- Close/abort the connection while the buffer is still draining
- The OS delivers a drain event after the close has started → boom
Reported from Lodestar — an Ethereum consensus client built on js-libp2p.