Skip to content

fix: prevent infinite loop in _updateSubscriptions when WebSocket is closing#3778

Open
Kropiunig wants to merge 1 commit intosolana-foundation:maintenance/v1.xfrom
Kropiunig:fix/websocket-update-subscriptions-infinite-loop
Open

fix: prevent infinite loop in _updateSubscriptions when WebSocket is closing#3778
Kropiunig wants to merge 1 commit intosolana-foundation:maintenance/v1.xfrom
Kropiunig:fix/websocket-update-subscriptions-infinite-loop

Conversation

@Kropiunig
Copy link

Problem

Fixes #3761

When a WebSocket connection transitions to CLOSING (readyState 2) or CLOSED (readyState 3) state, there is a window between the socket becoming unusable and the close event firing. During this window:

  1. _rpcWebSocketConnected is still true (it only flips when _wsOnClose runs)
  2. _rpcWebSocketGeneration hasn't incremented yet
  3. isCurrentConnectionStillActive() still returns true

Any RPC call (call/notify) on the RpcWebSocketClient will reject immediately with:

Tried to call a JSON-RPC method `accountUnsubscribe` but the socket was not
`CONNECTING` or `OPEN` (`readyState` was 2)

But the error handler in _updateSubscriptions() sees the connection as still active, resets the subscription state, and calls _updateSubscriptions() recursively — which immediately fails again, creating a tight infinite loop that floods logs and can crash the process.

Real-world impact: This has been observed in production with Drift Protocol SDK users on Railway, where the log storm reached 500 logs/sec and caused instance crashes.

Root Cause

isCurrentConnectionStillActive() only checks the generation counter:

const isCurrentConnectionStillActive = () => {
  return activeWebSocketGeneration === this._rpcWebSocketGeneration;
};

This misses the case where the socket is dying but the close event hasn't fired yet. The readyState transitions happen synchronously on the socket, but the close event is asynchronous — there's a real gap where the generation counter is stale.

Solution

  1. Expose readyState on RpcWebSocketClient via a getter that delegates to the underlying socket (defaults to CLOSED when no socket exists)

  2. Guard isCurrentConnectionStillActive() to also verify the socket is in a usable state (CONNECTING or OPEN):

const isCurrentConnectionStillActive = () => {
  if (activeWebSocketGeneration !== this._rpcWebSocketGeneration) {
    return false;
  }
  const readyState = this._rpcWebSocket.readyState;
  return readyState === 0 /* CONNECTING */ || readyState === 1 /* OPEN */;
};

This breaks the recursive loop immediately when the socket is no longer usable, allowing the normal reconnection flow to take over when the close event eventually fires.

Changes

File Change
src/rpc-websocket.ts Added readyState getter (+3 lines)
src/connection.ts Extended isCurrentConnectionStillActive() with readyState check (+10 lines)
test/connection-subscriptions.test.ts Updated test stub to track readyState + added 2 regression tests (+120 lines)

Testing

  • All 198 subscription tests pass (previously 14 retry tests were failing because the test stub didn't expose readyState — fixed by having the stub track open/close state)
  • Added regression tests for:
  • Both tests verify that _updateSubscriptions() doesn't exceed a reasonable call count (< 5), catching any infinite loop

Notes

  • Uses numeric constants (0, 1, 3) instead of WebSocket.CONNECTING/OPEN/CLOSED to avoid Node.js/browser compatibility issues, consistent with the existing pattern in rpc-websocket.ts
  • This is a minimal, targeted fix — no behavioral changes to the happy path

…closing

isCurrentConnectionStillActive() only checked the connection generation
counter, which doesn't change until the 'close' event fires. When the
underlying WebSocket enters the CLOSING (readyState 2) or CLOSED
(readyState 3) state before that event propagates, RPC calls fail
immediately but the guard considers the connection active, causing
_updateSubscriptions() to retry indefinitely in a tight recursive loop.

This manifests in production as a storm of error logs:

  accountUnsubscribe error: Tried to call a JSON-RPC method
  `accountUnsubscribe` but the socket was not `CONNECTING` or `OPEN`
  (`readyState` was 2)

The fix adds a readyState check to isCurrentConnectionStillActive() so
that it returns false when the socket is no longer in a usable state
(CONNECTING or OPEN), breaking the retry loop.

Changes:
- Expose readyState getter on RpcWebSocketClient (delegates to the
  underlying socket, defaults to CLOSED when no socket exists)
- Guard isCurrentConnectionStillActive() against CLOSING/CLOSED state
- Update test stub to track readyState based on mock open/close state
  (fixes 14 pre-existing test failures in retry test suite)
- Add regression tests for CLOSING-during-unsubscribe and
  CLOSED-during-subscribe scenarios

Fixes solana-foundation#3761
@Kropiunig
Copy link
Author

Hi @jacobcreech — pinging for visibility. This PR fixes an infinite spin-loop when a WebSocket transitions to CLOSING/CLOSED state before the close event fires, which causes _updateSubscriptions to loop forever. CI is passing. Would appreciate a review when you have a moment, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[URGENT] Infinite loop in Connection._updateSubscriptions()

1 participant