
Race condition in getSignatureStatuses causes inconsistent context slot #436

@serejke


Summary

The getSignatureStatuses RPC endpoint has a race condition that causes inconsistent API responses. Transactions may be reported as "not found" with a context slot X, but actually exist in an earlier slot Y (where Y < X).

Problem Discovery

This bug was discovered during high-throughput transaction processing:

  • Sending many transactions at once to the network
  • Calling getSignatureStatuses from a background confirmation process
  • Some responses returned null with a high context slot
  • Later queries showed that the transaction had in fact been included in an earlier slot

This violates the semantic contract: if a transaction is not found at context slot X, it should not exist in any slot ≤ X.
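The contract can be checked from the client side. The sketch below assumes you have collected, after the fact, the slot each signature was actually included in. The function name `contract_violations` and its input shapes are hypothetical; they are not part of surfpool or the Solana RPC API.

```rust
use std::collections::HashMap;

// Hypothetical client-side check of the stated contract.
// `statuses` pairs each signature with the slot the response reported for it
// (None = reported as not found). `actual_slots` is ground truth gathered later.
// Returns signatures reported as not found at `context_slot` that were in fact
// included in some slot <= `context_slot`.
fn contract_violations<'a>(
    context_slot: u64,
    statuses: &[(&'a str, Option<u64>)],
    actual_slots: &HashMap<String, u64>,
) -> Vec<&'a str> {
    let mut violations = Vec::new();
    for &(sig, reported) in statuses {
        if reported.is_none() {
            if let Some(&slot) = actual_slots.get(sig) {
                if slot <= context_slot {
                    violations.push(sig);
                }
            }
        }
    }
    violations
}
```

For example, a signature reported as null at context slot 101 but later found in slot 100 is flagged, while one later found in slot 102 is not.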

Root Cause

The bug is in get_signature_statuses() at crates/core/src/rpc/full.rs:1468-1486.

The context slot is captured after each signature lookup inside the loop, rather than once at the beginning:

Box::pin(async move {
    let mut responses = Vec::with_capacity(signatures.len());
    let mut last_latest_absolute_slot = 0;
    for signature in signatures.into_iter() {
        let res = svm_locker
            .get_transaction(&remote_client, &signature, get_default_transaction_config())
            .await?;

        last_latest_absolute_slot = svm_locker.get_latest_absolute_slot(); // ← Captured AFTER each lookup
        responses.push(res.map_some_transaction_status());
    }
    Ok(RpcResponse {
        context: RpcResponseContext::new(last_latest_absolute_slot), // ← Uses last captured slot
        value: responses,
    })
})

Why This Causes the Bug

The SurfnetSvmLocker uses a Tokio RwLock<SurfnetSvm>, but locks are only held for brief individual operations, not across the entire RPC call.

While the task is suspended at an .await point in the loop:

  1. The async runtime can switch to other tasks
  2. Other tasks can acquire write locks and advance the slot (e.g., confirming blocks)
  3. When the RPC task resumes, get_latest_absolute_slot() returns the new, higher slot

Race condition timeline:

Time | RPC Task                           | Other Task
-----|------------------------------------|--------------------------
T0   | get_transaction(sig) starts        |
T1   | → acquire read lock, check tx      |
T2   | → tx not found, release lock       |
T3   | .await on remote lookup            | ← acquires WRITE lock
T4   |   (suspended)                      | → confirm_current_block()
T5   |   (suspended)                      | → slot advances 100 → 101
T6   |   (suspended)                      | → release write lock
T7   | resumes from .await                |
T8   | get_latest_absolute_slot() = 101   |
T9   | Response: "not found", slot=101    |
     | Reality: tx exists in slot 100!    |
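The timeline can be replayed deterministically with a std-only model. `Svm`, `Locker`, the `pending`/`confirmed` split, and `buggy_status` are hypothetical stand-ins for SurfnetSvm and SurfnetSvmLocker (not surfpool's real types), and the closure passed as `suspended` plays the role of the other task running while the RPC task is parked at `.await`. The model assumes that confirming a block places pending transactions in the current slot before advancing it, as in the table.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

// Hypothetical stand-in for SurfnetSvm; not surfpool's real type.
struct Svm {
    latest_slot: u64,
    confirmed: HashMap<String, u64>, // signature -> slot it was included in
    pending: HashSet<String>,        // processed but not yet in a confirmed block
}

// Hypothetical stand-in for SurfnetSvmLocker: each method holds the lock for
// a single operation and releases it before returning.
struct Locker(RwLock<Svm>);

impl Locker {
    fn get_latest_absolute_slot(&self) -> u64 {
        self.0.read().unwrap().latest_slot
    }

    fn get_transaction(&self, sig: &str) -> Option<u64> {
        self.0.read().unwrap().confirmed.get(sig).copied()
    }

    // Assumed semantics: pending txs land in the current slot, then the slot advances.
    fn confirm_current_block(&self) {
        let mut svm = self.0.write().unwrap();
        let slot = svm.latest_slot;
        let pending: Vec<String> = svm.pending.drain().collect();
        for sig in pending {
            svm.confirmed.insert(sig, slot);
        }
        svm.latest_slot += 1;
    }
}

// Buggy ordering: look up first, read the context slot afterwards.
// `suspended` runs where the real task would be parked at `.await`.
fn buggy_status(locker: &Locker, sig: &str, suspended: impl FnOnce()) -> (u64, Option<u64>) {
    let res = locker.get_transaction(sig); // T1-T2: not found, read lock released
    suspended(); // T3-T6: another task takes the write lock and confirms a block
    let context_slot = locker.get_latest_absolute_slot(); // T8: sees the advanced slot
    (context_slot, res)
}
```

Starting at slot 100 with sigA pending and passing `|| locker.confirm_current_block()` as `suspended`, `buggy_status` returns `(101, None)`, yet sigA is then recorded in slot 100, matching row T9.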

Additional Issue

If getSignatureStatuses is called with an empty signature array, the context slot is 0 (the initialization value) instead of the actual current slot.
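The empty-batch case follows directly from the loop shape: the slot variable is only assigned inside the loop body. A minimal sketch, where both functions are hypothetical simplifications with the lookup elided and `current_slot` stands in for get_latest_absolute_slot():

```rust
// Hypothetical simplification of the loop in full.rs, lookup elided:
// the slot variable starts at 0 and is only assigned inside the loop.
fn buggy_context_slot(signatures: &[&str], current_slot: impl Fn() -> u64) -> u64 {
    let mut last_latest_absolute_slot = 0;
    for _signature in signatures {
        // get_transaction(...).await would run here
        last_latest_absolute_slot = current_slot();
    }
    last_latest_absolute_slot
}

// Proposed ordering: the slot is read once, up front, whatever the batch size.
fn fixed_context_slot(signatures: &[&str], current_slot: impl Fn() -> u64) -> u64 {
    let context_slot = current_slot();
    for _signature in signatures {
        // get_transaction(...).await would run here
    }
    context_slot
}
```

For an empty batch, `buggy_context_slot` returns 0 regardless of the current slot, while `fixed_context_slot` returns the current slot.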

Proposed Fix

Capture the context slot once at the beginning of the method, before any lookups:

Box::pin(async move {
    // Capture the context slot once at the beginning to ensure consistency
    let context_slot = svm_locker.get_latest_absolute_slot();

    let mut responses = Vec::with_capacity(signatures.len());
    for signature in signatures.into_iter() {
        let res = svm_locker
            .get_transaction(&remote_client, &signature, get_default_transaction_config())
            .await?;
        responses.push(res.map_some_transaction_status());
    }
    Ok(RpcResponse {
        context: RpcResponseContext::new(context_slot),
        value: responses,
    })
})

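This makes the context slot a safe lower bound: it is read before any lookup, and because the slot only moves forward, every lookup observes state at least as new as the reported context slot. A null result is therefore never reported against a slot newer than the state the lookup actually saw. This is weaker than a full snapshot, though: a lookup can still observe a transaction confirmed after the context slot was read. It also fixes the empty-array case, since the slot is read regardless of batch size.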
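The fix relies only on the slot never decreasing. A small std-only model can exercise that under real concurrency: a writer thread keeps incrementing a slot behind a `RwLock` (standing in for block confirmation), and `fixed_batch` reads the context slot first, then records the slot seen by each stand-in lookup. Both functions are hypothetical, and `thread::yield_now()` only approximates an `.await` suspension.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

// Fixed ordering: read the context slot once, before any lookup. Each
// iteration stands in for one get_transaction call and records the slot
// it observed while holding the read lock (hypothetical model, not surfpool code).
fn fixed_batch(slot: &RwLock<u64>, n: usize) -> (u64, Vec<u64>) {
    let context_slot = *slot.read().unwrap();
    let mut observed = Vec::with_capacity(n);
    for _ in 0..n {
        observed.push(*slot.read().unwrap());
        thread::yield_now(); // plays the role of the `.await` suspension point
    }
    (context_slot, observed)
}

// Runs `batches` batches while a writer thread keeps advancing the slot,
// the way a block-confirmation task would.
fn run_with_writer(batches: usize) -> Vec<(u64, Vec<u64>)> {
    let slot = Arc::new(RwLock::new(100u64));
    let stop = Arc::new(AtomicBool::new(false));
    let writer = {
        let (slot, stop) = (Arc::clone(&slot), Arc::clone(&stop));
        thread::spawn(move || {
            while !stop.load(Ordering::Relaxed) {
                *slot.write().unwrap() += 1;
                thread::yield_now();
            }
        })
    };
    let results: Vec<_> = (0..batches).map(|_| fixed_batch(&slot, 8)).collect();
    stop.store(true, Ordering::Relaxed);
    writer.join().unwrap();
    results
}
```

Every observed slot is at least the context slot, and an empty batch still reports the current slot rather than 0. Observed slots can exceed the context slot, which is exactly the limit described above: the context slot is bounded by what the lookups saw, but the lookups are not pinned to a single slot.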

Impact

  • High-throughput transaction processing workflows may see inconsistent results
  • Background confirmation processes may incorrectly conclude transactions failed
  • Batch queries can report a context slot newer than the state that some of their lookups actually observed

Reproduction

  1. Run surfpool with continuous transaction processing
  2. Send multiple transactions rapidly
  3. Query getSignatureStatuses from a separate process while transactions are being confirmed
  4. Observe occasional null responses with context slots higher than the actual transaction slot
