Summary
The getSignatureStatuses RPC endpoint has a race condition that causes inconsistent API responses. Transactions may be reported as "not found" with a context slot X, but actually exist in an earlier slot Y (where Y < X).
Problem Discovery
This bug was discovered during high-throughput transaction processing:
- Sending many transactions at once to the network
- Calling getSignatureStatuses from a background confirmation process
- Some responses returned null with a high context slot
- Afterwards, the transaction was found to have been included in an earlier slot
This violates the semantic contract: if a transaction is not found at context slot X, it should not exist in any slot ≤ X.
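The contract above can be written down as a small predicate, which is handy when auditing logs from a confirmation process. This is a minimal sketch: `ObservedStatus` and `violates_contract` are hypothetical names introduced here for illustration, not types from surfpool or the Solana RPC crates.

```rust
/// Hypothetical, simplified view of one getSignatureStatuses answer.
/// `reported_slot` is `None` when the RPC answered "not found" (null).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObservedStatus {
    pub context_slot: u64,
    pub reported_slot: Option<u64>,
}

/// Returns true when a "not found" answer at `observed.context_slot` is
/// contradicted by the slot the transaction was later found in.
/// `actual_slot` is `None` if the transaction never landed.
pub fn violates_contract(observed: ObservedStatus, actual_slot: Option<u64>) -> bool {
    match (observed.reported_slot, actual_slot) {
        // Reported as missing, yet it exists in a slot <= the context slot.
        (None, Some(actual)) => actual <= observed.context_slot,
        // Either the status was found, or the transaction never landed.
        _ => false,
    }
}
```

For the case described in this report, a null answer with context slot 101 for a transaction that landed in slot 100 is flagged as a violation.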
Root Cause
The bug is in get_signature_statuses() at crates/core/src/rpc/full.rs:1468-1486.
The context slot is captured after each signature lookup inside the loop, rather than once at the beginning:
Box::pin(async move {
    let mut responses = Vec::with_capacity(signatures.len());
    let mut last_latest_absolute_slot = 0;
    for signature in signatures.into_iter() {
        let res = svm_locker
            .get_transaction(&remote_client, &signature, get_default_transaction_config())
            .await?;
        last_latest_absolute_slot = svm_locker.get_latest_absolute_slot(); // ← Captured AFTER each lookup
        responses.push(res.map_some_transaction_status());
    }
    Ok(RpcResponse {
        context: RpcResponseContext::new(last_latest_absolute_slot), // ← Uses last captured slot
        value: responses,
    })
})
Why This Causes the Bug
The SurfnetSvmLocker uses a Tokio RwLock<SurfnetSvm>, but locks are only held for brief individual operations, not across the entire RPC call.
During .await points in the async loop:
- The async runtime can switch to other tasks
- Other tasks can acquire write locks and advance the slot (e.g., confirming blocks)
- When the RPC task resumes, get_latest_absolute_slot() returns the new, higher slot
Race condition timeline:
Time | RPC Task | Other Task
-----|------------------------------------|--------------------------
T0 | get_transaction(sig) starts |
T1 | → acquire read lock, check tx |
T2 | → tx not found, release lock |
T3 | .await on remote lookup | ← acquires WRITE lock
T4 | (suspended) | → confirm_current_block()
T5 | (suspended) | → slot advances 100 → 101
T6 | (suspended) | → release write lock
T7 | resumes from .await |
T8 | get_latest_absolute_slot() = 101 |
T9 | Response: "not found", slot=101 |
| Reality: tx exists in slot 100! |
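The timeline can be replayed deterministically without any async runtime by running the steps in the same order by hand. The sketch below is a toy model under stated assumptions: `ToySvm`, its fields, and `replay` are hypothetical stand-ins and do not reflect surfpool's real `SurfnetSvm` internals. It only shows which context slot ends up in the response depending on when the slot is read.

```rust
use std::collections::HashMap;

/// Toy stand-in for the shared SVM state (hypothetical, not surfpool's type).
#[derive(Default)]
pub struct ToySvm {
    pub latest_slot: u64,
    pub pending: Vec<String>,
    pub confirmed: HashMap<String, u64>,
}

impl ToySvm {
    /// Returns the slot a signature was confirmed in, if any.
    pub fn lookup(&self, sig: &str) -> Option<u64> {
        self.confirmed.get(sig).copied()
    }

    /// Confirms all pending transactions into the current slot,
    /// then advances the slot (100 -> 101 in the timeline).
    pub fn confirm_current_block(&mut self) {
        let slot = self.latest_slot;
        for sig in self.pending.drain(..) {
            self.confirmed.insert(sig, slot);
        }
        self.latest_slot += 1;
    }
}

/// Replays T0..T9 and returns (status, context_slot) as the RPC would report.
/// `capture_slot_first` selects where the context slot is read.
pub fn replay(capture_slot_first: bool) -> (Option<u64>, u64) {
    let mut svm = ToySvm {
        latest_slot: 100,
        pending: vec!["sig".to_string()],
        ..Default::default()
    };
    let slot_before_lookup = svm.latest_slot; // read before any lookup
    let status = svm.lookup("sig");           // T1-T2: tx not found yet
    svm.confirm_current_block();              // T3-T6: other task confirms, slot advances
    let slot_after_lookup = svm.latest_slot;  // T8: read after resuming
    let context_slot = if capture_slot_first {
        slot_before_lookup
    } else {
        slot_after_lookup
    };
    (status, context_slot)
}
```

In this model `replay(false)` reports "not found" with context slot 101 even though the lookup ran while the slot was 100, whereas `replay(true)` reports the slot that was current when the lookup started.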
Additional Issue
If getSignatureStatuses is called with an empty signature array, the context slot is 0 (the initialization value) instead of the actual current slot.
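The empty-input case follows directly from where the slot variable is assigned. The two functions below are a reduced sketch of that control flow only; both names are hypothetical and the lookups themselves are omitted.

```rust
/// Mirrors the current pattern: the slot starts at 0 and is only
/// updated inside the loop body, so an empty input never updates it.
pub fn context_slot_captured_in_loop(signatures: &[&str], current_slot: u64) -> u64 {
    let mut last_slot = 0;
    for _sig in signatures {
        // (lookup omitted)
        last_slot = current_slot;
    }
    last_slot
}

/// Mirrors the alternative pattern: the slot is read once, before the loop,
/// so it is set even when there is nothing to iterate over.
pub fn context_slot_captured_up_front(signatures: &[&str], current_slot: u64) -> u64 {
    let context_slot = current_slot;
    for _sig in signatures {
        // (lookup omitted)
    }
    context_slot
}
```

With an empty slice and a current slot of 100, the first function returns 0 and the second returns 100.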
Proposed Fix
Capture the context slot once at the beginning of the method, before any lookups:
Box::pin(async move {
    // Capture the context slot once at the beginning to ensure consistency
    let context_slot = svm_locker.get_latest_absolute_slot();
    let mut responses = Vec::with_capacity(signatures.len());
    for signature in signatures.into_iter() {
        let res = svm_locker
            .get_transaction(&remote_client, &signature, get_default_transaction_config())
            .await?;
        responses.push(res.map_some_transaction_status());
    }
    Ok(RpcResponse {
        context: RpcResponseContext::new(context_slot),
        value: responses,
    })
})
This ensures snapshot consistency: all lookups are evaluated as of the same slot.
Impact
- High-throughput transaction processing workflows may see inconsistent results
- Background confirmation processes may incorrectly conclude transactions failed
- Batch queries return a single context slot that may be later than the slot at which the individual signatures were actually looked up
Reproduction
- Run surfpool with continuous transaction processing
- Send multiple transactions rapidly
- Query getSignatureStatuses from a separate process while transactions are being confirmed
- Observe occasional null responses with context slots higher than the actual transaction slot