Reconstruct supernode sidecars from 50% - Reconstructor#10508
Reconstruct supernode sidecars from 50% - Reconstructor#10508zilm13 wants to merge 23 commits intoConsensys:masterfrom
Conversation
...tech/pegasys/teku/statetransition/datacolumns/DataColumnSidecarArchiveReconstructorImpl.java
Outdated
Show resolved
Hide resolved
...tech/pegasys/teku/statetransition/datacolumns/DataColumnSidecarArchiveReconstructorImpl.java
Show resolved
Hide resolved
| return reconstructionAsyncRunner | ||
| .runAsync( | ||
| () -> | ||
| dataColumnSidecars.thenCombineAsync( |
There was a problem hiding this comment.
(claude) Bug — thenCombineAsync runs on common pool, not the reconstruction pool
runAsync(() -> dataColumnSidecars.thenCombineAsync(kzgProofs, combiner)) submits the lambda to reconstructionAsyncRunner, but that lambda just configures the future — the actual heavy KZG work inside the combiner runs on ForkJoinPool.commonPool()
(the default for thenCombineAsync without an executor arg). Should be:
dataColumnSidecars.thenCombineAsync(kzgProofs, combiner, reconstructionAsyncRunner.getExecutor())
or the wrapping runAsync serves no purpose.
| } | ||
|
|
||
| final List<BlobAndCellProofs> blobAndCellProofsList = | ||
| constructBlobAndCellProofsList(sidecars, proofs); |
There was a problem hiding this comment.
I think this is ok, given that columns should be ordered (iirc), but noting here as it was in the claude review...
(claude) Bug — constructBlobAndCellProofsList assumes sidecars are ordered by column index
sidecars.get(j) at line ~196 assumes the j-th element of the list has column index j. Nothing in the storage API guarantees getDataColumnSidecars returns results in column-index order. If the order isn't guaranteed, the blob reconstruction is
silently wrong (cells get assembled in the wrong order).
There was a problem hiding this comment.
added the check to the list
| this.specConfigFulu = | ||
| SpecConfigFulu.required(spec.forMilestone(SpecMilestone.FULU).getConfig()); | ||
| this.halfColumns = specConfigFulu.getNumberOfColumns() / 2; | ||
| this.recoveryTasks = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
(claude) Concern — recoveryTasks is unbounded
recoveryTasks grows forever if any caller fails to call onRequestCompleted. There's no TTL, no max-size, no eviction. In a long-running node this is a slow memory leak. At minimum there should be a log warning if the map grows large.
There was a problem hiding this comment.
We have a guaranteed cleanup in final block, clarified with comment
| if (this.nextTaskId.get() > (Integer.MAX_VALUE - 1000)) { | ||
| this.nextTaskId.set(0); | ||
| } |
There was a problem hiding this comment.
(claude) Nit — task ID wrap-around is non-atomic
if (this.nextTaskId.get() > (Integer.MAX_VALUE - 1000)) {
this.nextTaskId.set(0);
}
The get() + set(0) is not atomic. Two threads could both observe the condition true and both reset. Could use compareAndSet or just let AtomicInteger overflow naturally (wrapping to negative is fine for a map key, or use Math.abs).
| final BlobSchema blobSchema = | ||
| SchemaDefinitionsFulu.required(spec.atSlot(firstSidecar.getSlot()).getSchemaDefinitions()) | ||
| .getBlobSchema(); |
There was a problem hiding this comment.
(claude) Nit — blobSchema resolved inside loop unnecessarily
In constructBlobAndCellProofsList, blobSchema is derived from spec.atSlot(firstSidecar.getSlot()) on every loop iteration but it's constant for the whole method. Hoist it above the loop.
| public SafeFuture<Optional<DataColumnSidecar>> reconstructDataColumnSidecar( | ||
| final SignedBeaconBlock block, final UInt64 index, final int requestId) { | ||
| final Map<SlotAndBlockRoot, SafeFuture<ReconstructionResult>> slotAndBlockRootSafeFutureMap = |
There was a problem hiding this comment.
probably worth a comment for future us...
(claude) Nit — first-half index requests return Optional.empty() silently
reconstructDataColumnSidecar with an index < halfColumns silently returns empty (the ReconstructionResult map only covers the second half). This is documented in the test but could surprise callers. An early guard/assertion at the entry point
would make the contract explicit.
| public void shouldReconstructSecondHalfSidecars() { | ||
| final SignedBeaconBlock block = createBlockWithBlobs(UInt64.ONE); |
There was a problem hiding this comment.
im less worried about this one... mentioned as was part of claude generated review.
(claude) Concern — shouldReconstructSecondHalfSidecars may not test actual reconstruction
createFirstHalfSidecars returns random-data sidecars and createSecondHalfProofs returns random KZG proofs. If constructDataColumnSidecars calls into the real KZG library, reconstruction of invalid data will either throw (caught by exceptionally →
emptyResult) making the test fail, or silently produce garbage. The test should either use a real KZG round-trip with valid blob data, or mock DataColumnSidecarUtil to verify the wiring without invoking KZG.
There was a problem hiding this comment.
used mock for DataColumnSidecarUtil
...tech/pegasys/teku/statetransition/datacolumns/DataColumnSidecarArchiveReconstructorImpl.java
Outdated
Show resolved
Hide resolved
...tech/pegasys/teku/statetransition/datacolumns/DataColumnSidecarArchiveReconstructorImpl.java
Outdated
Show resolved
Hide resolved
...tech/pegasys/teku/statetransition/datacolumns/DataColumnSidecarArchiveReconstructorImpl.java
Outdated
Show resolved
Hide resolved
...tech/pegasys/teku/statetransition/datacolumns/DataColumnSidecarArchiveReconstructorImpl.java
Show resolved
Hide resolved
|
@rolfyone I've addressed all feedback |

PR Description
Reconstructor part + wiring + benchmark
Part of #10095
Fixed Issue(s)
Documentation
doc-change-requiredlabel to this PR if updates are required.Changelog
Note
Medium Risk
Adds new async reconstruction and pruning signaling logic for archived
DataColumnSidecars in Fulu, which affects data availability behavior and introduces new background tasks/metrics; incorrect gating or pruning boundaries could impact sidecar retention or retrieval.Overview
Implements a Fulu-only
DataColumnSidecarArchiveReconstructorImplthat can reconstruct second-halfDataColumnSidecars from archived first-half sidecars plus stored KZG proofs, caching per-request/per-block reconstruction tasks and exporting success/failure + timing metrics.Wires the reconstructor into
BeaconChainController(created with a dedicatedAsyncRunner, subscribed toFinalizedCheckpointChannel) and adds a newSidecarArchivePrunableChannelevent to signal when archived sidecars become prunable based on finalized/current epoch and the configured retention window.Extends
CombinedChainDataClientwithgetDataColumnSidecarProofs(slot)to fetch the per-column KZG proof data, and adds a JMH benchmark path (reconstructDataColumnSidecarsFromArchive) plus unit tests covering reconstruction, task reuse/cleanup, and pruning-boundary behavior.Written by Cursor Bugbot for commit 8ed7d26. This will update automatically on new commits. Configure here.