Skip to content

Reconstruct supernode sidecars from 50% - Reconstructor#10508

Open
zilm13 wants to merge 23 commits intoConsensys:masterfrom
zilm13:reconstruct-supernode-sidecars-reconstruction-part
Open

Reconstruct supernode sidecars from 50% - Reconstructor#10508
zilm13 wants to merge 23 commits intoConsensys:masterfrom
zilm13:reconstruct-supernode-sidecars-reconstruction-part

Conversation

@zilm13
Copy link
Copy Markdown
Contributor

@zilm13 zilm13 commented Mar 18, 2026

PR Description

Reconstructor part + wiring + benchmark
Part of #10095

Fixed Issue(s)

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

Note

Medium Risk
Adds new async reconstruction and pruning signaling logic for archived DataColumnSidecars in Fulu, which affects data availability behavior and introduces new background tasks/metrics; incorrect gating or pruning boundaries could impact sidecar retention or retrieval.

Overview
Implements a Fulu-only DataColumnSidecarArchiveReconstructorImpl that can reconstruct second-half DataColumnSidecars from archived first-half sidecars plus stored KZG proofs, caching per-request/per-block reconstruction tasks and exporting success/failure + timing metrics.

Wires the reconstructor into BeaconChainController (created with a dedicated AsyncRunner, subscribed to FinalizedCheckpointChannel) and adds a new SidecarArchivePrunableChannel event to signal when archived sidecars become prunable based on finalized/current epoch and the configured retention window.

Extends CombinedChainDataClient with getDataColumnSidecarProofs(slot) to fetch the per-column KZG proof data, and adds a JMH benchmark path (reconstructDataColumnSidecarsFromArchive) plus unit tests covering reconstruction, task reuse/cleanup, and pruning-boundary behavior.

Written by Cursor Bugbot for commit 8ed7d26. This will update automatically on new commits. Configure here.

return reconstructionAsyncRunner
.runAsync(
() ->
dataColumnSidecars.thenCombineAsync(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(claude) Bug — thenCombineAsync runs on common pool, not the reconstruction pool

runAsync(() -> dataColumnSidecars.thenCombineAsync(kzgProofs, combiner)) submits the lambda to reconstructionAsyncRunner, but that lambda just configures the future — the actual heavy KZG work inside the combiner runs on ForkJoinPool.commonPool()
(the default for thenCombineAsync without an executor arg). Should be:
dataColumnSidecars.thenCombineAsync(kzgProofs, combiner, reconstructionAsyncRunner.getExecutor())
or the wrapping runAsync serves no purpose.

}

final List<BlobAndCellProofs> blobAndCellProofsList =
constructBlobAndCellProofsList(sidecars, proofs);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok, given that columns should be ordered (iirc), but noting here as it was in the claude review...

(claude) Bug — constructBlobAndCellProofsList assumes sidecars are ordered by column index
sidecars.get(j) at line ~196 assumes the j-th element of the list has column index j. Nothing in the storage API guarantees getDataColumnSidecars returns results in column-index order. If the order isn't guaranteed, the blob reconstruction is
silently wrong (cells get assembled in the wrong order).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the check to the list

this.specConfigFulu =
SpecConfigFulu.required(spec.forMilestone(SpecMilestone.FULU).getConfig());
this.halfColumns = specConfigFulu.getNumberOfColumns() / 2;
this.recoveryTasks = new ConcurrentHashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(claude) Concern — recoveryTasks is unbounded
recoveryTasks grows forever if any caller fails to call onRequestCompleted. There's no TTL, no max-size, no eviction. In a long-running node this is a slow memory leak. At minimum there should be a log warning if the map grows large.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a guaranteed cleanup in final block, clarified with comment

Comment on lines +282 to +284
if (this.nextTaskId.get() > (Integer.MAX_VALUE - 1000)) {
this.nextTaskId.set(0);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(claude) Nit — task ID wrap-around is non-atomic

  if (this.nextTaskId.get() > (Integer.MAX_VALUE - 1000)) {                                                                                                                                                                                              
      this.nextTaskId.set(0);                              
  }                                                                                                                                                                                                                                                      

The get() + set(0) is not atomic. Two threads could both observe the condition true and both reset. Could use compareAndSet or just let AtomicInteger overflow naturally (wrapping to negative is fine for a map key, or use Math.abs).

Comment on lines +207 to +209
final BlobSchema blobSchema =
SchemaDefinitionsFulu.required(spec.atSlot(firstSidecar.getSlot()).getSchemaDefinitions())
.getBlobSchema();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(claude) Nit — blobSchema resolved inside loop unnecessarily
In constructBlobAndCellProofsList, blobSchema is derived from spec.atSlot(firstSidecar.getSlot()) on every loop iteration but it's constant for the whole method. Hoist it above the loop.

Comment on lines +113 to +115
public SafeFuture<Optional<DataColumnSidecar>> reconstructDataColumnSidecar(
final SignedBeaconBlock block, final UInt64 index, final int requestId) {
final Map<SlotAndBlockRoot, SafeFuture<ReconstructionResult>> slotAndBlockRootSafeFutureMap =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth a comment for future us...

(claude) Nit — first-half index requests return Optional.empty() silently
reconstructDataColumnSidecar with an index < halfColumns silently returns empty (the ReconstructionResult map only covers the second half). This is documented in the test but could surprise callers. An early guard/assertion at the entry point
would make the contract explicit.

Comment on lines +112 to +113
public void shouldReconstructSecondHalfSidecars() {
final SignedBeaconBlock block = createBlockWithBlobs(UInt64.ONE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im less worried about this one... mentioned as was part of claude generated review.

(claude) Concern — shouldReconstructSecondHalfSidecars may not test actual reconstruction
createFirstHalfSidecars returns random-data sidecars and createSecondHalfProofs returns random KZG proofs. If constructDataColumnSidecars calls into the real KZG library, reconstruction of invalid data will either throw (caught by exceptionally →
emptyResult) making the test fail, or silently produce garbage. The test should either use a real KZG round-trip with valid blob data, or mock DataColumnSidecarUtil to verify the wiring without invoking KZG.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used mock for DataColumnSidecarUtil

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

@zilm13
Copy link
Copy Markdown
Contributor Author

zilm13 commented Apr 1, 2026

@rolfyone I've addressed all feedback

@zilm13 zilm13 requested a review from rolfyone April 1, 2026 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants