feat: speed benchmark and network creation optimization#1320
Open
feat: speed benchmark and network creation optimization#1320
Conversation
Add a speed_benchmark job to per-PR CI that measures network creation time against a 60s target. Includes TIMING: markers throughout the critical path (keystore gen, genesis gen, EL/CL/VC launch) for phase-level profiling.
# Conflicts: # src/participant_network.star
- Set validator_count=0 and use_separate_vc=false in speed test to skip entire keystore pipeline and VC launch - Add interval=0.5s to CL ready conditions (was using 1s default) - Add interval=0.5s to EL admin_nodeInfo wait (was using 1s default) - Extract genesis_validators_root and osaka_time within the genesis generator container to avoid needing jq in the read step - Add wait=None to osaka_time read step for potential parallelization
Short-circuit the entire keystore pipeline (avoid pulling and starting protolambda/eth2-val-tools container) when all participants have validator_count=0. Also fix lint formatting.
Fulu fork (epoch 0 default) requires supernodes/validators/peerdas which the minimal speed test doesn't configure. Set fulu_fork_epoch to far-future to skip this validation.
- Skip plan.wait() on admin_nodeInfo when num_participants==1 (enode only needed as bootnode for participant 1+) - Fix speed.yaml: use FAR_FUTURE_EPOCH for fulu_fork_epoch, add fixed genesis_time to skip timestamp container - Fix benchmark script: add set -euo pipefail to catch kurtosis failures through tee pipe
A genesis_time in 2030 means CL nodes wait years for genesis and never produce blocks. The dynamic computation adds ~2-5s but is necessary for a functional network.
The speed test needs actual validators to be a meaningful benchmark. Using defaults (128 validators, separate VC for lighthouse).
geth/reth/nethermind x lighthouse/teku/nimbus with 120s target.
Skip sequential enode extraction (EL[1..8]) and ENR/identity extraction (CL[1..8]) during the launch phase, since only the boot node's enode/ENR is needed as a bootnode for subsequent nodes. Collect the deferred enodes and identities after all VCs are launched, when nodes are already warm and responding faster. This moves ~16s of EL wait and ~25s of CL wait out of the critical path, replacing it with ~10s of post-launch collection.
- erigon and nimbus-eth1 use WS_RPC_PORT_ID instead of RPC_PORT_ID - Only geth, erigon, dummy, ethrex extract ENR from admin_nodeInfo - Auto-format with kurtosis lint
Add an image warmup phase that uses plan.add_services to pull all unique EL/CL/VC images in parallel before any launch phase begins. Docker deduplicates concurrent pulls at the layer level, so this single parallel pull warms the cache for all subsequent add_service and add_services calls. The throwaway warmer services are stopped immediately after the pulls complete.
This reverts commit 207e3cb.
…missing Defer ready_conditions for CL[1..8] so add_services returns after Docker pull+start without waiting for health checks. The health wait moves to collect_identities() which now uses plan.wait (retries) instead of plan.request (one-shot). CL nodes boot in background during VC launch. Also switch speed benchmarks from --image-download always to missing so Docker skips manifest re-verification for already-cached images.
With --image-download missing, kurtosis skips pulling cached images. Pre-pulling all client + genesis images in parallel before the kurtosis run means add_service/add_services calls find everything cached and only need to create+start containers (no network I/O).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Iterative optimization of network creation speed, targeting sub-60s for a minimal single-participant network.
Optimizations implemented:
validator_count=0(avoids pullingprotolambda/eth2-val-toolscontainer)interval=0.5son CL ready conditions and ELadmin_nodeInfowait (was using 1s default)plan.waitonadmin_nodeInfois unnecessary when no other EL nodes need the bootnode)genesis_validators_rootandosaka_timewithin the genesis generator container itselfBenchmark infrastructure:
speed_benchmarkCI job measures wall-clock time against 60s targetset -euo pipefailensures kurtosis failures are caughtTiming markers from CI (3-participant minimal.yaml):
CL launch is the dominant bottleneck. Single-participant benchmarks pending.