feat: speed benchmark and network creation optimization #1320

Open

samcm wants to merge 15 commits into main from feat/faster

Conversation


@samcm samcm commented Feb 24, 2026

Summary

Iterative optimization of network creation speed, targeting sub-60s for a minimal single-participant network.

Optimizations implemented:

  • Skip keystore generation when all participants have validator_count=0 (avoids pulling protolambda/eth2-val-tools container)
  • Faster polling intervals - interval=0.5s on CL ready conditions and EL admin_nodeInfo wait (was using 1s default)
  • Skip enode extraction for single-participant networks (plan.wait on admin_nodeInfo is unnecessary when no other EL nodes need the bootnode)
  • Optimized genesis generation - extract genesis_validators_root and osaka_time within the genesis generator container itself
  • Speed benchmark CI job with TIMING markers throughout the critical path for phase-level profiling
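The keystore-skip condition from the first bullet can be sketched as a simple predicate. This is a hypothetical model, not the package's actual Starlark API; the participant shape and function name are assumptions.

```python
def should_skip_keystores(participants):
    # Keystores (and the protolambda/eth2-val-tools container pull) are
    # only needed when at least one participant actually runs validators.
    return all(p.get("validator_count", 0) == 0 for p in participants)
```

For example, `should_skip_keystores([{"validator_count": 0}])` returns `True`, while any participant with a nonzero count makes it `False`.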

Benchmark infrastructure:

  • speed_benchmark CI job measures wall-clock time against 60s target
  • set -euo pipefail ensures kurtosis failures are caught
  • TIMING markers at: keystore_generation, genesis_generation, el_launch, cl_launch, vc_launch, participant_network, run
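A minimal parser for those markers might look like the following. The exact log format (`TIMING: <phase> <seconds>s`) is an assumption about what the markers emit, not confirmed from the CI script.

```python
import re

# Assumed marker shape: "TIMING: <phase> <seconds>s", one per line.
TIMING_RE = re.compile(r"TIMING:\s*(\w+)\s+([\d.]+)s")

def parse_timings(log_text):
    # Collect phase -> seconds from TIMING markers in a CI log dump.
    return {m.group(1): float(m.group(2)) for m in TIMING_RE.finditer(log_text)}
```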

Timing markers from CI (3-participant minimal.yaml):

Phase                 Duration
keystore_generation   ~3.8s
genesis_generation    ~3.7s
el_launch             ~15.4s
cl_launch             ~41.9s
vc_launch             ~5.9s
total                 ~71s

CL launch is the dominant bottleneck. Single-participant benchmarks pending.
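As a quick sanity check on the table above, the per-phase figures sum to the reported total and show how much of it CL launch accounts for:

```python
# Values copied from the timing table above.
phases = {
    "keystore_generation": 3.8,
    "genesis_generation": 3.7,
    "el_launch": 15.4,
    "cl_launch": 41.9,
    "vc_launch": 5.9,
}
total = sum(phases.values())          # ~70.7s, matching the ~71s total
cl_share = phases["cl_launch"] / total  # ~59% of wall-clock time
```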

samcm added 15 commits February 24, 2026 13:33
Add a speed_benchmark job to per-PR CI that measures network creation
time against a 60s target. Includes TIMING: markers throughout the
critical path (keystore gen, genesis gen, EL/CL/VC launch) for
phase-level profiling.

- Set validator_count=0 and use_separate_vc=false in the speed test to
  skip the entire keystore pipeline and VC launch
- Add interval=0.5s to CL ready conditions (was using the 1s default)
- Add interval=0.5s to the EL admin_nodeInfo wait (was using the 1s default)
- Extract genesis_validators_root and osaka_time within the genesis
  generator container to avoid needing jq in the read step
- Add wait=None to the osaka_time read step for potential parallelization

Short-circuit the entire keystore pipeline (avoid pulling and starting
the protolambda/eth2-val-tools container) when all participants have
validator_count=0. Also fix lint formatting.

The Fulu fork (epoch 0 by default) requires supernodes/validators/peerdas,
which the minimal speed test doesn't configure. Set fulu_fork_epoch
to far-future to skip this validation.
- Skip plan.wait() on admin_nodeInfo when num_participants==1 (the enode
  is only needed as a bootnode for participants 1+)
- Fix speed.yaml: use FAR_FUTURE_EPOCH for fulu_fork_epoch, add a
  fixed genesis_time to skip the timestamp container
- Fix benchmark script: add set -euo pipefail to catch kurtosis
  failures through the tee pipe

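The single-participant skip above reduces to a one-line predicate. The helper name here is hypothetical, not the package's API:

```python
def needs_bootnode_wait(num_participants):
    # The boot node's enode only matters when there is a second EL node
    # to bootstrap, so a single-participant network can skip the
    # admin_nodeInfo wait entirely.
    return num_participants > 1
```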
A genesis_time in 2030 means CL nodes wait years for genesis and
never produce blocks. The dynamic computation adds ~2-5s but is
necessary for a functional network.

The speed test needs actual validators to be a meaningful benchmark.
Using defaults (128 validators, separate VC for lighthouse).

geth/reth/nethermind x lighthouse/teku/nimbus with a 120s target.

Skip sequential enode extraction (EL[1..8]) and ENR/identity
extraction (CL[1..8]) during the launch phase, since only the
boot node's enode/ENR is needed as a bootnode for subsequent nodes.

Collect the deferred enodes and identities after all VCs are
launched, when nodes are already warm and responding faster.
This moves ~16s of EL wait and ~25s of CL wait out of the
critical path, replacing it with ~10s of post-launch collection.

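The launch-then-collect split can be modeled as below. `launch` and `fetch_identity` are stand-ins for the real Starlark launch and extraction steps, so treat this as a sketch of the ordering, not the actual implementation:

```python
def launch_with_deferred_identities(nodes, launch, fetch_identity):
    # Launch the boot node (index 0) and extract its identity right away,
    # since later nodes need it as a bootnode. Defer identity extraction
    # for everyone else until after launch, when nodes are warm and
    # respond faster.
    boot_identity = None
    deferred = []
    for i, node in enumerate(nodes):
        launch(node, bootnode=boot_identity)
        if i == 0:
            boot_identity = fetch_identity(node)
        else:
            deferred.append(node)
    # Post-launch collection pass, off the critical path.
    return [boot_identity] + [fetch_identity(n) for n in deferred]
```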
- erigon and nimbus-eth1 use WS_RPC_PORT_ID instead of RPC_PORT_ID
- Only geth, erigon, dummy, ethrex extract the ENR from admin_nodeInfo
- Auto-format with kurtosis lint

Add an image warmup phase that uses plan.add_services to pull all
unique EL/CL/VC images in parallel before any launch phase begins.
Docker deduplicates concurrent pulls at the layer level, so this
single parallel pull warms the cache for all subsequent add_service
and add_services calls. The throwaway warmer services are stopped
immediately after the pulls complete.

…missing

Defer ready_conditions for CL[1..8] so add_services returns after Docker
pull+start without waiting for health checks. The health wait moves to
collect_identities(), which now uses plan.wait (retries) instead of
plan.request (one-shot). CL nodes boot in the background during VC launch.

Also switch speed benchmarks from --image-download always to missing so
Docker skips manifest re-verification for already-cached images.

With --image-download missing, kurtosis skips pulling cached images.
Pre-pulling all client + genesis images in parallel before the kurtosis
run means add_service/add_services calls find everything cached and
only need to create+start containers (no network I/O).
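The pre-pull step can be sketched as a parallel map over the deduplicated image list. `pull` is a hypothetical wrapper around `docker pull`; the point is that Docker deduplicates concurrent layer downloads, so one parallel pass warms the cache for the whole run:

```python
from concurrent.futures import ThreadPoolExecutor

def prepull_images(images, pull):
    # Deduplicate first, then pull every unique image in parallel so the
    # subsequent kurtosis run (with --image-download missing) finds
    # everything already cached.
    unique = sorted(set(images))
    if unique:
        with ThreadPoolExecutor(max_workers=len(unique)) as pool:
            list(pool.map(pull, unique))
    return unique
```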