[4/6 Execution] Unwind Execution log is misleading at batch boundaries — looks like a 54-block mainnet reorg but isn't #20713

@yperbasis

Summary

When the Execution stage hits a batch boundary and the forkchoice pipeline re-enters the stages, Erigon emits a log line that reads exactly like a large chain reorg:

[4/6 Execution] Unwind Execution         from=24927474 to=24927420
[4/6 Execution] serial done              ... blk=24927421 blks=1 ...
head updated                             hash=0x094db7c4... number=24927421 ... age=10m51s

On Ethereum mainnet this shows up as a 54-block "unwind" roughly every 11 minutes, once per pipeline batch, which understandably worries operators. The size and the interval agree: 54 blocks at 12-second slots is about 10m48s, in line with the age=10m51s on the head updated line. It isn't a chain reorg at all: the pipeline is discarding the fork-validator's optimistic tip state so the next batch can re-run the stages from the last committed point.

Concrete proof this is not a reorg, from the same node:

11:33:18  serial done blk=24927421 blks=64 ...         (<-- batch N committed)
11:33:18  head updated hash=0x094db7c4 number=24927421 age=7s
11:33..11:43  many "head validated" for 24927422..24927474  (fork-validator handling tip)
11:44:01  Unwind Execution from=24927474 to=24927420
11:44:02  head updated hash=0x094db7c4 number=24927421 age=10m51s  <-- SAME hash as 11:33
11:46:01  [3/6 Senders] Started from=24927421 to=24927485  (<-- batch N+1)
11:46:06  head updated number=24927485 age=7s

The post-"unwind" head is the exact same block hash as the pre-"unwind" batch-commit — just older in wall-clock time. There is no competing fork, no canonical change at any block number; the pipeline is just resetting its stage progress back to the last batch-commit point so the next batch can reapply the 54 forward blocks (mostly by merging the fork-validator's cached state, not re-executing).
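The same-hash check above can be automated for anyone triaging their own logs. The following is a minimal sketch in Python, not part of Erigon: it assumes log lines shaped like the excerpt quoted in this issue (real Erigon output carries extra timestamp and level prefixes, which the regexes skip over), and the function name `classify_unwinds` is made up for illustration. It only reaches a verdict when the first head updated line after an unwind carries a hash for a block number it has already seen.

```python
import re

# Log excerpt copied from the timeline in this issue (simplified line shape).
LOG = """\
11:33:18  head updated hash=0x094db7c4 number=24927421 age=7s
11:44:01  Unwind Execution from=24927474 to=24927420
11:44:02  head updated hash=0x094db7c4 number=24927421 age=10m51s
11:46:06  head updated number=24927485 age=7s
"""

# The hash field is optional because some quoted lines omit it.
HEAD_RE = re.compile(
    r"head updated\s+(?:hash=(?P<hash>0x[0-9a-fA-F]+)\S*\s+)?number=(?P<number>\d+)"
)
UNWIND_RE = re.compile(r"Unwind Execution\s+from=(?P<src>\d+)\s+to=(?P<dst>\d+)")


def classify_unwinds(log_text):
    """Return one (from, to, verdict) tuple per 'Unwind Execution' line."""
    seen = {}       # block number -> hash from the latest "head updated" line
    pending = None  # unwind still waiting for the next "head updated" line
    results = []
    for line in log_text.splitlines():
        m = UNWIND_RE.search(line)
        if m:
            pending = (int(m["src"]), int(m["dst"]))
            continue
        m = HEAD_RE.search(line)
        if not m:
            continue
        number, block_hash = int(m["number"]), m["hash"]
        if pending is not None:
            previous = seen.get(number)
            if block_hash is None or previous is None:
                verdict = "unknown"          # not enough information in the log
            elif previous == block_hash:
                verdict = "batch reset"      # same hash at same number: no reorg
            else:
                verdict = "possible reorg"   # canonical hash changed at this number
            results.append((*pending, verdict))
            pending = None
        if block_hash is not None:
            seen[number] = block_hash
    return results


if __name__ == "__main__":
    for src, dst, verdict in classify_unwinds(LOG):
        print(f"unwind {src} -> {dst}: {verdict}")
```

A "possible reorg" verdict from this sketch is only a hint to look closer, since it compares truncated hashes taken from log text rather than querying the node.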

Ask

  1. Rename / reword the log line so it's clear this is a pipeline-batch reset, not a chain reorg. Something like:
    [4/6 Execution] Pipeline batch reset: stage progress X → Y (reapplying fork-validator state for Y+1..X on next batch)
    Alternatively, distinguish reorg-driven unwinds (where the forkchoice walk-back finds a genuinely different common ancestor) from batch-reset unwinds (where the canonical hash at the target number is the same before and after).

  2. Don't emit head updated ... age=10m51s for the batch-commit block after the reset — or at least log it with a different key phrase, since it makes it look like the chain head regressed by 10 minutes. A user watching the log can't tell from this line alone that their chain is still healthy.

  3. (Optional, lower priority) Consider whether the batch boundary is actually necessary here or whether the fork-validator's extending state can flow into stage progress without the synthetic unwind. There's a minor steady-state cost — ~2-3s flush/commit per batch and a tip-latency spike that causes the block immediately after the boundary to register with age=10-17s instead of the usual 1-3s. Functional, but not free. This one likely warrants its own discussion; the main ask is log clarity.
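To make ask 1 concrete, here is a rough sketch of the message selection, written in Python purely for illustration (Erigon itself is Go, and none of these names exist in its codebase). The `Unwind` type, its fields, and the wording of the reorg branch are all hypothetical; the only part taken from this issue is the proposed batch-reset wording and the rule that an unchanged canonical hash means no reorg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unwind:
    """Hypothetical description of one stage unwind."""
    stage: str          # e.g. "4/6 Execution"
    progress_from: int  # stage progress before the unwind (X)
    progress_to: int    # unwind target (Y)
    hash_before: str    # canonical hash at the post-unwind head, before the unwind
    hash_after: str     # canonical hash at the same number, after the unwind


def describe(u: Unwind) -> str:
    """Pick a log message that separates batch resets from real reorgs."""
    if u.hash_before == u.hash_after:
        # Canonical chain unchanged: only stage progress moved back.
        return (
            f"[{u.stage}] Pipeline batch reset: stage progress "
            f"{u.progress_from} → {u.progress_to} "
            f"(reapplying fork-validator state for "
            f"{u.progress_to + 1}..{u.progress_from} on next batch)"
        )
    # Canonical hash differs: a genuinely different branch was chosen.
    return (
        f"[{u.stage}] Unwind Execution (reorg) "
        f"from={u.progress_from} to={u.progress_to}"
    )


if __name__ == "__main__":
    reset = Unwind("4/6 Execution", 24927474, 24927420, "0x094db7c4", "0x094db7c4")
    print(describe(reset))
```

Whether the comparison is cheap to make at the point where the log line is emitted depends on what the unwind code has in hand, which this sketch does not try to answer.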

Versions

Observed on main (built against commit a95d2aa170, though the log strings have been there much longer; this isn't a regression). Ethereum mainnet, fresh MDBX sync on snapshots.

Context

Surfaced while debugging a separate commitment-domain unwind bug (#20710) — each of these "unwinds" hit the actual bug before it was fixed. With the fix in place they're benign, but the log has been a repeated source of confusion.
