When a primary shard's engine is created or reset, the InternalEngine
constructor performs an internal-only refresh during
restoreVersionMapAndCheckpointTracker. This updates the local
replication checkpoint via ReplicationCheckpointUpdater (an internal
refresh listener), but does not trigger CheckpointRefreshListener (an
external refresh listener), which is responsible for publishing the
checkpoint to replicas.
If no new writes arrive after the engine starts, the external reader
manager never observes a reader change, didRefresh remains false, and
the checkpoint is never published. Replicas believe they are current
and never request the missing segments, leaving them permanently stale.
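The failure mode above can be modeled with a small self-contained sketch. All of the names here (Shard, RefreshListener, the two listener lists) are illustrative stand-ins for the real OpenSearch classes, not the actual API: the internal listener tier (ReplicationCheckpointUpdater) fires during engine construction and advances the local checkpoint, while the external tier (CheckpointRefreshListener) only publishes when didRefresh is true, which never happens if the external reader sees no change.

```java
// Toy model of the internal vs. external refresh-listener tiers.
// Names are hypothetical stand-ins, not real OpenSearch classes.
import java.util.ArrayList;
import java.util.List;

public class RefreshListenerSketch {

    interface RefreshListener {
        void afterRefresh(boolean didRefresh);
    }

    static class Shard {
        long localCheckpoint = 0;       // advanced by the internal tier
        long publishedCheckpoint = -1;  // advanced only by the external tier
        boolean readerChangedSinceLastExternalRefresh = false;
        final List<RefreshListener> internalListeners = new ArrayList<>();
        final List<RefreshListener> externalListeners = new ArrayList<>();

        Shard() {
            // Stand-in for ReplicationCheckpointUpdater (internal listener).
            internalListeners.add(didRefresh -> {
                if (didRefresh) localCheckpoint++;
            });
            // Stand-in for CheckpointRefreshListener (external listener).
            externalListeners.add(didRefresh -> {
                if (didRefresh) publishedCheckpoint = localCheckpoint;
            });
        }

        // Engine construction: an internal-only refresh runs, so only the
        // internal tier is notified; the external reader never changes.
        void startEngine() {
            for (RefreshListener l : internalListeners) l.afterRefresh(true);
            readerChangedSinceLastExternalRefresh = false;
        }

        // A later external refresh with no new writes reports
        // didRefresh == false, so the publisher never fires.
        void externalRefresh() {
            boolean didRefresh = readerChangedSinceLastExternalRefresh;
            for (RefreshListener l : externalListeners) l.afterRefresh(didRefresh);
            readerChangedSinceLastExternalRefresh = false;
        }
    }

    public static void main(String[] args) {
        Shard shard = new Shard();
        shard.startEngine();
        shard.externalRefresh();
        // Local checkpoint moved, published checkpoint did not:
        System.out.println("local=" + shard.localCheckpoint
                + " published=" + shard.publishedCheckpoint);
    }
}
```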
The promotion path in updateShardState already handled this correctly
with an explicit updateReplicationCheckpoint() and
checkpointPublisher.publish() call after engine reset. Two other paths
were missing the same fix:
1. Primary recovery after node restart (initializing -> active with the
same primary term): activatePrimaryMode() was called without
publishing the checkpoint. This is the path hit during rolling
upgrades.
2. resetToWriteableEngine() (relocation handoff): called
resetEngineToGlobalCheckpoint() without updating or publishing
the checkpoint afterward.
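The fix pattern for both paths can be sketched as follows. The class and field names are hypothetical stand-ins (only resetEngineToGlobalCheckpoint(), updateReplicationCheckpoint(), and the publisher's publish() call are named in the description above): after any path that creates or resets the engine, the checkpoint is explicitly recomputed and published rather than relying on a refresh listener that may never fire.

```java
// Illustrative sketch of the fix: pair every engine reset with an explicit
// update + publish. Class structure is a hypothetical stand-in, not the
// real OpenSearch implementation.
public class PublishAfterResetSketch {

    interface CheckpointPublisher {
        void publish(long checkpoint);
    }

    static class Shard {
        long localCheckpoint = 0;
        final CheckpointPublisher publisher;

        Shard(CheckpointPublisher publisher) {
            this.publisher = publisher;
        }

        // Stand-in for the engine reset performed during relocation handoff.
        void resetEngineToGlobalCheckpoint() {
            localCheckpoint++;
        }

        // Stand-in: recompute the replication checkpoint from the latest reader.
        void updateReplicationCheckpoint() {
            // no-op in this toy model
        }

        // The fix: the reset path now updates and publishes explicitly,
        // instead of waiting for a refresh that reports didRefresh == true.
        void resetToWriteableEngine() {
            resetEngineToGlobalCheckpoint();
            updateReplicationCheckpoint();
            publisher.publish(localCheckpoint);
        }
    }

    public static void main(String[] args) {
        Shard shard = new Shard(cp -> System.out.println("published " + cp));
        shard.resetToWriteableEngine();
    }
}
```

The same pairing applies to the recovery path: activatePrimaryMode() is followed by an explicit checkpoint update and publish, mirroring what the promotion path in updateShardState already did.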
Signed-off-by: Andrew Ross <andrross@amazon.com>
Related Issues
Resolves #14302
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.