
Fix segment replication checkpoint not published after engine reset#20749

Closed
andrross wants to merge 1 commit into opensearch-project:main from andrross:fix-checkpoint-not-published

Conversation

@andrross
Member

When a primary shard's engine is created or reset, the InternalEngine constructor performs an internal-only refresh during restoreVersionMapAndCheckpointTracker. This updates the local replication checkpoint via ReplicationCheckpointUpdater (an internal refresh listener), but does not trigger CheckpointRefreshListener (an external refresh listener), which is responsible for publishing the checkpoint to replicas.

If no new writes arrive after the engine starts, the external reader manager never observes a reader change, didRefresh remains false, and the checkpoint is never published. Replicas believe they are current and never request the missing segments, leaving them permanently stale.

The promotion path in updateShardState already handled this correctly with an explicit updateReplicationCheckpoint() and checkpointPublisher.publish() call after engine reset. Two other paths were missing the same fix:

  1. Primary recovery after node restart (initializing -> active with same primary term): activatePrimaryMode() was called without publishing the checkpoint. This is the path hit during rolling upgrades.

  2. resetToWriteableEngine() (relocation handoff): called resetEngineToGlobalCheckpoint() without updating or publishing the checkpoint afterward.
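The failure mode described above can be sketched with a minimal, self-contained simulation. All names and types here are hypothetical stand-ins for the real ReplicationCheckpointUpdater / CheckpointRefreshListener machinery, not OpenSearch code; the point is only to show why `didRefresh == false` starves the publish path.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointPublishSim {
    static long localCheckpoint = 0;   // advanced by the internal refresh listener
    static long externalReaderGen = 0; // segment generation the external reader has seen
    static long engineGen = 0;         // latest segment generation in the engine
    static List<Long> published = new ArrayList<>(); // checkpoints replicas received

    // Engine creation/reset: internal-only refresh (restoreVersionMapAndCheckpointTracker).
    // The internal listener updates the local checkpoint, and the reader opens the
    // latest segments, but the external CheckpointRefreshListener never runs.
    static void internalRefresh() {
        engineGen++;
        localCheckpoint = engineGen;   // ReplicationCheckpointUpdater analogue
        externalReaderGen = engineGen; // reader is already at the latest generation
    }

    // A later external refresh only publishes if the reader actually changed.
    static void externalRefresh() {
        boolean didRefresh = externalReaderGen < engineGen;
        if (didRefresh) {
            externalReaderGen = engineGen;
            published.add(localCheckpoint); // CheckpointRefreshListener analogue
        }
    }

    public static void main(String[] args) {
        internalRefresh(); // engine reset or creation
        externalRefresh(); // no new writes arrived: didRefresh == false
        System.out.println("published=" + published.size()); // prints published=0
        // Hence the fix: explicitly update and publish the checkpoint after the
        // engine reset, as the updateShardState() promotion path already does.
    }
}
```

With no writes between the two refreshes, nothing is ever published and replicas stay stale, which is exactly the symptom the PR addresses.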

Related Issues

Resolves #14302

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run Indexing Indexing, Bulk Indexing and anything related to indexing labels Feb 27, 2026
@github-actions
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Missing Publish

In resetToWriteableEngine(), updateReplicationCheckpoint() is called but checkpointPublisher.publish() is never called afterward. The PR description explicitly states the fix requires both updating and publishing the checkpoint (as done in the updateShardState promotion path), but the publish step is missing here. This means replicas may still not receive the updated checkpoint after a relocation handoff engine reset.

updateReplicationCheckpoint();
Missing Guard

In resetToWriteableEngine(), the call to updateReplicationCheckpoint() is not guarded by a check for indexSettings.isSegRepLocalEnabled() and checkpointPublisher != null, unlike the analogous fix in updateShardState(). This could cause a NullPointerException or unintended behavior when segment replication is not enabled.

updateReplicationCheckpoint();
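A minimal sketch of the guarded update-and-publish pattern this observation refers to, using stub types rather than the real IndexShard, IndexSettings, and publisher classes (the method names mirror the PR; everything else is a placeholder):

```java
public class GuardedPublishSketch {
    static class Settings {
        boolean segRepLocal = true;
        boolean isSegRepLocalEnabled() { return segRepLocal; }
    }
    static class Publisher {
        int publishes = 0;
        void publish(Object shard, Object checkpoint) { publishes++; }
    }

    Settings indexSettings = new Settings();
    Publisher checkpointPublisher; // may be null when segment replication is disabled
    Object latestCheckpoint = new Object();

    void updateReplicationCheckpoint() { /* recompute checkpoint from the engine */ }
    Object getLatestReplicationCheckpoint() { return latestCheckpoint; }

    void afterEngineReset() {
        // Guard mirrors the updateShardState() promotion path: skip entirely when
        // segrep is off or no publisher is wired, avoiding an NPE.
        if (indexSettings.isSegRepLocalEnabled() && checkpointPublisher != null) {
            updateReplicationCheckpoint();
            checkpointPublisher.publish(this, getLatestReplicationCheckpoint());
        }
    }

    public static void main(String[] args) {
        GuardedPublishSketch shard = new GuardedPublishSketch();
        shard.afterEngineReset(); // publisher is null: no NPE, nothing published
        shard.checkpointPublisher = new Publisher();
        shard.afterEngineReset(); // publisher wired: checkpoint published once
        System.out.println("publishes=" + shard.checkpointPublisher.publishes);
    }
}
```

The guard makes the reset path a no-op for document-replication indices, matching the behavior of the promotion path the PR description cites as the model.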

@andrross
Member Author

Full disclosure, kiro-cli (aka claude opus) and I debugged this together so I would appreciate some extra eyes from the segment replication experts: @mch2 @kkewwei @guojialiang92 @gbbafna

@github-actions
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Publish checkpoint inside the blocked operations scope

The updateReplicationCheckpoint() call in resetToWriteableEngine() is placed outside
the blockOperations lambda, meaning it runs after the block is released. This could
lead to a race condition where operations resume before the checkpoint is published.
Additionally, unlike the updateShardState path, there is no guard checking
indexSettings.isSegRepLocalEnabled() and checkpointPublisher != null, nor is
checkpointPublisher.publish() explicitly called, which may result in the checkpoint
not being propagated to replicas.

server/src/main/java/org/opensearch/index/shard/IndexShard.java [2300-2306]

 public void resetToWriteableEngine() throws IOException, InterruptedException, TimeoutException {
-    indexShardOperationPermits.blockOperations(30, TimeUnit.MINUTES, () -> { resetEngineToGlobalCheckpoint(); });
-    // Force update and publish checkpoint so replicas learn about segments that existed
-    // before the engine reset. See resetEngineToGlobalCheckpoint() in updateShardState()
-    // promotion path for the same pattern.
-    updateReplicationCheckpoint();
+    indexShardOperationPermits.blockOperations(30, TimeUnit.MINUTES, () -> {
+        resetEngineToGlobalCheckpoint();
+        // Force update and publish checkpoint so replicas learn about segments that existed
+        // before the engine reset. See updateShardState() promotion path for the same pattern.
+        if (indexSettings.isSegRepLocalEnabled() && checkpointPublisher != null) {
+            updateReplicationCheckpoint();
+            checkpointPublisher.publish(this, getLatestReplicationCheckpoint());
+        }
+    });
 }
Suggestion importance[1-10]: 6


Why: The suggestion raises a valid concern about the updateReplicationCheckpoint() call being outside the blockOperations lambda, which could allow operations to resume before the checkpoint is updated. However, the updateShardState path also calls these methods outside any lock, suggesting this may be intentional design. The missing guard for isSegRepLocalEnabled() and checkpointPublisher != null is a more concrete concern.

Impact: Low

Category: General
Guard against null checkpoint before publishing

The updateReplicationCheckpoint() call updates the checkpoint, but then
checkpointPublisher.publish() is called with getLatestReplicationCheckpoint(). If
updateReplicationCheckpoint() is asynchronous or the checkpoint is not immediately
reflected, there could be a mismatch. It would be safer to capture the result of
updateReplicationCheckpoint() (if it returns the checkpoint) or ensure the publish
uses the same checkpoint that was just set.

server/src/main/java/org/opensearch/index/shard/IndexShard.java [798-805]

 if (indexSettings.isSegRepLocalEnabled() && this.checkpointPublisher != null) {
     // Force publish checkpoint so replicas catch up to segments that existed before
     // the node restart. Without this, the internal refresh during engine construction
     // may have already opened the latest reader, causing subsequent external refreshes
     // to see no change and never trigger CheckpointRefreshListener.
     updateReplicationCheckpoint();
-    checkpointPublisher.publish(this, getLatestReplicationCheckpoint());
+    final ReplicationCheckpoint checkpoint = getLatestReplicationCheckpoint();
+    if (checkpoint != null) {
+        checkpointPublisher.publish(this, checkpoint);
+    }
 }
Suggestion importance[1-10]: 3


Why: The suggestion to capture the checkpoint and guard against null is a minor defensive improvement, but getLatestReplicationCheckpoint() is unlikely to return null in this context since the engine was just activated. The improved_code also changes the logic slightly without strong justification.

Impact: Low

@github-actions
Contributor

❌ Gradle check result for bd62e7d: FAILURE

Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross
Member Author

andrross commented Mar 3, 2026

I'm not confident this is the right way to fix the issues we're seeing. Closing for now.

@andrross andrross closed this Mar 3, 2026


Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for IndexingIT
