Description
Recently, several PRs have addressed the segment replication "Rejecting stale metadata checkpoint" failure, such as #20422 and #20551, and all of them work by skipping verification. After thinking it through, I believe there is a more elegant way to solve this problem.
In #18944, checkpoint (ckp) verification was added to fix a flaky test. For ease of reading, I've pasted the key analysis here.
On deeper analysis, I found that this happens due to a race condition in primary shard relocation. On primary shard relocation, the new primary has a bumped-up segment infos generation and version, which is broadcast to all of its replicas via the checkpoint publisher. This happens around the same time that the shard_started primary action is sent to the active cluster manager to report that the primary handover completed successfully. In certain conditions, it was seen that the replica received the latest checkpoint from the new primary before the cluster applier service had applied the updated cluster state. This led to the replica reaching out to the old primary to get the segment infos. This issue has a slight probability of happening for indexes that receive no ingestion during relocation after the permits have been acquired on the older primary.
Adding ckp verification does cover the primary shard relocation scenario above, but it also interferes with some normal processing, such as the situations described in #20422 and #20551. There may be other affected situations that have not been discovered yet.
Solution
If we can specifically identify the scenario of primary shard relocation, we can avoid continuously patching the ckp verification logic.
Fortunately, we set the handoffInProgress state during the hand-off phase of peer recovery. We can use this state to require that a primary shard serving a GET_CHECKPOINT_INFO request be a started primary that is not in the middle of a hand-off.
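As a rough illustration of the proposed check, here is a minimal, self-contained sketch. It is not the code from the PR: `ShardView` and `CheckpointInfoGuard` are hypothetical stand-ins invented for this example, and only the `handoffInProgress` flag and the GET_CHECKPOINT_INFO request name come from the description above. The assumption is that the source-side handler can see whether the shard is a primary, whether it is started, and whether a hand-off is in progress.

```java
// Hypothetical snapshot of the shard state the handler would consult.
// This is NOT an OpenSearch class; the field names are illustrative.
final class ShardView {
    final boolean primary;           // routing entry says this copy is the primary
    final boolean started;           // shard has reached the started state
    final boolean handoffInProgress; // set during the hand-off phase of peer recovery

    ShardView(boolean primary, boolean started, boolean handoffInProgress) {
        this.primary = primary;
        this.started = started;
        this.handoffInProgress = handoffInProgress;
    }
}

// Hypothetical guard evaluated before answering a GET_CHECKPOINT_INFO request.
final class CheckpointInfoGuard {
    private CheckpointInfoGuard() {}

    // True only for a started primary that is not handing off.
    static boolean canServe(ShardView shard) {
        return shard.primary && shard.started && !shard.handoffInProgress;
    }

    // Rejects the request instead of serving segment infos from a shard
    // that is relocating away or is not (yet) the active primary.
    static void ensureCanServe(ShardView shard) {
        if (!canServe(shard)) {
            throw new IllegalStateException(
                "GET_CHECKPOINT_INFO rejected: shard is not a started primary or a hand-off is in progress");
        }
    }
}
```

With a guard of this shape, a replica that contacts the old primary during relocation would get a rejection and could retry once the new cluster state is applied, rather than depending on the checkpoint comparison to detect the situation.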
I submitted a PR and ran tests for verification. I hope @ashking94 @mch2 @andrross @cuonghm2809 @atris can take a look and provide feedback.
Related component
Indexing:Replication
Describe alternatives you've considered
No response
Additional context
No response