[Backport 2.x] Limit RW separation to remote store enabled clusters and update recovery flow#16999
Merged
Conversation
…ery flow (opensearch-project#16760) * Update search only replica recovery flow This PR includes multiple changes to search replica recovery. 1. Change search only replica copies to recover as empty store instead of PEER. This will run a store recovery that syncs segments from remote store directly and eliminate any primary communication. 2. Remove search replicas from the in-sync allocation ID set and update routing table to exclude them from allAllocationIds. This ensures primaries aren't tracking or validating the routing table for any search replica's presence. 3. Change search replica validation to require remote store. There are versions of the above changes that are still possible with primary based node-node replication, but I don't think they are worth making at this time. Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * more coverage Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * add changelog entry Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * add assertions that Search Replicas are not in the in-sync id set nor the AllAllocationIds set in the routing table Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * update async task to only run if the FF is enabled and we are a remote store cluster. This check had previously only checked for segrep Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * clean up max shards logic Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * remove search replicas from check during renewPeerRecoveryRetentionLeases Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Revert "update async task to only run if the FF is enabled and we are a remote store cluster." reverting this, we already check for remote store earlier. This reverts commit 48ca1a3. Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Add more tests for failover case Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Update remotestore restore logic and add test ensuring we can restore only writers when red Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Fix Search replicas to honor node level recovery limits Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Fix translog UUID mismatch on existing store recovery. This commit adds PR feedback and recovery tests post node restart. Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Fix spotless Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Fix bug with remote restore and add more tests Signed-off-by: Marc Handalian <marc.handalian@gmail.com> --------- Signed-off-by: Marc Handalian <marc.handalian@gmail.com> (cherry picked from commit 8191de8)
Contributor
|
❕ Gradle check result for 3b04475: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 2.x #16999 +/- ##
============================================
+ Coverage 71.79% 71.92% +0.12%
- Complexity 65454 65599 +145
============================================
Files 5316 5318 +2
Lines 305529 305739 +210
Branches 44509 44580 +71
============================================
+ Hits 219359 219902 +543
+ Misses 67902 67504 -398
- Partials 18268 18333 +65 ☔ View full report in Codecov by Sentry. |
msfroh
approved these changes
Jan 10, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Manual backport of #16760 - Only conflict was in IndexShard.java because TranslogManager does not exist on 2.x.
Related Issues
Resolves #15952
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.