[Segment Replication] Refactor file cleanup logic and fix PIT/Scroll with remote store. #9111
Merged
mch2 merged 8 commits into opensearch-project:main on Aug 10, 2023
Contributor
The above 2 builds have the same test failure.
Member
Author
Thanks @tlfeng, will get this cleaned up.
This change removes divergent commit paths for segrep node-node and remote store. All replicas with segrep enabled will perform local commits and ignore any incoming segments_n file. This changes the recovery sync with remote store to also exclude the segments_n so that only the fetched infos bytes are committed before an engine is opened. This change also updates deletion logic with segment replication to automatically delete when a file is decref'd to 0. Signed-off-by: Marc Handalian <handalm@amazon.com>
…it a new segmentInfos. Signed-off-by: Marc Handalian <handalm@amazon.com>
dreamer-89
approved these changes
Aug 10, 2023
Member
Assertion trip here
Member
Author
Not able to repro this locally. @ankitkala @gbbafna, wondering if you have context here?
Contributor
The backport to 2.x failed. To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-9111-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c30154458a44e91a2f245b2357e69ecc839265a9
# Push it to GitHub
git push --set-upstream origin backport/backport-9111-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x
Then, create a pull request where the |
neetikasinghal
pushed a commit
to neetikasinghal/OpenSearch
that referenced
this pull request
Aug 10, 2023
…with remote store. (opensearch-project#9111)

* Remove divergent commit logic with segment replication. This change removes divergent commit paths for segrep node-node and remote store. All replicas with segrep enabled will perform local commits and ignore any incoming segments_n file. This changes the recovery sync with remote store to also exclude the segments_n so that only the fetched infos bytes are committed before an engine is opened. This change also updates deletion logic with segment replication to automatically delete when a file is decref'd to 0.
* Add more NRTReplicationEngineTests.
* Ensure old commit files are wiped on remote store sync before we commit a new segmentInfos.
* Add more shard level tests.
* Add test ensuring commits are cleaned up on replicas.
* Self review.
* Use refresh level sync before recovery
* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>
mch2
added a commit
to mch2/OpenSearch
that referenced
this pull request
Aug 11, 2023
…with remote store. (opensearch-project#9111) Signed-off-by: Marc Handalian <handalm@amazon.com> (cherry picked from commit c301544)
mch2
added a commit
that referenced
this pull request
Aug 11, 2023
…fix PIT/Scroll with remote store. (#9272)

* [Segment Replication] Refactor file cleanup logic and fix PIT/Scroll with remote store. (#9111)
  * Remove divergent commit logic with segment replication. This change removes divergent commit paths for segrep node-node and remote store. All replicas with segrep enabled will perform local commits and ignore any incoming segments_n file. This changes the recovery sync with remote store to also exclude the segments_n so that only the fetched infos bytes are committed before an engine is opened. This change also updates deletion logic with segment replication to automatically delete when a file is decref'd to 0.
  * Add more NRTReplicationEngineTests.
  * Ensure old commit files are wiped on remote store sync before we commit a new segmentInfos.
  * Add more shard level tests.
  * Add test ensuring commits are cleaned up on replicas.
  * Self review.
  * Use refresh level sync before recovery
  * PR feedback.
  (cherry picked from commit c301544)
* Fix test SegmentReplicationIndexShardTests.testPrimaryRestart. This test is specific to remote store and should not be run for node-node replication. (cherry picked from commit a33f67e)

Signed-off-by: Marc Handalian <handalm@amazon.com>
Contributor
@mch2: I saw this failure in the relocation tests as well. Will create an issue and take a look.
linuxpi
pushed a commit
to linuxpi/OpenSearch
that referenced
this pull request
Aug 14, 2023
…with remote store. (opensearch-project#9111) Signed-off-by: Marc Handalian <handalm@amazon.com>
linuxpi
pushed a commit
to linuxpi/OpenSearch
that referenced
this pull request
Aug 16, 2023
…with remote store. (opensearch-project#9111) Signed-off-by: Marc Handalian <handalm@amazon.com>
kaushalmahi12
pushed a commit
to kaushalmahi12/OpenSearch
that referenced
this pull request
Sep 12, 2023
…with remote store. (opensearch-project#9111) Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
brusic
pushed a commit
to brusic/OpenSearch
that referenced
this pull request
Sep 25, 2023
…with remote store. (opensearch-project#9111) Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…with remote store. (opensearch-project#9111) Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
This change fixes multiple issues around scroll/PIT tests with Segment Replication and remote store. These issues stem from differing logic in NRTReplicationEngine around how segments and commit points are preserved between refresh cycles. With node-to-node replication we only perform local commits and preserve the latest on-disk commit, but with remote store it was possible for a new incoming commit point to leave a still-required commit point eligible for deletion because it is no longer the "latest commit" on disk.
With this change, all replicas with segrep enabled perform local commits when necessary from the incoming SegmentInfos byte[] only and ignore any incoming segments_n from their replication source. This PR also changes the recovery sync with remote store to exclude the segments_n so that only the fetched infos bytes are committed before an engine is opened.
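As a rough illustration of "ignore any incoming segments_n", the sketch below filters commit point files out of a list of incoming file names before they are copied. It is not the code from this PR: the class name SegmentFileFilter and the method withoutCommitFiles are hypothetical, and the only assumption taken from Lucene is that commit point files are named with the "segments_" prefix.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of OpenSearch: drops commit point files from an incoming file list. */
public class SegmentFileFilter {
    // Lucene names commit point files "segments_N"; the prefix is spelled out here by hand.
    private static final String COMMIT_FILE_PREFIX = "segments_";

    /** Returns the incoming file names, in order, without any segments_N file. */
    public static List<String> withoutCommitFiles(Collection<String> incomingFiles) {
        return incomingFiles.stream()
            .filter(name -> !name.startsWith(COMMIT_FILE_PREFIX))
            .collect(Collectors.toList());
    }
}
```

With a filter like this, the replica would only ever see data files from its source and would write its own commit point from the SegmentInfos bytes it received.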
This change also simplifies deletion logic with segment replication: a file is deleted automatically when it is decref'd to 0, which makes it easier to reason about when file cleanup is performed.
Files are incref'd when they are loaded onto the reader, when they are committed, or when a segmentInfosSnapshot is acquired.
Files are decref'd after a new commit is made, when a reader is closed, or when a segmentInfosSnapshot is closed.
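The incref/decref rules above can be sketched as a small reference-counting tracker. This is a minimal sketch under assumptions, not the implementation in this PR: the class RefCountedFileTracker and its methods are hypothetical names, and the deleter callback stands in for whatever actually removes a file from the shard's directory.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: tracks per-file reference counts and deletes a file once its count reaches 0. */
public class RefCountedFileTracker {
    private final Map<String, Integer> refCounts = new HashMap<>();
    // Assumed callback that performs the actual delete, e.g. against the store directory.
    private final Consumer<String> deleter;

    public RefCountedFileTracker(Consumer<String> deleter) {
        this.deleter = deleter;
    }

    /** Call when files are loaded onto a reader, committed, or captured by a snapshot. */
    public synchronized void incRef(Collection<String> files) {
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    /** Call after a new commit, or when a reader or snapshot is closed. */
    public synchronized void decRef(Collection<String> files) {
        for (String file : files) {
            Integer count = refCounts.get(file);
            if (count == null) {
                throw new IllegalStateException("decRef on untracked file: " + file);
            }
            if (count == 1) {
                // Last reference released: stop tracking and delete immediately.
                refCounts.remove(file);
                deleter.accept(file);
            } else {
                refCounts.put(file, count - 1);
            }
        }
    }

    /** Current reference count, or 0 if the file is not tracked. */
    public synchronized int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }
}
```

In this sketch a file held by both a reader and a commit survives the first decRef and is deleted only on the second, so cleanup needs no separate sweep to decide which files are still required.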
Related Issues
Resolves #8850
Resolves #7556
Resolves #8777
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.