Skip to content

Remote store restore optimization #16607

Open
prayascoriolis wants to merge 6 commits intoopensearch-project:mainfrom
prayascoriolis:remote-store-restore-optimization-
Open

Remote store restore optimization #16607
prayascoriolis wants to merge 6 commits intoopensearch-project:mainfrom
prayascoriolis:remote-store-restore-optimization-

Conversation

@prayascoriolis
Copy link
Copy Markdown

Description

Key changes made:

  • Added a check to detect if there is preexisting data in the remote store.
  • Modified the copySegmentFiles call to skip remote upload when data already exists by passing null instead of remoteDirectory.
  • Preserved all other functionality including local file copying and verification.

This solution:

  • Prevents unnecessary uploads when restoring from an interop enabled repository.
  • Maintains data consistency by still performing local copies.
  • Preserves existing recovery state tracking and error handling.
  • Optimizes performance by avoiding redundant network transfers.
  • The change is backward compatible and maintains the existing recovery mechanisms while eliminating the unnecessary upload step when data already exists in the remote store.

Related Issues

Resolves #11044

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for a55ff72: SUCCESS

Map<String, RemoteSegmentStoreDirectory.UploadedSegmentMetadata> uploadedSegments = sourceRemoteDirectory
.getSegmentsUploadedToRemoteStore();
// Checking for existing remote segments in remote
boolean hasPreexistingRemoteData = remoteDirectory != null && !remoteDirectory.getSegmentsUploadedToRemoteStore().isEmpty();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if the uploaded data represents latest state?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check latest state of uploaded data implemented

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is also possible that the remote directory only has a subset of segments present. Instead of 0 or 1, can we only upload segments that are missing?

…by: Prayas Kumar <prayas.kumar@coriolis.co.in>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 0bdc6af: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4a10493: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

This PR is stalled because it has been open for 30 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working good first issue Good for newcomers stalled Issues that have stalled Storage:Durability Issues and PRs related to the durability framework Storage:Remote

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

[Remote Store] Remove redundant uploads while restoring data from snapshot

2 participants