Add support for incremental download of translog files#16204
Add support for incremental download of translog files#16204rawahars wants to merge 2 commits intoopensearch-project:mainfrom
Conversation
|
❌ Gradle check result for cd08d89: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
cd08d89 to
f17a378
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #16204 +/- ##
============================================
- Coverage 72.10% 72.08% -0.03%
+ Complexity 64862 64858 -4
============================================
Files 5307 5308 +1
Lines 302606 302720 +114
Branches 43717 43734 +17
============================================
+ Hits 218208 218228 +20
- Misses 66541 66542 +1
- Partials 17857 17950 +93 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f17a378 to
b8f56ea
Compare
|
❌ Gradle check result for b8f56ea: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
833b508 to
535bc28
Compare
|
❌ Gradle check result for 535bc28: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
535bc28 to
ce18c9b
Compare
|
❌ Gradle check result for ce18c9b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/index/translog/transfer/TranslogTransferManager.java
Show resolved
Hide resolved
...er/src/main/java/org/opensearch/index/translog/transfer/TranslogTransferMetadataHandler.java
Outdated
Show resolved
Hide resolved
| * | ||
| * - Magic number (Uint32): A constant value that identifies the start of the footer. | ||
| * - Algorithm ID (Uint32): The identifier of the checksum algorithm used. Currently, this is always 0. | ||
| * - Checksum (Uint64): The checksum of the entire translog data, calculated using the specified algorithm. | ||
| */ | ||
| public class TranslogFooter { |
There was a problem hiding this comment.
Curious will Translog Footer only apply to remote translog?
There was a problem hiding this comment.
No, the Translog footer will be applied to both Remote and Local translog. The footer will be applied at the time of generation rollover, when the translog file becomes immutable. By instantiating TranslogCheckedContainer for both Local and Remote translog, we will now have the checksum pre-calculated for both, which will be written into footer.
For remote translog, upon each sync, the files are transferred to remote and the generation is rolled over.
For local translog, the sync can happen independently of generation rollover. So, we will keep writing to file until the rollover happens.
ce18c9b to
6ea2adb
Compare
|
❌ Gradle check result for 6ea2adb: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| translogCheckedContainer = new TranslogCheckedContainer(byteArrayOutputStream.toByteArray()); | ||
| } | ||
|
|
||
| // Enable translogCheckedContainer for remote as well as local translog. |
There was a problem hiding this comment.
Why are we enabling translogCheckedContainer for local translogs?
There was a problem hiding this comment.
As per discussion with @Bukhtawar and @sachinpkale, for ensuring consistent view of the world, we are enabling footer for both local and remote translogs.
There was a problem hiding this comment.
I am assuming there would be low/minimal overhead of checksum for the local translog? Can we run a perf benchmark?
There was a problem hiding this comment.
Yes, I am running the OSB (opensearch benchmarks) for the same to understand the impact. But from code workflow POV, it seems to be a low overhead operation.
| synchronized (syncLock) { | ||
| try (ReleasableLock toClose = writeLock.acquire()) { | ||
| synchronized (this) { | ||
| Long translogWithoutFooterChecksum = null; |
There was a problem hiding this comment.
nit: we don't need to explicitly initialise Long with null value
There was a problem hiding this comment.
I am more comfortable with overt initialisation of the variable prior and post the workflow. It find that it helps ensure that code is cleaner.
If you feel strongly about it or implicit initialisation leads to some performance benefits, we can change the same.
| (translogCheckedContainer != null) ? translogCheckedContainer.getChecksum() : null | ||
| translogWithoutFooterChecksum, | ||
| translogWithFooterChecksum, | ||
| true |
There was a problem hiding this comment.
Why is it, that we are always passing hasFooter to true here? There can be cases when using older header and therefore, no footer right?
There was a problem hiding this comment.
This particular instance of TranslogReader creation is from TranslogWriter.closeIntoReader. This method is called only by current writers when they are rolling generation and creating readers out of them. Now, once this PR is merged, all the writers will have headers and therefore, whenever they try to close into reader, they will indeed have header.
Note that older header will never go through this workflow.
There was a problem hiding this comment.
ok. From what i understand by your comment. This below particular else code block looks redundant?
else {
// If we reach here then it means we are using older header and therefore, no footer.
// So, both translogWithoutFooterChecksum and translogWithFooterChecksum are same.
translogWithoutFooterChecksum = (translogCheckedContainer != null)
? translogCheckedContainer.getChecksum()
: null;
translogWithFooterChecksum = translogWithoutFooterChecksum;
}
In that case can we think of removing it?
There was a problem hiding this comment.
This is there for safety purpose. Imagine a scenario wherein we have an open translog partially written when the OpenSearch process is upgraded, especially since it is under customer control. Now, when we will finally close this writer, then we will omit the footer in this case.
Now, for remote translog, we do not have a problem as the expectation is that any data which was in memory and not synced with remote will be lost. So the new footer will start anew.
But for local translog, the older partial translog will be present on disk which is of older version. So we still need to close it but not write the footer.
| // When we open a reader to Translog from a path, we want to fetch the checksum | ||
| // as that would be needed later on while creating the metadata map for | ||
| // generation to checksum. | ||
| Long translogChecksum = null; |
There was a problem hiding this comment.
| Long translogChecksum = null; | |
| Long translogChecksum; |
There was a problem hiding this comment.
I would prefer explicit initialisation of the variable especially since we are initialising hasFooter with false. It ensures that code is consistent and overt. If you feel strongly about it, we can make the change.
| Long translogChecksum = null; | ||
| boolean hasFooter = false; | ||
| try { | ||
| translogChecksum = TranslogFooter.readChecksum(path); |
There was a problem hiding this comment.
This is a bit confusing. As readChecksum returns null value of checksum for an older version file that doesn't have footer. Here we are storing this translogChecksum as null and setting hasFooter to true even though the file doesn't have footer.
There was a problem hiding this comment.
Thanks for the suggestion. I have refactored this block to make it better streamlined. Now we will directly check-
boolean hasFooter = translogChecksum != null;
| public Long getTranslogWithFooterChecksum() { | ||
| return translogWithFooterChecksum; | ||
| } | ||
|
|
There was a problem hiding this comment.
Can we rename it better, like fullChecksum and the other contentChecksum
| * @return the checksum value from the translog footer, or `null` if the footer is not present | ||
| * @throws IOException if an I/O error occurs while reading the footer | ||
| */ | ||
| static Long readChecksum(Path path) throws IOException { |
There was a problem hiding this comment.
Would prefer non-static functions? Are they getting invoked from a static context?
There was a problem hiding this comment.
Yes, this method is invoked from static contexts in all instances.
downloadOnce in RemoteFSTranslog is one instance and open in TranslogReader is another one.
| byte[] footer = TranslogFooter.write( | ||
| channel, | ||
| translogCheckedContainer.getChecksum(), | ||
| !Boolean.TRUE.equals(remoteTranslogEnabled) |
There was a problem hiding this comment.
Doubt: as we have already written the translog to disk, after that we are writing footer to file. Why is it that we are forcing a sync again only in case of when remoteTranslogEnabled is false.?
There was a problem hiding this comment.
So presently Remote translog does not flush the changes to disk and only keeps it in memory. During sync workflow, it would write the data to file channel in memory and then during rollover, we will upload it directly to remote store.
On the other hand, for local translog, we are flushing the changes to disk on each sync.
Now, in TranslogFooter.write method, we are doing the same operation- For local translog, we flush to disk and for remote translog, we write the data to file channel without flushing to disk.
Presently, the download workflow for remote backed storage works in a manner which causes the download for same translog files multiple times, each time deleting all the older files before downloading them again. This causes significant wasted network bandwidth, along with the time taken for the shard to become active. This change adds support for downloading the translog files incrementally and omitting the same if they are present locally. Signed-off-by: Harsh Rawat <rawahars@amazon.com>
This commit adds the unit tests applicable for the changes made to support incremental download of translog files. Primarily, the unit tests cover the changes in- - TranslogFooter to write and read the footer of Translog - RemoteFSTranslog to skip the download of locally present translog - TranslogWriter to create reader with checksum upon close - TranslogReader closeIntoReader functionality Signed-off-by: Harsh Rawat <rawahars@amazon.com>
6ea2adb to
1211703
Compare
|
Rebased over mainline and addressed the comments which needed any refactoring. |
|
❕ Gradle check result for 1211703: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
|
This PR is stalled because it has been open for 30 days with no activity. |
|
This PR is stalled because it has been open for 30 days with no activity. |
|
This PR is stalled because it has been open for 30 days with no activity. |
Description
Presently, the download workflow for remote backed storage works in a manner that causes the download of the same translog files multiple times, each time deleting all the older files before downloading them again. This causes significant wasted network bandwidth, along with the time taken for the shard to become active.
This change adds support for downloading the translog files incrementally and omitting the same if they are present locally.
Implementation
The key changes include-
Incremental Download Logic in RemoteFsTranslog:
Translog Footer Handling:
Generation-to-Checksum Mapping in TranslogTransferMetadata:
Testing
Manual Testing
Manually tested failover and restore workflow on a 3-node EC2 cluster.
Also, tested backward compatibility wherein initial cluster was created and indexed with older OpenSearch process. It was then switched to incremental download.
Newly added tests
RemoteFsTranslogTestsclass has been updated to include new test cases for the incremental download logic:testIncrementalDownloadWithMatchingChecksumtests the scenario where the local and remote translog files have the same checksum, and the download is skipped.testIncrementalDownloadWithDifferentChecksumtests the scenario where the local and remote translog files have different checksums, and only the missing generation is downloaded.TranslogFooterTestsclass has been added to verify the functionality of theTranslogFooterclass, including writing and reading the footer.TranslogTransferMetadataHandlerTestsclass has been updated to include test cases for handling the generation-to-checksum map in theTranslogTransferMetadata.Related Issues
One of the optimisations for #15277
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.