Batch deletes in remote segment store#20146
Conversation
WalkthroughAdds a batch deletion API Changes
Sequence Diagram(s)sequenceDiagram
participant SegmentDir as RemoteSegmentStoreDirectory
participant RemoteDir as RemoteDirectory
participant Blob as BlobContainer
Note over SegmentDir,RemoteDir: Identify stale remote files -> filesToDelete
SegmentDir->>RemoteDir: deleteFiles(filesToDelete)
RemoteDir->>Blob: deleteBlobsIgnoringIfNotExists(filesToDelete)
alt success
Blob-->>RemoteDir: success
RemoteDir-->>SegmentDir: return success
Note right of SegmentDir: update deletedSegmentFiles\nand segmentsUploadedToRemoteStore\nthen delete remote metadata file
else IOException / failure
Blob-->>RemoteDir: throws IOException
RemoteDir--x SegmentDir: propagate IOException
Note right of SegmentDir: leave caches & metadata unchanged\nlog failure referencing metadata file
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
|
❌ Gradle check result for a71020a: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)
1023-1034: Consider using a plain boolean instead of AtomicBoolean.The
deletionSuccessfulvariable is only accessed within a single thread (no concurrent access), soAtomicBooleanis unnecessary overhead. A simplebooleanvariable would suffice.- AtomicBoolean deletionSuccessful = new AtomicBoolean(true); + boolean deletionSuccessful = true; try { // Batch delete all stale segment files remoteDataDirectory.deleteFiles(filesToDelete); deletedSegmentFiles.addAll(filesToDelete); // Update cache after successful batch deletion for (String file : filesToDelete) { if (!activeSegmentFilesMetadataMap.containsKey(getLocalSegmentFilename(file))) { segmentsUploadedToRemoteStore.remove(getLocalSegmentFilename(file)); } } } catch (IOException e) { - deletionSuccessful.set(false); + deletionSuccessful = false; logger.warn( ... ); } - if (deletionSuccessful.get()) { + if (deletionSuccessful) { logger.debug("Deleting stale metadata file {} from remote segment store", metadataFile);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java(1 hunks)server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java(1 hunks)server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java(1 hunks)server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java(5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Gradle Precommit
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java
[error] 1037-1037: Bad usage of logger: Wrong usage of org.apache.logging.log4j.Logger#warn detected by OpenSearchLoggerUsageChecker (likely incorrect arguments).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (10)
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java (1)
187-201: LGTM! Clean batch delete implementation.The new
deleteFilesmethod correctly handles edge cases (null/empty) and delegates to the underlying blob container's batch delete API, maintaining consistency with the existingdeleteFilemethod behavior.server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)
1016-1021: LGTM! Batch file collection logic is correct.The filtering logic correctly identifies files to delete by excluding:
- Files present in the active segment set
- Files already deleted in previous iterations
This ensures safe batch deletion without affecting active segments.
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (7)
68-68: LGTM!Import for
argThatmatcher added to support order-independent verification of batch delete calls.
960-982: LGTM! Test correctly validates batch deletion semantics.The test properly uses
Setcomparison withargThatfor order-independent verification of the batch delete call, which is appropriate since the order of files in the delete operation doesn't affect correctness.
991-1011: LGTM!Test correctly validates that batch deletion respects metadata file locks.
1065-1092: LGTM! Deduplication test correctly validates batch behavior.The test verifies that:
- First batch deletes unique stale files from
metadataFilename3- Second batch (for
metadataFilename4with duplicate content) correctly passes an empty list since files were already deleted
1094-1124: LGTM!Test correctly validates that when batch deletion fails with an IOException, the metadata file is preserved (not deleted), maintaining data consistency.
1126-1152: LGTM!Test correctly validates that the batch delete operation (using
deleteBlobsIgnoringIfNotExists) handles non-existent files gracefully, allowing the metadata file deletion to proceed.
1190-1231: LGTM! Error handling test is thorough.The test correctly validates:
- Batch delete is attempted with correct files
- Metadata file is preserved when deletion fails
- Cache entries remain intact after failed deletion (correctly using local filenames for verification)
server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java (1)
201-255: LGTM! Comprehensive test coverage for the new batch delete method.The five new tests provide thorough coverage:
- Normal operation with multiple files
- Empty collection handling (no-op)
- Null collection handling (no-op)
- Mixed existent/non-existent files
- Exception propagation
Tests follow the existing patterns in the file and use appropriate verification.
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java
Show resolved
Hide resolved
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java
Show resolved
Hide resolved
|
❌ Gradle check result for a71020a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for a71020a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Few nit picks from above comments. Rest LGTM. |
|
❌ Gradle check result for a71020a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)
1016-1044: Batch delete logic is sound; consider minor robustness and readability tweaksThe overall batching logic—filtering via
activeSegmentRemoteFilenames/deletedSegmentFilesand updatingsegmentsUploadedToRemoteStoreusinggetLocalSegmentFilename—looks correct and preserves the “never delete active segments, don’t re-delete” semantics.Two small, non-blocking improvements:
filesToDeletemay often be empty (e.g., all segments already active or previously deleted). Short‑circuiting avoids an unnecessary remote call and protects you if adeleteFilesimplementation doesn’t like empty lists:- // Collect all files to delete for this metadata file - List<String> filesToDelete = staleSegmentRemoteFilenames.stream() + // Collect all files to delete for this metadata file + List<String> filesToDelete = staleSegmentRemoteFilenames.stream() .filter(file -> activeSegmentRemoteFilenames.contains(file) == false) .filter(file -> deletedSegmentFiles.contains(file) == false) .collect(Collectors.toList()); - AtomicBoolean deletionSuccessful = new AtomicBoolean(true); - try { + if (filesToDelete.isEmpty()) { + continue; + } + + boolean deletionSuccessful = true; + try { // Batch delete all stale segment files remoteDataDirectory.deleteFiles(filesToDelete); deletedSegmentFiles.addAll(filesToDelete); @@ - } catch (IOException e) { - deletionSuccessful.set(false); + } catch (IOException e) { + deletionSuccessful = false; logger.warn( () -> new ParameterizedMessage( "Exception while deleting segment files corresponding to metadata file {}. Deletion will be re-tried", metadataFile ), e ); }
- Given this is GC code, it’s probably fine, but it’s worth double‑checking the contract of
RemoteDirectory.deleteFiles(List<String>): does it try all names and only throw on error, or can it throw after partially succeeding? Aligning that contract with your previous per‑file behavior (best‑effort delete all, and only suppress metadata deletion on any failure) will avoid surprises arounddeletedSegmentFilesand cache state if partial failures occur.Please confirm that
deleteFilessemantics match whatdeleteStaleSegmentsexpects (either all‑or‑nothing or best‑effort without throwing on partial success).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java(1 hunks)server/src/test/java/org/opensearch/index/store/BaseRemoteSegmentStoreDirectoryTests.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
🔇 Additional comments (1)
server/src/test/java/org/opensearch/index/store/BaseRemoteSegmentStoreDirectoryTests.java (1)
112-119: Distinct dummy metadata per file looks goodSwitching
metadataFilename2andmetadataFilename3togetDummyMetadata("_0", 2)andgetDummyMetadata("_0", 3)respectively gives each metadata file unique contents, which should strengthen tests that reason about multiple metadata blobs. Map construction and downstream stubbing viagetBlobStreamremain consistent and type-safe.
|
❌ Gradle check result for 06630ff: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 06630ff: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (1)
1171-1172: Remove unused variable.The
cacheBeforevariable is declared but never used in this test. Consider removing it to improve code clarity.Apply this diff:
- // Get the cache after deletion - Map<String, RemoteSegmentStoreDirectory.UploadedSegmentMetadata> cacheBefore = remoteSegmentStoreDirectory - .getSegmentsUploadedToRemoteStore(); - // Execute deletion
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java(5 hunks)server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryWithPinnedTimestampTests.java(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: Analyze (java)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: detect-breaking-change
🔇 Additional comments (9)
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryWithPinnedTimestampTests.java (3)
27-27: LGTM!The new imports support batch deletion verification with order-independent collection comparison.
Also applies to: 30-30, 39-39
199-218: LGTM!The test correctly verifies batch deletion behavior with order-independent matching. The use of
SetandargThatwithHashSetcomparison appropriately handles the non-deterministic order of batch deletions.
225-247: LGTM!The new test correctly validates batch deletion behavior with order-independent verification. The test covers both the batch
deleteFilescall and metadata file cleanup.server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (6)
68-68: LGTM!The
argThatimport supports order-independent batch deletion verification.
960-979: LGTM!The test correctly verifies batch deletion with order-independent matching. The expected files are collected from
metadataFilename3, which is the file being deleted when keeping the 2 newest metadata files.
992-1009: LGTM!The test correctly handles the locked metadata file scenario, verifying that only unlocked stale files are deleted in the batch operation.
1066-1089: LGTM!The test correctly validates deduplication behavior in batch deletions, verifying that duplicate segment files are only deleted once and the second batch is empty.
1099-1124: LGTM!The test correctly validates error handling, ensuring metadata files are not deleted when batch segment deletion fails, maintaining consistency between data and metadata.
1131-1152: LGTM!The test correctly validates that batch deletion handles missing files gracefully and still proceeds with metadata file cleanup.
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java
Show resolved
Hide resolved
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java
Show resolved
Hide resolved
|
❌ Gradle check result for 2974ed2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 2974ed2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 2974ed2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
|
❌ Gradle check result for 8f3bb70: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20146 +/- ##
============================================
+ Coverage 73.20% 73.26% +0.05%
- Complexity 71745 71787 +42
============================================
Files 5795 5795
Lines 328304 328302 -2
Branches 47283 47280 -3
============================================
+ Hits 240334 240528 +194
+ Misses 68663 68476 -187
+ Partials 19307 19298 -9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
* Batch deletes in remote segment store Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * address PR comments * fix tests --------- Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
* Batch deletes in remote segment store Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * address PR comments * fix tests --------- Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
* Batch deletes in remote segment store Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * address PR comments * fix tests --------- Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Description
Currently we delete the files one by one in Remote Segment Store. This is highly inefficient , as for every
segment_nfile , we end up making 5 to 10 delete calls instead of batching them together .Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.