Skip to content

Batch deletes in remote segment store#20146

Merged
linuxpi merged 3 commits intoopensearch-project:mainfrom
gbbafna:bulk-delete
Dec 29, 2025
Merged

Batch deletes in remote segment store#20146
linuxpi merged 3 commits intoopensearch-project:mainfrom
gbbafna:bulk-delete

Conversation

@gbbafna
Copy link
Copy Markdown
Contributor

@gbbafna gbbafna commented Dec 2, 2025

Description

Currently we delete the files one by one in Remote Segment Store. This is highly inefficient , as for every segment_n file , we end up making 5 to 10 delete calls instead of batching them together .

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Added batch deletion for removing multiple remote files in one operation; ignores missing files and treats null/empty inputs as no-ops. Cache and metadata updates occur only after successful batch deletion, and logging/error handling is consolidated around the batch operation.
  • Tests

    • Added and updated tests validating batch deletions, null/empty/missing-file handling, error propagation, cache updates, and metadata-deletion behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds a batch deletion API deleteFiles(List<String>) to RemoteDirectory and updates RemoteSegmentStoreDirectory and tests to perform and validate batched remote deletions, consolidating error handling and updating caches after a single batch delete operation.

Changes

Cohort / File(s) Summary
Batch deletion API
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java
Adds public void deleteFiles(List<String> names) throws IOException which no-ops on null/empty input and delegates to blobContainer.deleteBlobsIgnoringIfNotExists(names), ignoring missing blobs.
Batch deletion integration
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java
Replaces per-file remote deletions with a single batch deletion (remoteDataDirectory.deleteFiles(filesToDelete)); consolidates error handling into one try-catch, performs cache updates after successful batch deletion, and conditions metadata-file deletion on batch success.
Batch deletion API tests
server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java
Adds tests: testDeleteFiles, testDeleteFilesEmptyCollection, testDeleteFilesNullCollection, testDeleteFilesWithNonExistentFiles, testDeleteFilesException to validate batch delete behavior and edge cases.
Batch deletion integration & test updates
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java, server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryWithPinnedTimestampTests.java, server/src/test/java/org/opensearch/index/store/BaseRemoteSegmentStoreDirectoryTests.java
Replaces per-file deletion assertions with a single deleteFiles batch verification (order-insensitive Set matching via argThat), adds tests for batch deletion success and error handling, and updates fixture metadata contents for certain metadata files.

Sequence Diagram(s)

sequenceDiagram
    participant SegmentDir as RemoteSegmentStoreDirectory
    participant RemoteDir as RemoteDirectory
    participant Blob as BlobContainer

    Note over SegmentDir,RemoteDir: Identify stale remote files -> filesToDelete
    SegmentDir->>RemoteDir: deleteFiles(filesToDelete)
    RemoteDir->>Blob: deleteBlobsIgnoringIfNotExists(filesToDelete)
    alt success
        Blob-->>RemoteDir: success
        RemoteDir-->>SegmentDir: return success
        Note right of SegmentDir: update deletedSegmentFiles\nand segmentsUploadedToRemoteStore\nthen delete remote metadata file
    else IOException / failure
        Blob-->>RemoteDir: throws IOException
        RemoteDir--x SegmentDir: propagate IOException
        Note right of SegmentDir: leave caches & metadata unchanged\nlog failure referencing metadata file
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review error-handling consolidation in RemoteSegmentStoreDirectory.java to ensure cache consistency when batch deletion fails.
  • Verify cache update ordering: deletedSegmentFiles and segmentsUploadedToRemoteStore are only changed after successful batch deletion.
  • Check tests using argThat/Set matching cover all expected deleted file names and edge cases (null/empty lists, partial non-existence, exceptions).
  • Ensure RemoteDirectory.deleteFiles correctly no-ops on null/empty and propagates IOExceptions from the blob container.

Poem

🐇 I hopped through bytes and quiet logs,

Gathered many files in one neat clod.
One gentle nudge, a batch undone,
Caches tidy, work well-run.
Hop — the remote meadow's clear and broad.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: introducing batch deletes in the remote segment store, which is the core objective of this PR.
Description check ✅ Passed The description explains the current inefficiency (one-by-one deletes resulting in 5-10 calls per segment) and the proposed solution (batching), with testing confirmed and checklist completed.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for a71020a: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)

1023-1034: Consider using a plain boolean instead of AtomicBoolean.

The deletionSuccessful variable is only accessed within a single thread (no concurrent access), so AtomicBoolean is unnecessary overhead. A simple boolean variable would suffice.

-            AtomicBoolean deletionSuccessful = new AtomicBoolean(true);
+            boolean deletionSuccessful = true;
             try {
                 // Batch delete all stale segment files
                 remoteDataDirectory.deleteFiles(filesToDelete);
                 deletedSegmentFiles.addAll(filesToDelete);

                 // Update cache after successful batch deletion
                 for (String file : filesToDelete) {
                     if (!activeSegmentFilesMetadataMap.containsKey(getLocalSegmentFilename(file))) {
                         segmentsUploadedToRemoteStore.remove(getLocalSegmentFilename(file));
                     }
                 }
             } catch (IOException e) {
-                deletionSuccessful.set(false);
+                deletionSuccessful = false;
                 logger.warn(
                     ...
                 );
             }
-            if (deletionSuccessful.get()) {
+            if (deletionSuccessful) {
                 logger.debug("Deleting stale metadata file {} from remote segment store", metadataFile);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f76826c and a71020a.

📒 Files selected for processing (4)
  • server/src/main/java/org/opensearch/index/store/RemoteDirectory.java (1 hunks)
  • server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1 hunks)
  • server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Gradle Precommit
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java

[error] 1037-1037: Bad usage of logger: Wrong usage of org.apache.logging.log4j.Logger#warn detected by OpenSearchLoggerUsageChecker (likely incorrect arguments).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (10)
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java (1)

187-201: LGTM! Clean batch delete implementation.

The new deleteFiles method correctly handles edge cases (null/empty) and delegates to the underlying blob container's batch delete API, maintaining consistency with the existing deleteFile method behavior.

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)

1016-1021: LGTM! Batch file collection logic is correct.

The filtering logic correctly identifies files to delete by excluding:

  1. Files present in the active segment set
  2. Files already deleted in previous iterations

This ensures safe batch deletion without affecting active segments.

server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (7)

68-68: LGTM!

Import for argThat matcher added to support order-independent verification of batch delete calls.


960-982: LGTM! Test correctly validates batch deletion semantics.

The test properly uses Set comparison with argThat for order-independent verification of the batch delete call, which is appropriate since the order of files in the delete operation doesn't affect correctness.


991-1011: LGTM!

Test correctly validates that batch deletion respects metadata file locks.


1065-1092: LGTM! Deduplication test correctly validates batch behavior.

The test verifies that:

  1. First batch deletes unique stale files from metadataFilename3
  2. Second batch (for metadataFilename4 with duplicate content) correctly passes an empty list since files were already deleted

1094-1124: LGTM!

Test correctly validates that when batch deletion fails with an IOException, the metadata file is preserved (not deleted), maintaining data consistency.


1126-1152: LGTM!

Test correctly validates that the batch delete operation (using deleteBlobsIgnoringIfNotExists) handles non-existent files gracefully, allowing the metadata file deletion to proceed.


1190-1231: LGTM! Error handling test is thorough.

The test correctly validates:

  1. Batch delete is attempted with correct files
  2. Metadata file is preserved when deletion fails
  3. Cache entries remain intact after failed deletion (correctly using local filenames for verification)
server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java (1)

201-255: LGTM! Comprehensive test coverage for the new batch delete method.

The five new tests provide thorough coverage:

  • Normal operation with multiple files
  • Empty collection handling (no-op)
  • Null collection handling (no-op)
  • Mixed existent/non-existent files
  • Exception propagation

Tests follow the existing patterns in the file and use appropriate verification.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for a71020a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for a71020a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ajaymovva
Copy link
Copy Markdown
Contributor

Few nit picks from above comments. Rest LGTM.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 3, 2025

❌ Gradle check result for a71020a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)

1016-1044: Batch delete logic is sound; consider minor robustness and readability tweaks

The overall batching logic—filtering via activeSegmentRemoteFilenames / deletedSegmentFiles and updating segmentsUploadedToRemoteStore using getLocalSegmentFilename—looks correct and preserves the “never delete active segments, don’t re-delete” semantics.

Two small, non-blocking improvements:

  • filesToDelete may often be empty (e.g., all segments already active or previously deleted). Short‑circuiting avoids an unnecessary remote call and protects you if a deleteFiles implementation doesn’t like empty lists:
-            // Collect all files to delete for this metadata file
-            List<String> filesToDelete = staleSegmentRemoteFilenames.stream()
+            // Collect all files to delete for this metadata file
+            List<String> filesToDelete = staleSegmentRemoteFilenames.stream()
                 .filter(file -> activeSegmentRemoteFilenames.contains(file) == false)
                 .filter(file -> deletedSegmentFiles.contains(file) == false)
                 .collect(Collectors.toList());
 
-            AtomicBoolean deletionSuccessful = new AtomicBoolean(true);
-            try {
+            if (filesToDelete.isEmpty()) {
+                continue;
+            }
+
+            boolean deletionSuccessful = true;
+            try {
                 // Batch delete all stale segment files
                 remoteDataDirectory.deleteFiles(filesToDelete);
                 deletedSegmentFiles.addAll(filesToDelete);
@@
-            } catch (IOException e) {
-                deletionSuccessful.set(false);
+            } catch (IOException e) {
+                deletionSuccessful = false;
                 logger.warn(
                     () -> new ParameterizedMessage(
                         "Exception while deleting segment files corresponding to metadata file {}. Deletion will be re-tried",
                         metadataFile
                     ),
                     e
                 );
             }
  • Given this is GC code, it’s probably fine, but it’s worth double‑checking the contract of RemoteDirectory.deleteFiles(List<String>): does it try all names and only throw on error, or can it throw after partially succeeding? Aligning that contract with your previous per‑file behavior (best‑effort delete all, and only suppress metadata deletion on any failure) will avoid surprises around deletedSegmentFiles and cache state if partial failures occur.

Please confirm that deleteFiles semantics match what deleteStaleSegments expects (either all‑or‑nothing or best‑effort without throwing on partial success).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a71020a and 06630ff.

📒 Files selected for processing (2)
  • server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1 hunks)
  • server/src/test/java/org/opensearch/index/store/BaseRemoteSegmentStoreDirectoryTests.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
🔇 Additional comments (1)
server/src/test/java/org/opensearch/index/store/BaseRemoteSegmentStoreDirectoryTests.java (1)

112-119: Distinct dummy metadata per file looks good

Switching metadataFilename2 and metadataFilename3 to getDummyMetadata("_0", 2) and getDummyMetadata("_0", 3) respectively gives each metadata file unique contents, which should strengthen tests that reason about multiple metadata blobs. Map construction and downstream stubbing via getBlobStream remain consistent and type-safe.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 06630ff: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 06630ff: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (1)

1171-1172: Remove unused variable.

The cacheBefore variable is declared but never used in this test. Consider removing it to improve code clarity.

Apply this diff:

-        // Get the cache after deletion
-        Map<String, RemoteSegmentStoreDirectory.UploadedSegmentMetadata> cacheBefore = remoteSegmentStoreDirectory
-            .getSegmentsUploadedToRemoteStore();
-
         // Execute deletion
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06630ff and 2974ed2.

📒 Files selected for processing (2)
  • server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (5 hunks)
  • server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryWithPinnedTimestampTests.java (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: detect-breaking-change
🔇 Additional comments (9)
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryWithPinnedTimestampTests.java (3)

27-27: LGTM!

The new imports support batch deletion verification with order-independent collection comparison.

Also applies to: 30-30, 39-39


199-218: LGTM!

The test correctly verifies batch deletion behavior with order-independent matching. The use of Set and argThat with HashSet comparison appropriately handles the non-deterministic order of batch deletions.


225-247: LGTM!

The new test correctly validates batch deletion behavior with order-independent verification. The test covers both the batch deleteFiles call and metadata file cleanup.

server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (6)

68-68: LGTM!

The argThat import supports order-independent batch deletion verification.


960-979: LGTM!

The test correctly verifies batch deletion with order-independent matching. The expected files are collected from metadataFilename3, which is the file being deleted when keeping the 2 newest metadata files.


992-1009: LGTM!

The test correctly handles the locked metadata file scenario, verifying that only unlocked stale files are deleted in the batch operation.


1066-1089: LGTM!

The test correctly validates deduplication behavior in batch deletions, verifying that duplicate segment files are only deleted once and the second batch is empty.


1099-1124: LGTM!

The test correctly validates error handling, ensuring metadata files are not deleted when batch segment deletion fails, maintaining consistency between data and metadata.


1131-1152: LGTM!

The test correctly validates that batch deletion handles missing files gracefully and still proceeds with metadata file cleanup.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 2974ed2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 2974ed2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 2974ed2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 8f3bb70: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 8f3bb70: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.26%. Comparing base (930ae74) to head (8f3bb70).
⚠️ Report is 24 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20146      +/-   ##
============================================
+ Coverage     73.20%   73.26%   +0.05%     
- Complexity    71745    71787      +42     
============================================
  Files          5795     5795              
  Lines        328304   328302       -2     
  Branches      47283    47280       -3     
============================================
+ Hits         240334   240528     +194     
+ Misses        68663    68476     -187     
+ Partials      19307    19298       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@linuxpi linuxpi merged commit d6f6cfa into opensearch-project:main Dec 29, 2025
35 of 37 checks passed
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
* Batch deletes in remote segment store

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* address PR comments

* fix tests

---------

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
* Batch deletes in remote segment store

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* address PR comments

* fix tests

---------

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Adiiigo pushed a commit to Adiiigo/OpenSearch that referenced this pull request Feb 20, 2026
* Batch deletes in remote segment store

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* address PR comments

* fix tests

---------

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants