[RW Separation] Support scale down without search replica #20939

Merged
prudhvigodithi merged 5 commits into opensearch-project:main from guojialiang92:dev/support_scale_down_without_search_replica on Apr 2, 2026

@guojialiang92
Contributor

Description

The main changes in this PR are as follows:

  • Remove the restriction that a search-only replica must exist when scaling down.
  • In the scaled-down state, report only two health states, yellow and green. The new rule is:
    • activeShards < totalShards: status is yellow.
    • activeShards == totalShards: status is green.
  • Add an integration test (IT).
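
The rule above can be sketched as a small standalone function. This is only an illustration, not the code in ClusterShardHealth: the class, enum, and method names are hypothetical, and it assumes the scaled-down (search-only) case in which no primary routing exists.

```java
/** Hypothetical, standalone sketch of the scaled-down health rule described above. */
public class ScaledDownHealthSketch {

    /** Stand-in for the health status; only the two retained states are modeled. */
    enum Status { GREEN, YELLOW }

    /**
     * Returns YELLOW when fewer shards are active than configured, GREEN otherwise.
     * With zero configured shards (0 active, 0 total) the result is therefore GREEN.
     */
    static Status scaledDownStatus(int activeShards, int totalShards) {
        return (activeShards < totalShards) ? Status.YELLOW : Status.GREEN;
    }

    public static void main(String[] args) {
        System.out.println(scaledDownStatus(0, 0)); // no search replicas configured: GREEN
        System.out.println(scaledDownStatus(0, 1)); // one configured, none active: YELLOW
        System.out.println(scaledDownStatus(1, 1)); // all configured replicas active: GREEN
    }
}
```

Note that RED is never produced by this rule, which is the behavior change discussed later in the review.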

Related Issues

Resolves #20938

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
@guojialiang92 guojialiang92 requested a review from a team as a code owner March 20, 2026 09:05
@github-actions
Contributor

github-actions Bot commented Mar 20, 2026

PR Reviewer Guide 🔍

(Review updated until commit 0de1581)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Remove search replica requirement for scale down validation

Relevant files:

  • server/src/main/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexOperationValidator.java
  • server/src/test/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexOperationValidatorTests.java

Sub-PR theme: Fix cluster health status for search-only mode without replicas

Relevant files:

  • server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java
  • server/src/test/java/org/opensearch/cluster/health/ClusterShardHealthTests.java

Sub-PR theme: Add integration test for scale down without search replicas

Relevant files:

  • server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java

⚡ Recommended focus areas for review

Weak Test Assertion

In the testFullLifecycle method, when searchOnlyReplica == 0, the test catches an exception and asserts the message contains "all shards failed", but if no exception is thrown, the test silently passes without any assertion. This means the test could pass even if the behavior is incorrect. The catch block should be restructured to ensure the expected exception is actually thrown.

    try {
        client().prepareSearch(TEST_INDEX).setSize(0).get();
    } catch (Exception e) {
        assertTrue(e.getMessage().contains("all shards failed"));
    }
}
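
To illustrate why the try/catch form is weak, the following standalone sketch contrasts it with an expectThrows-style helper. The helper is a simplified, hypothetical re-implementation written for this example (a real test would use the one supplied by the test framework), and the `search` parameter is a stand-in for the actual search call.

```java
import java.util.concurrent.Callable;

/** Standalone sketch; the helper and names here are hypothetical, not OpenSearch test code. */
public class ExpectThrowsSketch {

    /** Weak pattern: returns normally when the message matches AND when nothing is thrown. */
    static void weakCheck(Callable<?> search) {
        try {
            search.call();
        } catch (Exception e) {
            if (!String.valueOf(e.getMessage()).contains("all shards failed")) {
                throw new AssertionError("unexpected message: " + e.getMessage());
            }
        }
    }

    /** Simplified expectThrows: fails if the call completes without throwing the expected type. */
    static <T extends Throwable> T expectThrows(Class<T> expected, Callable<?> call) {
        try {
            call.call();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " but nothing was thrown");
    }

    /** Strict pattern: an exception carrying the expected message is required. */
    static void strictCheck(Callable<?> search) {
        Exception e = expectThrows(Exception.class, search);
        if (!String.valueOf(e.getMessage()).contains("all shards failed")) {
            throw new AssertionError("unexpected message: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        Callable<String> succeeds = () -> "hits";
        weakCheck(succeeds); // passes silently even though no exception occurred
        try {
            strictCheck(succeeds);
        } catch (AssertionError err) {
            System.out.println("strict check failed as intended: " + err.getMessage());
        }
    }
}
```

With a search that succeeds, `weakCheck` returns normally while `strictCheck` fails, which is the gap the reviewer points out.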
Edge Case Missing

The new test asserts GREEN when activeShards == 0 and totalShards == 0. While this is technically correct per the new logic (activeShards < totalShards is false), it may represent an unusual edge case (no shards at all) that deserves a comment. More importantly, there is no test for the case where activeShards > 0 and activeShards < totalShards (YELLOW with some active search replicas), which is the primary new scenario this PR enables.

assertEquals(ClusterHealthStatus.YELLOW, ClusterShardHealth.getShardHealth(null, 0, 1, searchOnlyMetadata));
assertEquals(ClusterHealthStatus.GREEN, ClusterShardHealth.getShardHealth(null, 0, 0, searchOnlyMetadata));
Behavior Change

Previously, when isSearchOnlyClusterBlockEnabled is true and activeShards == 0, the status was RED. Now it returns GREEN (when totalShards == 0) or YELLOW (when totalShards > 0). This means a scaled-down index with zero active search replicas will no longer report RED health. This is an intentional change per the PR description, but it should be validated that downstream consumers (alerting, monitoring) handle this correctly, and that the case of activeShards == 0, totalShards > 0 returning YELLOW is acceptable rather than RED.

return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
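
For operators assessing the monitoring impact, the difference can be sketched side by side. This is a hypothetical illustration: `after` mirrors the expression quoted above, while `before` is reconstructed from the description in this review (RED when activeShards == 0); its handling of activeShards > 0 is an assumption, not taken from the source.

```java
/** Hypothetical side-by-side of the search-only health rule before and after this change. */
public class SearchOnlyHealthComparison {

    enum Status { GREEN, YELLOW, RED }

    /** Assumed previous rule: zero active shards is RED; otherwise compare against the total. */
    static Status before(int activeShards, int totalShards) {
        if (activeShards == 0) {
            return Status.RED;
        }
        return (activeShards < totalShards) ? Status.YELLOW : Status.GREEN;
    }

    /** Rule after this change, as quoted above: RED is never returned. */
    static Status after(int activeShards, int totalShards) {
        return (activeShards < totalShards) ? Status.YELLOW : Status.GREEN;
    }

    public static void main(String[] args) {
        // Cases: no replicas configured, none active, partially active, fully active.
        int[][] cases = { { 0, 0 }, { 0, 2 }, { 1, 2 }, { 2, 2 } };
        for (int[] c : cases) {
            System.out.printf("active=%d total=%d before=%s after=%s%n",
                c[0], c[1], before(c[0], c[1]), after(c[0], c[1]));
        }
    }
}
```

Under these assumptions only the rows with zero active shards differ, so alerts keyed on RED for search-only indices are the ones that would stop firing.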

@github-actions
Contributor

github-actions Bot commented Mar 20, 2026

PR Code Suggestions ✨

Latest suggestions up to 0de1581

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Ensure exception is always thrown in assertion

The try/catch block inside assertBusy silently swallows the case where no exception
is thrown, meaning the assertion would pass even if search nodes unexpectedly serve
results. The test should assert that an exception is always thrown when there are no
search replicas, not optionally catch one.

server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java [89-95]

 } else {
-    try {
-        client().prepareSearch(TEST_INDEX).setSize(0).get();
-    } catch (Exception e) {
-        assertTrue(e.getMessage().contains("all shards failed"));
-    }
+    Exception ex = expectThrows(Exception.class, () -> client().prepareSearch(TEST_INDEX).setSize(0).get());
+    assertTrue(ex.getMessage().contains("all shards failed"));
 }
Suggestion importance[1-10]: 6

Why: The current try/catch block silently passes if no exception is thrown, making the test unreliable. Using expectThrows ensures the exception is always required, making the test more robust and accurate.

Low
General
Clarify health status comparison logic

When isSearchOnlyClusterBlockEnabled is true and totalShards is 0 (no search
replicas configured), the condition activeShards < totalShards evaluates to false (0
< 0), returning GREEN. This is correct for the no-search-replica case, but if
activeShards is somehow greater than totalShards, it would also return GREEN
unexpectedly. Consider also verifying that activeShards == totalShards explicitly to
avoid edge cases.

server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java [267-273]

 if (primaryRouting == null) {
     if (isSearchOnlyClusterBlockEnabled) {
-        return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
+        return (activeShards >= totalShards) ? ClusterHealthStatus.GREEN : ClusterHealthStatus.YELLOW;
     } else {
         return ClusterHealthStatus.RED;
     }
 }
Suggestion importance[1-10]: 3

Why: The suggestion changes activeShards < totalShards to activeShards >= totalShards for the GREEN condition, which is logically equivalent but the edge case concern about activeShards > totalShards is unlikely in practice. The improved_code is functionally equivalent to the existing code, offering only marginal clarity improvement.

Low
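
The claim that the two ternary forms are equivalent can be checked mechanically. The sketch below is a standalone illustration with hypothetical names; it compares the two conditions over a small range of non-negative shard counts rather than proving the general case (which follows from `!(a >= b)` being the same as `a < b` for integers).

```java
/** Standalone check that the two ternary forms discussed above agree; names are hypothetical. */
public class TernaryEquivalenceCheck {

    static boolean isYellowOriginal(int activeShards, int totalShards) {
        // Original form: (activeShards < totalShards) ? YELLOW : GREEN
        return activeShards < totalShards;
    }

    static boolean isYellowSuggested(int activeShards, int totalShards) {
        // Suggested form: (activeShards >= totalShards) ? GREEN : YELLOW
        return !(activeShards >= totalShards);
    }

    /** Returns true if both forms agree for every pair in [0, limit] x [0, limit]. */
    static boolean formsAgree(int limit) {
        for (int active = 0; active <= limit; active++) {
            for (int total = 0; total <= limit; total++) {
                if (isYellowOriginal(active, total) != isYellowSuggested(active, total)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(formsAgree(50)); // prints true
    }
}
```

Both forms also return GREEN when activeShards exceeds totalShards, so the suggested rewrite does not change that edge case either.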

Previous suggestions

Suggestions up to commit 64c4a8d
Category | Suggestion | Impact
Possible issue
Fix silent swallowing of expected exception in test

The try/catch block inside assertBusy passes silently when no exception is
thrown, meaning the assertion would pass even if search nodes unexpectedly
serve results. The test should assert that an exception is actually thrown,
not just that any exception thrown contains a certain message.

server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java [90-95]

 } else {
-    try {
-        client().prepareSearch(TEST_INDEX).setSize(0).get();
-    } catch (Exception e) {
-        assertTrue(e.getMessage().contains("all shards failed"));
-    }
+    Exception ex = expectThrows(Exception.class, () -> client().prepareSearch(TEST_INDEX).setSize(0).get());
+    assertTrue(ex.getMessage().contains("all shards failed"));
 }
Suggestion importance[1-10]: 7

Why: The current try/catch block silently passes if no exception is thrown, making the test ineffective. Using expectThrows ensures the test actually fails when the expected exception is not thrown, which is a meaningful correctness improvement.

Medium
General
Clarify health status for zero-shard search-only index

When isSearchOnlyClusterBlockEnabled is true and both activeShards and totalShards
are 0 (no search replicas configured), the condition activeShards < totalShards
evaluates to false, returning GREEN. This is semantically correct for the
no-search-replica case, but the test confirms GREEN for (null, 0, 0,
searchOnlyMetadata). Verify that returning GREEN when there are zero total shards
and zero active shards is the intended behavior, as it may mask a misconfigured
index.

server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java [267-273]

 if (primaryRouting == null) {
     if (isSearchOnlyClusterBlockEnabled) {
+        if (totalShards == 0) {
+            return ClusterHealthStatus.GREEN;
+        }
         return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
     } else {
         return ClusterHealthStatus.RED;
     }
 }
Suggestion importance[1-10]: 3

Why: The suggestion asks to verify intended behavior rather than fixing a clear bug. The improved_code adds an explicit totalShards == 0 check that produces the same result as the existing activeShards < totalShards condition (both return GREEN), so the functional change is minimal and the suggestion is more of a documentation/clarity concern.

Low
Suggestions up to commit 2a68185
Category | Suggestion | Impact
Possible issue
Fix silent swallowing of expected exception in test

The try/catch block inside assertBusy passes silently when no exception is
thrown, so the assertion never actually validates the expected failure. If the
search succeeds unexpectedly, the test will pass without detecting the problem.
The test should either fail explicitly when the try block completes without
throwing, or use expectThrows to validate the expected failure.

server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java [90-95]

 } else {
-    try {
-        client().prepareSearch(TEST_INDEX).setSize(0).get();
-    } catch (Exception e) {
-        assertTrue(e.getMessage().contains("all shards failed"));
-    }
+    Exception ex = expectThrows(Exception.class, () -> client().prepareSearch(TEST_INDEX).setSize(0).get());
+    assertTrue(ex.getMessage().contains("all shards failed"));
 }
Suggestion importance[1-10]: 7

Why: The try/catch block inside assertBusy passes silently when no exception is thrown, meaning the test would pass even if the search succeeds unexpectedly. Using expectThrows is a valid improvement that ensures the exception is actually thrown and validated.

Medium
Prevent GREEN status when no shards are active

When primaryRouting is null and isSearchOnlyClusterBlockEnabled is true, the new
logic returns GREEN when activeShards == totalShards == 0. This means an index with
no active shards at all is reported as GREEN, which is misleading. When activeShards
== 0 and totalShards == 0, it may be acceptable, but if totalShards > 0 and
activeShards == 0, the status should still be RED or at least YELLOW.

server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java [267-273]

 if (primaryRouting == null) {
     if (isSearchOnlyClusterBlockEnabled) {
+        if (activeShards == 0 && totalShards > 0) {
+            return ClusterHealthStatus.RED;
+        }
         return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
     } else {
         return ClusterHealthStatus.RED;
     }
 }
Suggestion importance[1-10]: 5

Why: The new logic returns GREEN when activeShards == totalShards == 0, which could be misleading. However, the test in ClusterShardHealthTests.java explicitly asserts GREEN for (null, 0, 0, searchOnlyMetadata), indicating this behavior is intentional for the search-only mode case where no search replicas are configured.

Low
Suggestions up to commit f44aa13
Category | Suggestion | Impact
Possible issue
Fix silent swallowing of missing exception in test

The try/catch block inside assertBusy silently swallows the case where no exception
is thrown, meaning the assertion would pass even if search nodes incorrectly serve
results when there are no search replicas. The test should assert that an exception
is actually thrown, not just check its message if one happens to be thrown.

server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java [90-95]

 } else {
-    try {
-        client().prepareSearch(TEST_INDEX).setSize(0).get();
-    } catch (Exception e) {
-        assertTrue(e.getMessage().contains("all shards failed"));
-    }
+    Exception ex = expectThrows(Exception.class, () -> client().prepareSearch(TEST_INDEX).setSize(0).get());
+    assertTrue(ex.getMessage().contains("all shards failed"));
 }
Suggestion importance[1-10]: 7

Why: The current try/catch block silently passes if no exception is thrown, making the test ineffective. Using expectThrows ensures the test actually fails when no exception is thrown, which is a meaningful correctness improvement for the test.

Medium
General
Explicitly handle zero search replicas health status

When isSearchOnlyClusterBlockEnabled is true and totalShards is 0 (no search
replicas configured), the condition activeShards < totalShards evaluates to false (0
< 0), returning GREEN. This is the intended behavior per the new test, but it means
an index with no active shards at all reports GREEN, which could mask real issues.
Ensure this edge case is intentional and documented, or add an explicit check for
the zero-search-replica case.

server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java [267-273]

 if (primaryRouting == null) {
     if (isSearchOnlyClusterBlockEnabled) {
+        // When totalShards == 0 (no search replicas), treat as GREEN since scale-down without search replicas is valid
+        if (totalShards == 0) {
+            return ClusterHealthStatus.GREEN;
+        }
         return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
     } else {
         return ClusterHealthStatus.RED;
     }
 }
Suggestion importance[1-10]: 3

Why: The suggestion adds an explicit check for totalShards == 0 that is functionally equivalent to the existing code (since 0 < 0 is already false, returning GREEN). The improved code only adds a comment-like explicit branch without changing behavior, offering minimal value beyond documentation.

Low
Suggestions up to commit ae9eb21
Category | Suggestion | Impact
Possible issue
Fix silent swallowing of expected exception in test

The try/catch block inside assertBusy passes silently when no exception is
thrown, meaning the assertion would pass even if search nodes unexpectedly
return documents. The test should assert that an exception is actually thrown,
not just catch it if it happens. Consider using expectThrows or asserting
that the catch block is actually reached.

server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java [90-95]

 } else {
-    try {
-        client().prepareSearch(TEST_INDEX).setSize(0).get();
-    } catch (Exception e) {
-        assertTrue(e.getMessage().contains("all shards failed"));
-    }
+    Exception ex = expectThrows(Exception.class, () -> client().prepareSearch(TEST_INDEX).setSize(0).get());
+    assertTrue(ex.getMessage().contains("all shards failed"));
 }
Suggestion importance[1-10]: 7

Why: The try/catch block inside assertBusy passes silently when no exception is thrown, meaning the test would pass even if search nodes unexpectedly return documents. Using expectThrows ensures the exception is actually thrown, making the test more robust and meaningful.

Medium
General
Clarify health status logic for zero total shards

When isSearchOnlyClusterBlockEnabled is true and activeShards == 0 and totalShards
== 0, the expression activeShards < totalShards evaluates to false, returning GREEN.
This is consistent with the new test getShardHealth(null, 0, 0, searchOnlyMetadata)
expecting GREEN. However, when there are no active shards but some total shards
(e.g., activeShards=0, totalShards=1), the result is YELLOW rather than RED, which
may mask a degraded state. Verify this is the intended behavior for the case where
no search replicas are active.

server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java [267-273]

 if (primaryRouting == null) {
     if (isSearchOnlyClusterBlockEnabled) {
+        if (totalShards == 0) {
+            return ClusterHealthStatus.GREEN;
+        }
         return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
     } else {
         return ClusterHealthStatus.RED;
     }
 }
Suggestion importance[1-10]: 3

Why: The suggestion asks to verify behavior rather than fix a clear bug. When totalShards == 0, activeShards < totalShards is false so GREEN is already returned, making the added if (totalShards == 0) branch redundant. The improved_code doesn't change behavior, just adds an explicit check for clarity.

Low
Suggestions up to commit 6cca949
Category | Suggestion | Impact
Possible issue
Ensure exception is actually thrown in test

The try/catch block inside assertBusy passes silently when no exception is
thrown, meaning the assertion would pass even if the search succeeds
unexpectedly. The test should assert that an exception is actually thrown (e.g.,
using expectThrows) to ensure the no-search-replica behavior is correctly validated.

server/src/internalClusterTest/java/org/opensearch/action/admin/indices/scale/searchonly/ScaleIndexIT.java [89-95]

 } else {
-    try {
-        client().prepareSearch(TEST_INDEX).setSize(0).get();
-    } catch (Exception e) {
-        assertTrue(e.getMessage().contains("all shards failed"));
-    }
+    Exception e = expectThrows(Exception.class, () -> client().prepareSearch(TEST_INDEX).setSize(0).get());
+    assertTrue(e.getMessage().contains("all shards failed"));
 }
Suggestion importance[1-10]: 7

Why: The current try/catch block silently passes if no exception is thrown, making the test ineffective at validating the no-search-replica behavior. Using expectThrows ensures the exception is actually thrown, making the test more robust and meaningful.

Medium
General
Handle zero shards edge case explicitly

When primaryRouting is null and search-only mode is enabled with zero active shards
and zero total shards (i.e., no search replicas configured), the condition
activeShards < totalShards evaluates to 0 < 0 which is false, returning GREEN. This
may be misleading since there are no active shards serving requests. Consider
handling the case where both activeShards and totalShards are 0 explicitly.

server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java [269]

+if (activeShards == 0 && totalShards == 0) {
+    return ClusterHealthStatus.GREEN;
+}
 return (activeShards < totalShards) ? ClusterHealthStatus.YELLOW : ClusterHealthStatus.GREEN;
Suggestion importance[1-10]: 4

Why: The suggestion identifies a valid edge case where activeShards == 0 and totalShards == 0 would return GREEN, but the PR's intent (based on the test change from RED to GREEN for getShardHealth(null, 0, 0, searchOnlyMetadata)) actually expects GREEN in this case, making the suggested explicit handling redundant rather than a fix.

Low

@github-actions
Contributor

❌ Gradle check result for 2eba0ca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
@github-actions
Contributor

Persistent review updated to latest commit 8332429

@github-actions
Contributor

✅ Gradle check result for 8332429: SUCCESS

@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.18%. Comparing base (dbe98aa) to head (0de1581).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20939      +/-   ##
============================================
- Coverage     73.21%   73.18%   -0.03%     
- Complexity    72620    72726     +106     
============================================
  Files          5849     5859      +10     
  Lines        332066   332461     +395     
  Branches      47951    48000      +49     
============================================
+ Hits         243109   243314     +205     
- Misses        69456    69646     +190     
  Partials      19501    19501              

@github-actions
Contributor

Persistent review updated to latest commit 6cca949

@github-actions
Contributor

❌ Gradle check result for 6cca949: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

Persistent review updated to latest commit ae9eb21

@github-actions
Contributor

❌ Gradle check result for ae9eb21: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@guojialiang92 guojialiang92 force-pushed the dev/support_scale_down_without_search_replica branch from ae9eb21 to f44aa13 Compare March 23, 2026 15:50
@github-actions
Contributor

Persistent review updated to latest commit f44aa13

@github-actions
Contributor

❌ Gradle check result for f44aa13: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@guojialiang92 guojialiang92 force-pushed the dev/support_scale_down_without_search_replica branch from f44aa13 to 2a68185 Compare March 24, 2026 02:30
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 2a68185.

Path: server/src/main/java/org/opensearch/cluster/health/ClusterShardHealth.java
Line: 269
Severity: low
Description: Health status masking: the condition `activeShards == 0 → RED` was removed for search-only mode. With the companion change to ScaleIndexOperationValidator, an index can now be placed in search-only mode with zero search replicas and the cluster health will report YELLOW or GREEN (when activeShards==0 and totalShards==0) rather than RED. This could mask a genuine data-unavailability situation from operators, though it appears to be an intentional design change to support a 'scale-to-zero without pre-configured search replicas' workflow rather than a deliberate attack.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

Persistent review updated to latest commit 2a68185

@github-actions
Contributor

❌ Gradle check result for 2a68185: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@guojialiang92 guojialiang92 force-pushed the dev/support_scale_down_without_search_replica branch from 2a68185 to 64c4a8d Compare March 24, 2026 09:34
@github-actions
Contributor

Persistent review updated to latest commit 64c4a8d

@github-actions
Contributor

❌ Gradle check result for 64c4a8d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
@guojialiang92 guojialiang92 force-pushed the dev/support_scale_down_without_search_replica branch from 64c4a8d to 9926ceb Compare March 26, 2026 02:28
@github-actions
Contributor

Failed to generate code suggestions for PR

@github-actions
Contributor

❌ Gradle check result for 9926ceb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

Persistent review updated to latest commit 0de1581

@github-actions
Contributor

❕ Gradle check result for 0de1581: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@andrross
Member

@vinaykpud Can you review this? Thanks!

@vinaykpud
Contributor

@prudhvigodithi can you help take a look at this?
Issue: #20938

@prudhvigodithi
Member

Thanks @guojialiang92, I understand the change and have one question.

Is reporting GREEN for an index that can't serve any requests OK? Operators relying on cluster health for monitoring/alerting would have no signal that something needs attention. Should there be any warning or indication like index will be unsearchable until search replicas are added? I'm fine with taking this up incrementally and updating the documentation https://docs.opensearch.org/latest/tuning-your-cluster/separate-index-and-search-workloads/.

@guojialiang92
Contributor Author

guojialiang92 commented Apr 2, 2026

Thank you for your attention @vinaykpud :).

Operators relying on cluster health for monitoring/alerting would have no signal that something needs attention. Should there be any warning or indication like index will be unsearchable until search replicas are added?

Agree.

We will follow up on these capabilities. Our preliminary plan is to introduce the following:

  1. Report a clear exception at query time, for example one stating that a search replica shard needs to be added.
  2. Introduce a configuration item (e.g., AUTO_ADD_SEARCH_REPLICA_ENABLE) for automatically adding search replica shards. If the number of search replica shards is found to be 0 during a query, it will be automatically adjusted to 1.

The detailed implementation method will be discussed in subsequent issues.
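
As a rough illustration of the two options listed above, the decision could look something like the sketch below. Everything in it is hypothetical: nothing here exists in OpenSearch today, the class and method names are invented for this example, and the boolean parameter merely models the proposed AUTO_ADD_SEARCH_REPLICA_ENABLE item.

```java
/** Hypothetical sketch of the two proposed behaviors; not an existing OpenSearch API. */
public class ZeroSearchReplicaPolicySketch {

    /**
     * Decides how many search replicas a query should proceed with.
     * autoAddEnabled models the proposed AUTO_ADD_SEARCH_REPLICA_ENABLE item.
     */
    static int resolveSearchReplicas(int currentSearchReplicas, boolean autoAddEnabled) {
        if (currentSearchReplicas > 0) {
            return currentSearchReplicas; // replicas already configured, nothing to adjust
        }
        if (autoAddEnabled) {
            return 1; // option 2: adjust from 0 to 1 automatically
        }
        // Option 1: fail the query with an explicit, actionable message.
        throw new IllegalStateException(
            "index has no search replicas; add a search replica shard to make it searchable");
    }

    public static void main(String[] args) {
        System.out.println(resolveSearchReplicas(0, true)); // prints 1
        try {
            resolveSearchReplicas(0, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A real implementation would also need to account for allocation and recovery time before the new replica can serve queries, which this sketch leaves out.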

@prudhvigodithi
Member

2. Introduce a configuration item (e.g., AUTO_ADD_SEARCH_REPLICA_ENABLE) for automatically adding search replica shards. If the number of search replica shards is found to be 0 during a query, it will be automatically adjusted to 1.

Nice, so after scaling down with 0 replicas (including search replicas), a search would now scale the search replicas back up to the actual replica count. Just curious: have you tested the query wait behavior while the search replicas are being brought up?

@prudhvigodithi
Member

Agree.

We will follow up on these capabilities. Our preliminary plan is to introduce the following:

  1. Report a clear exception at query time, for example one stating that a search replica shard needs to be added.
  2. Introduce a configuration item (e.g., AUTO_ADD_SEARCH_REPLICA_ENABLE) for automatically adding search replica shards. If the number of search replica shards is found to be 0 during a query, it will be automatically adjusted to 1.

The detailed implementation method will be discussed in subsequent issues.

Overall LGTM.

@guojialiang92
Contributor Author

Nice, so after scaling down with 0 replicas (including search replicas), a search would now scale the search replicas back up to the actual replica count. Just curious: have you tested the query wait behavior while the search replicas are being brought up?

@prudhvigodithi Additional waiting time is expected, because of the extra rerouting and the download of data from remote storage. For that reason, this setting would be dynamic, so it can be adjusted according to business needs.

Within ByteDance, however, we have already implemented a storage-compute separation architecture in which only part of the data is stored locally, so reloading a search-only shard is very fast. A technique similar to #13149 is orthogonal to this.

@guojialiang92
Contributor Author

guojialiang92 commented Apr 2, 2026

Overall LGTM.

Merging code requires the approval of at least one core maintainer.
Thank you for taking the time to review the code again :). @prudhvigodithi

@prudhvigodithi prudhvigodithi merged commit bd33edf into opensearch-project:main Apr 2, 2026
19 checks passed
bharath-techie pushed a commit to bharath-techie/OpenSearch that referenced this pull request Apr 2, 2026
…-project#20939)

* support scale down without search replica

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>

* fix test

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>

* for test coverage

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>

---------

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
aparajita31pandey pushed a commit to aparajita31pandey/OpenSearch that referenced this pull request Apr 18, 2026
Signed-off-by: Aparajita Pandey <aparajita31pandey@gmail.com>
pradeep-L pushed a commit to pradeep-L/OpenSearch that referenced this pull request Apr 21, 2026