Skip to content

WLM group custom search settings#20536

Merged
jainankitk merged 2 commits intoopensearch-project:mainfrom
dzane17:wlm-settings
Mar 10, 2026
Merged

WLM group custom search settings#20536
jainankitk merged 2 commits intoopensearch-project:mainfrom
dzane17:wlm-settings

Conversation

@dzane17
Copy link
Copy Markdown
Member

@dzane17 dzane17 commented Feb 4, 2026

Description

This PR adds the foundational infrastructure for defining custom search settings within a WLM group, along with the first setting implementation (timeout).

What's included

  • WorkloadGroupSearchSettings enum with validation framework for search settings
  • search_settings field in MutableWorkloadGroupFragment and WorkloadGroup
  • WorkloadGroupRequestOperationListener integration to apply settings during request start
  • timeout setting implementation

Supported setting (this PR)

timeout - Enforces a hard upper bound on how long a search is allowed to execute, helping keep latency predictable for a workload group and avoiding situations where slow or stalled queries tie up search threads. The WLM timeout is only applied when the request does not already have an explicit timeout set.

Usage

Create WLM group with search_settings:

curl -XPUT "http://localhost:9200/_wlm/workload_group?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "analytics",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.1,
    "memory": 0.1
  },
  "search_settings": {
    "timeout": "30s"
  }
}'

Update WLM group:

curl -XPUT "http://localhost:9200/_wlm/workload_group/analytics?pretty" -H 'Content-Type: application/json' -d'
{
  "search_settings": {
    "timeout": "1m"
  }
}'

Related Issues

Part of #20555

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This change implements support for custom search settings at the WLM group level. It introduces a searchSettings field to workload groups, adds validation logic for supported settings, enables serialization/deserialization with version awareness, and applies group-level settings to search requests during request processing.

Changes

Cohort / File(s) Summary
Changelog & Test Infrastructure
CHANGELOG.md, plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/WorkloadManagementTestUtils.java
Added changelog entry for WLM group custom search settings feature and extended test utilities with a new workload group variant (workloadGroupWithSearchSettings) including predefined search settings and resiliency mode ENFORCED.
WLM Response Tests
plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/action/CreateWorkloadGroupResponseTests.java, ...GetWorkloadGroupResponseTests.java, ...UpdateWorkloadGroupResponseTests.java
Added new test methods to validate toXContent serialization with populated search_settings, and updated existing tests to expect empty search_settings object in output.
Search Settings Validation
server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java, server/src/test/java/org/opensearch/wlm/WorkloadGroupSearchSettingsTests.java
Introduced new utility class defining WlmSearchSetting enum with per-setting validators for BATCHED_REDUCE_SIZE, TIMEOUT, CANCEL_AFTER_TIME_INTERVAL, MAX_CONCURRENT_SHARD_REQUESTS, MAX_BUCKET, and PHASE_TOOK, plus batch validation method with comprehensive test coverage.
WorkloadGroup Core Models
server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java, server/src/main/java/org/opensearch/wlm/MutableWorkloadGroupFragment.java
Added searchSettings field with null-check enforcement, getters/setters, version-aware serialization (V_3_5_0+), and equality/hashCode updates; enhanced name validation with Locale-based formatting and updated constructors to accept and propagate search settings.
SearchRequest Enhancements
server/src/main/java/org/opensearch/action/search/SearchRequest.java
Added raw accessor methods (getBatchedReduceSizeRaw, getMaxConcurrentShardRequestsRaw) to expose unset values, introduced static validatePositiveInteger validator method, and updated validation calls to use the new validator.
WLM Request Processing
server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java, server/src/main/java/org/opensearch/node/Node.java
Updated listener constructor to accept ClusterService dependency, implemented applyWorkloadGroupSearchSettings to fetch and apply per-group settings to incoming SearchRequest with per-entry error handling; updated Node instantiation to pass ClusterService.
Search Service Integration
server/src/main/java/org/opensearch/search/SearchService.java, server/src/main/java/org/opensearch/search/aggregations/MultiBucketConsumerService.java, server/src/main/java/org/opensearch/action/search/TransportSearchAction.java
Added conditional MultiBucketConsumer creation based on MAX_BUCKET header override in aggregation parsing and reduction paths; added create(int limit) overload to MultiBucketConsumerService; relocated phase_took initialization timing in TransportSearchAction.
Workload Group Tests
server/src/test/java/org/opensearch/cluster/metadata/WorkloadGroupTests.java, WorkloadGroupMetadataTests.java, server/src/test/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListenerTests.java
Extended test workload group creation to pass search settings map, updated expected toXContent output to include search_settings block, added assertions for search settings verification, and updated listener tests for expanded constructor signature.

Sequence Diagram

sequenceDiagram
    participant Client as Client/Request
    participant Listener as WorkloadGroupRequestOperationListener
    participant ClusterState as Cluster State
    participant WorkloadGroup as WorkloadGroup
    participant SearchReq as SearchRequest
    participant SearchService as SearchService
    participant Aggregator as MultiBucketConsumer

    Client->>Listener: onRequestStart(searchRequestContext)
    Listener->>ClusterState: getMetadata().workloadGroups()
    ClusterState->>Listener: WorkloadGroup
    Listener->>WorkloadGroup: getSearchSettings()
    WorkloadGroup->>Listener: Map<String, String> searchSettings
    
    loop For each setting (batched_reduce_size, timeout, etc.)
        Listener->>SearchReq: apply setting (respecting min/max logic)
        Note over Listener,SearchReq: Handle BATCHED_REDUCE_SIZE,<br/>TIMEOUT, CANCEL_AFTER_TIME_INTERVAL,<br/>MAX_CONCURRENT_SHARD_REQUESTS,<br/>MAX_BUCKET, PHASE_TOOK
    end
    
    Listener->>SearchService: proceed with updated SearchRequest
    SearchService->>SearchService: parseSource (check MAX_BUCKET header)
    SearchService->>Aggregator: create(limit) or create()
    Aggregator->>SearchService: MultiBucketConsumer configured
    SearchService->>Client: execute search with WLM settings applied
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

Search:Performance, v3.5.0

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 27.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'WLM group custom search settings' directly describes the main change—adding support for custom search settings on WLM groups.
Linked Issues check ✅ Passed The PR comprehensively addresses #19079 by implementing cancel_after_time_interval and five additional search settings (timeout, max_concurrent_shard_requests, batched_reduce_size, phase_took, max_buckets) with proper WLM integration.
Out of Scope Changes check ✅ Passed All changes are directly aligned with implementing WLM group custom search settings. No unrelated modifications detected—only additions to support the feature and its tests.
Description check ✅ Passed The PR description comprehensively covers all required template sections: detailed description of changes, related issues, and completed check list items.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request Search:Resiliency labels Feb 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

❌ Gradle check result for ebc4077: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

❌ Gradle check result for ffeb770: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 28, 2026

PR Reviewer Guide 🔍

(Review updated until commit 965d2f1)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: WLM search_settings data model and serialization

Relevant files:

  • server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java
  • server/src/main/java/org/opensearch/wlm/MutableWorkloadGroupFragment.java
  • server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java
  • server/src/test/java/org/opensearch/wlm/WorkloadGroupSearchSettingsTests.java
  • server/src/test/java/org/opensearch/cluster/metadata/WorkloadGroupTests.java
  • server/src/test/java/org/opensearch/cluster/metadata/WorkloadGroupMetadataTests.java
  • plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/WorkloadManagementTestUtils.java
  • plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/action/CreateWorkloadGroupResponseTests.java
  • plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/action/GetWorkloadGroupResponseTests.java
  • plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/action/UpdateWorkloadGroupResponseTests.java

Sub-PR theme: WLM timeout enforcement in request listener

Relevant files:

  • server/src/main/java/org/opensearch/wlm/WorkloadGroupService.java
  • server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java
  • server/src/test/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListenerTests.java
  • plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java

⚡ Recommended focus areas for review

Null Source Timeout

When searchRequest.source() is null, the WLM timeout setting is silently skipped. This means a search request with no source builder will never have the WLM timeout applied, even if the workload group defines one. Consider initializing a source builder or applying the timeout at the request level instead.

if (searchRequest.source() != null && searchRequest.source().timeout() == null) {
    searchRequest.source()
        .timeout(
            TimeValue.parseTimeValue(
                entry.getValue(),
                WorkloadGroupSearchSettings.WlmSearchSetting.TIMEOUT.getSettingName()
            )
        );
}
Version Boundary

The serialization uses Version.V_3_6_0 as the version gate for searchSettings. If this feature is being introduced in a version earlier than 3.6.0, the version check will never be true and search settings will never be serialized/deserialized. Verify that V_3_6_0 is the correct version constant for this feature.

if (in.getVersion().onOrAfter(Version.V_3_6_0)) {
    // Read null marker: true means searchSettings is null (not specified)
    boolean isNull = in.readBoolean();
    searchSettings = isNull ? null : in.readMap(StreamInput::readString, StreamInput::readString);
} else {
    searchSettings = new HashMap<>();
}
Null Normalization Side Effect

The constructor mutates the incoming mutableWorkloadGroupFragment parameter by replacing it with a new instance when searchSettings is null. This is a side-effect-free replacement (the local variable is reassigned), but it may be surprising. Additionally, this normalization only happens in the constructor, not in updateExistingWorkloadGroup, which has its own null-handling logic. Ensure these two paths remain consistent.

if (mutableWorkloadGroupFragment.getSearchSettings() == null) {
    mutableWorkloadGroupFragment = new MutableWorkloadGroupFragment(
        mutableWorkloadGroupFragment.getResiliencyMode(),
        mutableWorkloadGroupFragment.getResourceLimits(),
        new HashMap<>()
    );
}
Empty search_settings in getCreateJson

The getCreateJson helper now always includes "search_settings": {} in every create request. This changes the behavior of all existing tests that use this helper, potentially masking issues with the default (no search_settings) code path. Consider adding a separate helper or making search_settings optional.

+ "    },\n"
+ "    \"search_settings\": {}\n"

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 28, 2026

PR Code Suggestions ✨

Latest suggestions up to 965d2f1

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null input in lookup method

The fromKey method will throw a NullPointerException when settingName is null
because setting.settingName.equals(settingName) is called on the enum field rather
than the argument. The null check in validateSearchSettings handles null keys before
calling fromKey, but the test testFromKeyInvalidSetting explicitly tests
fromKey(null) and expects null to be returned. Swap the equals call to use the
argument safely.

server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java [70-77]

 public static WlmSearchSetting fromKey(String settingName) {
+    if (settingName == null) {
+        return null;
+    }
     for (WlmSearchSetting setting : values()) {
         if (setting.settingName.equals(settingName)) {
             return setting;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 7

__

Why: The fromKey(null) call would throw a NullPointerException since setting.settingName.equals(null) would not throw but the test explicitly expects null to be returned for a null input. This is a real bug that needs fixing to match the test expectations.

Medium
Apply timeout when request source is null

When searchRequest.source() is null, the timeout setting is silently skipped even
though a WLM timeout is configured. Consider initializing a new SearchSourceBuilder
when the source is null so the timeout can be applied, otherwise the WLM timeout
will never take effect for requests without an explicit source.

server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java [84-96]

 case TIMEOUT:
     // Only apply WLM timeout when the request has no explicit timeout
-    if (searchRequest.source() != null && searchRequest.source().timeout() == null) {
+    if (searchRequest.source() == null) {
+        searchRequest.source(new SearchSourceBuilder());
+    }
+    if (searchRequest.source().timeout() == null) {
         searchRequest.source()
             .timeout(
                 TimeValue.parseTimeValue(
                     entry.getValue(),
                     WorkloadGroupSearchSettings.WlmSearchSetting.TIMEOUT.getSettingName()
                 )
             );
     }
     break;
Suggestion importance[1-10]: 6

__

Why: The suggestion is valid - when searchRequest.source() is null, the WLM timeout is silently skipped. However, the existing test testApplySearchSettings_Timeout_NullSource explicitly verifies that the source remains null when no source is set, suggesting this behavior may be intentional to avoid modifying requests that don't have a source builder.

Low
Use correct version for serialization compatibility

The version gate uses Version.V_3_6_0, but this feature is being introduced in what
appears to be an earlier version based on the PR context. Using the wrong version
constant means the field will not be serialized/deserialized correctly during
rolling upgrades, potentially causing data loss or NullPointerException when nodes
on different versions communicate. Verify and use the correct minimum version
constant that matches when this feature is actually being introduced.

server/src/main/java/org/opensearch/wlm/MutableWorkloadGroupFragment.java [73-79]

-if (in.getVersion().onOrAfter(Version.V_3_6_0)) {
+if (in.getVersion().onOrAfter(Version.V_3_0_0)) {
     // Read null marker: true means searchSettings is null (not specified)
     boolean isNull = in.readBoolean();
     searchSettings = isNull ? null : in.readMap(StreamInput::readString, StreamInput::readString);
 } else {
     searchSettings = new HashMap<>();
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about using the correct version constant for rolling upgrade compatibility. However, the improved_code uses V_3_0_0 which may not be correct either - the actual version depends on when this feature is being introduced, and the suggestion doesn't provide definitive evidence of the correct version.

Low
General
Avoid altering shared test helper behavior

Adding "search_settings": {} to every getCreateJson call changes the contract for
all existing tests that use this helper, potentially causing unexpected behavior if
the server rejects or differently handles an explicit empty search_settings object
versus omitting it. Consider keeping the existing helper unchanged and creating a
separate overload or helper that includes search_settings only when needed.

plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java [203-205]

-+ "    },\n"
-+ "    \"search_settings\": {}\n"
++ "    }\n"
 + "}";
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid concern about changing the shared getCreateJson helper to always include search_settings: {}, which could affect existing tests. However, the PR intentionally adds search_settings support and an empty object is likely valid, so this is more of a design consideration than a bug.

Low

Previous suggestions

Suggestions up to commit 0287c7a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null input in enum lookup

The fromKey method will throw a NullPointerException if settingName is null because
setting.settingName.equals(settingName) is called with a null argument. The test
testFromKeyInvalidSetting already tests for null input, so a null guard should be
added.

server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java [70-77]

 public static WlmSearchSetting fromKey(String settingName) {
+    if (settingName == null) {
+        return null;
+    }
     for (WlmSearchSetting setting : values()) {
         if (setting.settingName.equals(settingName)) {
             return setting;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 7

__

Why: The fromKey method will throw a NullPointerException when settingName is null because setting.settingName.equals(settingName) is called with a null argument. The test testFromKeyInvalidSetting already asserts assertNull for a null input, meaning this is a real bug that would cause the test to fail with an NPE rather than returning null.

Medium
General
Apply timeout even when search source is null

When searchRequest.source() is null, the WLM timeout setting is silently skipped
even though the group has a configured timeout. Consider initializing a new
SearchSourceBuilder when the source is null so the timeout can be applied, or
document explicitly that a null source means the timeout is intentionally not
applied.

server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java [84-96]

 case TIMEOUT:
     // Only apply WLM timeout when the request has no explicit timeout
-    if (searchRequest.source() != null && searchRequest.source().timeout() == null) {
+    if (searchRequest.source() == null) {
+        searchRequest.source(new SearchSourceBuilder());
+    }
+    if (searchRequest.source().timeout() == null) {
         searchRequest.source()
             .timeout(
                 TimeValue.parseTimeValue(
                     entry.getValue(),
                     WorkloadGroupSearchSettings.WlmSearchSetting.TIMEOUT.getSettingName()
                 )
             );
     }
     break;
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid design question about whether a null source should prevent the WLM timeout from being applied. However, the existing test testApplySearchSettings_Timeout_NullSource explicitly verifies that a null source remains null after the call, indicating this is intentional behavior. The suggestion contradicts the documented and tested design choice.

Low
Avoid modifying unrelated test payloads

Adding "search_settings": {} to all existing test create payloads changes the
behavior of pre-existing tests that were not intended to test search settings. This
could mask regressions where an empty search_settings object causes unexpected
behavior. Consider keeping the original helper unchanged and creating a separate
overload or helper for tests that need search settings.

plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java [188-205]

 static String getCreateJson(String name, String resiliencyMode, double cpu, double memory) {
     return "{\n"
-        ...
-        + "    },\n"
-        + "    \"search_settings\": {}\n"
+        + "    \"name\": \""
+        + name
+        + "\",\n"
+        + "    \"resiliency_mode\": \""
+        + resiliencyMode
+        + "\",\n"
+        + "    \"resource_limits\": {\n"
+        + "        \"cpu\" : "
+        + cpu
+        + ",\n"
+        + "        \"memory\" : "
+        + memory
+        + "\n"
+        + "    }\n"
         + "}";
 }
 
+static String getCreateJsonWithSearchSettings(String name, String resiliencyMode, double cpu, double memory, String searchSettingsJson) {
+    return "{\n"
+        + "    \"name\": \""
+        + name
+        + "\",\n"
+        + "    \"resiliency_mode\": \""
+        + resiliencyMode
+        + "\",\n"
+        + "    \"resource_limits\": {\n"
+        + "        \"cpu\" : "
+        + cpu
+        + ",\n"
+        + "        \"memory\" : "
+        + memory
+        + "\n"
+        + "    },\n"
+        + "    \"search_settings\": "
+        + searchSettingsJson
+        + "\n"
+        + "}";
+}
+
Suggestion importance[1-10]: 4

__

Why: The suggestion makes a valid point that adding search_settings: {} to all existing test payloads changes the scope of pre-existing tests. However, since the server now normalizes null searchSettings to an empty map, sending {} is functionally equivalent to omitting it, making the impact minimal in practice.

Low
Suggestions up to commit 2b3d1b4
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null input in setting lookup

The fromKey method will throw a NullPointerException if settingName is null because
setting.settingName.equals(settingName) is called with a null argument. The test
testFromKeyInvalidSetting already tests for null input and expects null to be
returned, so a null guard is needed.

server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java [70-77]

 public static WlmSearchSetting fromKey(String settingName) {
+    if (settingName == null) {
+        return null;
+    }
     for (WlmSearchSetting setting : values()) {
         if (setting.settingName.equals(settingName)) {
             return setting;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 8

__

Why: The fromKey method will throw a NullPointerException when called with null because setting.settingName.equals(settingName) will fail. The test testFromKeyInvalidSetting explicitly tests for null input expecting null return, confirming this is a real bug that needs fixing.

Medium
Fix version gate inconsistency for serialization

The LLD document specifies version-gating to V_3_5_0, but the code uses V_3_6_0.
This inconsistency could cause backward compatibility issues in mixed-version
clusters. Align the version gate with the intended version from the design document.

server/src/main/java/org/opensearch/wlm/MutableWorkloadGroupFragment.java [73-79]

-if (in.getVersion().onOrAfter(Version.V_3_6_0)) {
+if (in.getVersion().onOrAfter(Version.V_3_5_0)) {
     // Read null marker: true means searchSettings is null (not specified)
     boolean isNull = in.readBoolean();
     searchSettings = isNull ? null : in.readMap(StreamInput::readString, StreamInput::readString);
 } else {
     searchSettings = new HashMap<>();
 }
Suggestion importance[1-10]: 6

__

Why: The LLD document specifies V_3_5_0 but the code uses V_3_6_0. However, the LLD is a design document and the actual implementation may have intentionally chosen V_3_6_0. This is a potential inconsistency worth flagging, but it's not necessarily a bug since both read and write use the same version constant (V_3_6_0), maintaining internal consistency.

Low
Fix version gate in serialization write path

The writeTo method uses V_3_6_0 while the readFrom (StreamInput constructor) also
uses V_3_6_0, but the LLD specifies V_3_5_0. Both read and write must use the same
version constant to ensure correct serialization/deserialization. Fix the version
gate here to match the intended version.

server/src/main/java/org/opensearch/wlm/MutableWorkloadGroupFragment.java [193-198]

-if (out.getVersion().onOrAfter(Version.V_3_6_0)) {
+if (out.getVersion().onOrAfter(Version.V_3_5_0)) {
     out.writeBoolean(searchSettings == null);
     if (searchSettings != null) {
         out.writeMap(searchSettings, StreamOutput::writeString, StreamOutput::writeString);
     }
 }
Suggestion importance[1-10]: 5

__

Why: Same version inconsistency as suggestion 1 - the LLD says V_3_5_0 but code uses V_3_6_0. Since both read and write use V_3_6_0 consistently, the actual backward compatibility concern is limited, but aligning with the design document is still worthwhile.

Low
General
Avoid silently modifying existing test request payloads

Adding "search_settings": {} to every getCreateJson call in existing tests may cause
those tests to fail if the server rejects an empty search_settings object or if the
response format changes. The existing tests (testCreate, testGet, testDelete, etc.)
were not designed to test search settings and this change silently alters their
request payloads. Consider keeping getCreateJson unchanged and only adding search
settings in the dedicated testSearchSettings test.

plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java [203-205]

 static String getCreateJson(String name, String resiliencyMode, double cpu, double memory) {
     return "{\n"
         + "    \"name\": \""
         + name
-        ...
-        + "    },\n"
-        + "    \"search_settings\": {}\n"
+        + "\",\n"
+        + "    \"resiliency_mode\": \""
+        + resiliencyMode
+        + "\",\n"
+        + "    \"resource_limits\": {\n"
+        + "        \"cpu\" : "
+        + cpu
+        + ",\n"
+        + "        \"memory\" : "
+        + memory
+        + "\n"
+        + "    }\n"
         + "}";
 }
Suggestion importance[1-10]: 4

__

Why: Adding "search_settings": {} to all existing test payloads could affect test behavior if the server handles empty search_settings differently. However, since the server normalizes null to empty map, this is likely intentional to verify backward compatibility with empty settings.

Low
Suggestions up to commit 4dde46a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null input in enum lookup

fromKey will throw a NullPointerException when settingName is null because
setting.settingName.equals(settingName) is called with a null argument on the enum
constant's non-null field. The test testFromKeyInvalidSetting already expects null
to be returned for a null input, so a null-check must be added.

server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java [68-75]

 public static WlmSearchSetting fromKey(String settingName) {
+    if (settingName == null) {
+        return null;
+    }
     for (WlmSearchSetting setting : values()) {
         if (setting.settingName.equals(settingName)) {
             return setting;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 8

__

Why: The test testFromKeyInvalidSetting explicitly asserts that fromKey(null) returns null, but the current implementation would throw a NullPointerException because setting.settingName.equals(null) is safe, but the loop calls setting.settingName.equals(settingName) which is fine — actually String.equals(null) returns false, not NPE. Wait, re-examining: setting.settingName.equals(settingName) where settingName is null — String.equals(null) returns false, so no NPE. However the suggestion is still valid as a defensive guard and the test expects null return, which already works. The suggestion adds clarity but isn't strictly a bug fix.

Medium
Avoid overwriting explicitly set request settings

The applyWorkloadGroupSearchSettings method unconditionally overwrites phaseTook on
the SearchRequest, even if it was already explicitly set by the caller. This means a
per-request override of phase_took would be silently ignored. Only apply the WLM
group setting when the request has not already set phaseTook.

server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java [83-87]

 switch (settingKey) {
     case PHASE_TOOK:
-        searchRequest.setPhaseTook(Boolean.parseBoolean(entry.getValue()));
+        if (searchRequest.isPhaseTook() == null) {
+            searchRequest.setPhaseTook(Boolean.parseBoolean(entry.getValue()));
+        }
         break;
 }
Suggestion importance[1-10]: 7

__

Why: The PR moves the cluster-setting fallback for phaseTook to run after onRequestStart, specifically to allow WLM settings to take effect first. Unconditionally overwriting phaseTook in the WLM listener would prevent per-request overrides from being respected. Adding a null-check here aligns with the ordering logic introduced in TransportSearchAction.

Medium
General
Avoid mutating caller's fragment during normalization

The normalization silently mutates the caller's MutableWorkloadGroupFragment
reference by reassigning the local variable, which is confusing and error-prone.
More importantly, updateExistingWorkloadGroup relies on getSearchSettings()
returning null to distinguish "not specified" from "explicitly empty", but after
this normalization any fragment passed to the constructor will lose that
distinction. Consider only normalizing at the point of storage (e.g., in the field
assignment) rather than reconstructing the fragment.

server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java [76-83]

-// Normalize null searchSettings to empty map for storage
-if (mutableWorkloadGroupFragment.getSearchSettings() == null) {
-    mutableWorkloadGroupFragment = new MutableWorkloadGroupFragment(
-        mutableWorkloadGroupFragment.getResiliencyMode(),
-        mutableWorkloadGroupFragment.getResourceLimits(),
-        new HashMap<>()
-    );
-}
+this.name = name;
+this._id = _id;
+Map<String, String> resolvedSearchSettings = mutableWorkloadGroupFragment.getSearchSettings() != null
+    ? mutableWorkloadGroupFragment.getSearchSettings()
+    : new HashMap<>();
+this.mutableWorkloadGroupFragment = new MutableWorkloadGroupFragment(
+    mutableWorkloadGroupFragment.getResiliencyMode(),
+    mutableWorkloadGroupFragment.getResourceLimits(),
+    resolvedSearchSettings
+);
+this.updatedAtInMillis = updatedAt;
Suggestion importance[1-10]: 6

__

Why: The normalization in the constructor reassigns the local mutableWorkloadGroupFragment variable, which is confusing. More critically, updateExistingWorkloadGroup constructs a new MutableWorkloadGroupFragment with the resolved updatedSearchSettings before passing it to the constructor, so the null-to-empty normalization in the constructor is redundant there. The improved code is cleaner and avoids the confusing local variable reassignment pattern.

Low
Remove unintended payload change in shared test helper

Adding "search_settings": {} to every getCreateJson call changes the payload for all
existing tests (e.g., testCreate, testMultipleCreate, etc.), which may cause
unintended side effects or mask regressions. The helper should keep its original
behavior and the new testSearchSettings test already constructs its own JSON, so the
empty object addition is unnecessary and potentially harmful.

plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java [188-205]

 static String getCreateJson(String name, String resiliencyMode, double cpu, double memory) {
     return "{\n"
-        ...
-        + "    },\n"
-        + "    \"search_settings\": {}\n"
+        + "    \"name\": \""
+        + name
+        + "\",\n"
+        + "    \"resiliency_mode\": \""
+        + resiliencyMode
+        + "\",\n"
+        + "    \"resource_limits\": {\n"
+        + "        \"cpu\" : "
+        + cpu
+        + ",\n"
+        + "        \"memory\" : "
+        + memory
+        + "\n"
+        + "    }\n"
         + "}";
 }
Suggestion importance[1-10]: 5

__

Why: Adding "search_settings": {} to every getCreateJson call changes the request payload for all existing tests. While an empty search_settings object is likely harmless if the server accepts it, it's an unintended side effect that could mask regressions or cause failures if the server strictly validates the payload.

Low
Suggestions up to commit 0ee5538
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid overwriting user-specified request settings

The applyWorkloadGroupSearchSettings method unconditionally overwrites phaseTook on
the SearchRequest, even if the user explicitly set it in their request. The WLM
group setting should only apply when the user has not already specified a value
(i.e., when isPhaseTook() returns null), to respect user intent.

server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java [83-87]

 switch (settingKey) {
     case PHASE_TOOK:
-        searchRequest.setPhaseTook(Boolean.parseBoolean(entry.getValue()));
+        if (searchRequest.isPhaseTook() == null) {
+            searchRequest.setPhaseTook(Boolean.parseBoolean(entry.getValue()));
+        }
         break;
 }
Suggestion importance[1-10]: 8

__

Why: The PR moves the cluster-setting fallback for isPhaseTook to run after onRequestStart, specifically so WLM can set it first. If the WLM setting unconditionally overwrites the user's explicit request value, user intent is ignored. Adding a null-check preserves the intended priority order (user > WLM group > cluster default).

Medium
Guard against null input in lookup method

The fromKey method will throw a NullPointerException if settingName is null because
setting.settingName.equals(settingName) is called with a null argument on a non-null
string. The test testFromKeyInvalidSetting already tests fromKey(null) and expects
null to be returned, so the implementation needs a null guard.

server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java [68-75]

 public static WlmSearchSetting fromKey(String settingName) {
+    if (settingName == null) {
+        return null;
+    }
     for (WlmSearchSetting setting : values()) {
         if (setting.settingName.equals(settingName)) {
             return setting;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 8

__

Why: The existing test testFromKeyInvalidSetting explicitly calls fromKey(null) and expects null back, but the current implementation will throw a NullPointerException because setting.settingName.equals(null) is called on a non-null string with a null argument — this is a real bug that will cause a test failure.

Medium
Fix inverted boolean convention in serialization

The writeTo method writes out.writeBoolean(searchSettings == null) (true when null),
but the read side interprets isNull = in.readBoolean() as true meaning null. This
logic is correct but fragile — the comment says "true means searchSettings is null"
which is the inverse of the typical convention. More critically, the writeTo only
writes the map when searchSettings != null, which matches the read side, so the
protocol is consistent. However, the boolean semantics (true = null) are
non-standard and error-prone; consider inverting to writeBoolean(searchSettings !=
null) (true = present) for clarity and consistency.

server/src/main/java/org/opensearch/wlm/MutableWorkloadGroupFragment.java [195-200]

+if (out.getVersion().onOrAfter(Version.V_3_5_0)) {
+    out.writeBoolean(searchSettings != null);
+    if (searchSettings != null) {
+        out.writeMap(searchSettings, StreamOutput::writeString, StreamOutput::writeString);
+    }
+}
+// In readSide:
 if (in.getVersion().onOrAfter(Version.V_3_5_0)) {
-    // Read null marker: true means searchSettings is null (not specified)
-    boolean isNull = in.readBoolean();
-    searchSettings = isNull ? null : in.readMap(StreamInput::readString, StreamInput::readString);
+    boolean isPresent = in.readBoolean();
+    searchSettings = isPresent ? in.readMap(StreamInput::readString, StreamInput::readString) : null;
 } else {
     searchSettings = new HashMap<>();
 }
Suggestion importance[1-10]: 5

__

Why: The write/read protocol is functionally consistent (both sides agree that true means null), so there is no actual bug. The suggestion is a readability/convention improvement — using true = present is more standard — but it is not a correctness issue.

Low
General
Remove unrelated field from existing test helper

Adding "search_settings": {} to all existing test create payloads changes the
behavior of all existing tests, potentially masking regressions where
search_settings should be absent. Tests that don't involve search settings should
not include the field, to keep test coverage orthogonal and avoid unintended side
effects.

plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java [188-205]

 static String getCreateJson(String name, String resiliencyMode, double cpu, double memory) {
     return "{\n"
-        ...
-        + "    },\n"
-        + "    \"search_settings\": {}\n"
+        + "    \"name\": \""
+        + name
+        + "\",\n"
+        + "    \"resiliency_mode\": \""
+        + resiliencyMode
+        + "\",\n"
+        + "    \"resource_limits\": {\n"
+        + "        \"cpu\" : "
+        + cpu
+        + ",\n"
+        + "        \"memory\" : "
+        + memory
+        + "\n"
+        + "    }\n"
         + "}";
 }
Suggestion importance[1-10]: 4

__

Why: Adding "search_settings": {} to the shared getCreateJson helper changes all existing tests to include an empty search_settings field, which may mask regressions and reduces test orthogonality. However, if the server accepts and ignores empty search_settings, this is a minor style concern rather than a correctness issue.

Low
Suggestions up to commit 2cad3c6
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove stray quote from JSON string literal

The text block ends with }' (a stray single quote inside the string), which was
carried over from the old string concatenation. This trailing ' is part of the JSON
string and will cause the request to fail with a parse error rather than the
expected ResponseException for invalid values.

plugins/workload-management/src/javaRestTest/java/org/opensearch/rest/WorkloadManagementRestIT.java [88-94]

 String json = """
     {
         "resource_limits": {
             "cpu" : 1.1,
             "memory" : -0.1
         }
-    }'""";
+    }""";
Suggestion importance[1-10]: 8

__

Why: The trailing ' inside the text block (carried over from old string concatenation) makes the JSON invalid, causing the request to fail with a parse error instead of the expected ResponseException for invalid resource limit values. This is a real bug that would cause the test to fail for the wrong reason.

Medium
Respect existing request setting before overriding

The comment above the loop says "WLM settings are applied only if the corresponding
setting is not already set in the request", but there is no guard checking
searchRequest.isPhaseTook() == null before overwriting it. This means a per-request
phase_took parameter set by the user will be silently overridden by the WLM group
setting.

server/src/main/java/org/opensearch/wlm/listeners/WorkloadGroupRequestOperationListener.java [84-88]

 switch (settingKey) {
     case PHASE_TOOK:
-        searchRequest.setPhaseTook(Boolean.parseBoolean(entry.getValue()));
+        if (searchRequest.isPhaseTook() == null) {
+            searchRequest.setPhaseTook(Boolean.parseBoolean(entry.getValue()));
+        }
         break;
 }
Suggestion importance[1-10]: 7

__

Why: The code comment says settings are only applied if not already set, but the implementation unconditionally overwrites isPhaseTook(). This contradicts the stated intent and would silently override user-specified per-request settings with WLM group defaults.

Medium
Guard against null input in lookup method

fromKey calls setting.settingName.equals(settingName) without a null check on
settingName. Passing null will throw a NullPointerException at runtime, yet the test
testFromKeyInvalidSetting expects null to be returned for a null input.

server/src/main/java/org/opensearch/wlm/WorkloadGroupSearchSettings.java [68-75]

 public static WlmSearchSetting fromKey(String settingName) {
+    if (settingName == null) {
+        return null;
+    }
     for (WlmSearchSetting setting : values()) {
         if (setting.settingName.equals(settingName)) {
             return setting;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 7

__

Why: Calling setting.settingName.equals(settingName) with a null argument will throw a NullPointerException, but the test testFromKeyInvalidSetting expects null to be returned for a null input, indicating a real bug in the implementation.

Medium
General
Reduce visibility of internal validation helper

validatePositiveInteger is declared public static, making it part of the public API
of SearchRequest. This is a general-purpose utility that doesn't belong on a domain
class and could be misused or break binary compatibility in the future. It should be
private static since it is only called within SearchRequest.

server/src/main/java/org/opensearch/action/search/SearchRequest.java [740-744]

-public static void validatePositiveInteger(int value, String parameterName) {
+private static void validatePositiveInteger(int value, String parameterName) {
     if (value < 1) {
         throw new IllegalArgumentException(parameterName + " must be >= 1");
     }
 }
Suggestion importance[1-10]: 4

__

Why: Making validatePositiveInteger public static unnecessarily exposes an internal utility as part of the public API of SearchRequest. Changing it to private static is a minor but valid encapsulation improvement.

Low

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0ee5538

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 0ee5538: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 1, 2026

Persistent review updated to latest commit 4dde46a

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 2, 2026

❌ Gradle check result for 4dde46a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@kaushalmahi12 kaushalmahi12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets resolve the merge conflict, rest LGTM!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

Persistent review updated to latest commit 2b3d1b4

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for 2b3d1b4: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 78.57143% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.33%. Comparing base (a701344) to head (965d2f1).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...g/opensearch/wlm/MutableWorkloadGroupFragment.java 75.55% 4 Missing and 7 partials ⚠️
...org/opensearch/cluster/metadata/WorkloadGroup.java 33.33% 4 Missing and 2 partials ⚠️
...steners/WorkloadGroupRequestOperationListener.java 82.60% 2 Missing and 2 partials ⚠️
...rg/opensearch/wlm/WorkloadGroupSearchSettings.java 94.11% 2 Missing ⚠️
.../java/org/opensearch/wlm/WorkloadGroupService.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20536      +/-   ##
============================================
- Coverage     73.35%   73.33%   -0.03%     
- Complexity    72253    72265      +12     
============================================
  Files          5794     5795       +1     
  Lines        329944   330042      +98     
  Branches      47620    47641      +21     
============================================
- Hits         242037   242020      -17     
- Misses        68483    68631     +148     
+ Partials      19424    19391      -33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: David Zane <davizane@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0287c7a

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 0287c7a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add few more tests to increase the code coverage?

Signed-off-by: David Zane <davizane@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 965d2f1

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 965d2f1: SUCCESS

@jainankitk jainankitk merged commit f57de46 into opensearch-project:main Mar 10, 2026
33 checks passed
@dzane17 dzane17 deleted the wlm-settings branch March 10, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request Search:Resiliency v3.6.0 Issues and PRs related to version 3.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants