[Feature Request] WLM group-level search settings #20555

@dzane17

Is your feature request related to a problem? Please describe

Workload management in OpenSearch lets administrators group and manage search traffic using workload groups and rules, so that resource policies can be applied consistently to a class of requests. Today, most search behavior is governed by cluster-level defaults, which means tenants generally operate under the same baseline conditions. Callers can override some behavior via request parameters, but that approach is hard to govern at scale and can allow requests to exceed their intended limits if not tightly controlled.

As OpenSearch deployments move toward multitenancy, the need to customize search behavior per tenant increases. Requiring every client to pass the right headers/parameters on every query is operationally difficult and error-prone. Enabling per–workload group overrides aligns with existing WLM patterns, where requests are associated with a workload group and policies are applied automatically and consistently.

Describe the solution you'd like

Allow WLM groups to optionally define search_settings that are applied to all search requests assigned to the group. For example, a WLM group object could look like:

{
  "name": "analytics",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.1,
    "memory": 0.1
  },
  "search_settings": {        // New field in WLM group //
    "timeout": "500ms",
    "cancel_after_time_interval": "5s",
    "max_concurrent_shard_requests": 3
  }
}
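One way to read "applied to all search requests assigned to the group" is sketched below in Python. This is an illustration only, not the OpenSearch implementation: the function name `apply_group_search_settings` is hypothetical, and the rule that a group value overrides a caller-supplied value is an assumption, since the precedence between group settings and request parameters is not specified above.

```python
from typing import Any, Dict, Mapping


def apply_group_search_settings(
    request_params: Mapping[str, Any],
    workload_group: Mapping[str, Any],
) -> Dict[str, Any]:
    """Return the effective search parameters for a request in a workload group.

    Assumption (not stated in the proposal): a value in the group's
    search_settings overrides the same parameter supplied by the caller.
    """
    effective = dict(request_params)
    # search_settings is optional; a group without it leaves the request untouched.
    effective.update(workload_group.get("search_settings") or {})
    return effective


analytics_group = {
    "name": "analytics",
    "resiliency_mode": "enforced",
    "resource_limits": {"cpu": 0.1, "memory": 0.1},
    "search_settings": {
        "timeout": "500ms",
        "cancel_after_time_interval": "5s",
        "max_concurrent_shard_requests": 3,
    },
}

# The caller asks for a longer timeout and sets an unrelated parameter.
effective = apply_group_search_settings(
    {"timeout": "30s", "size": 10}, analytics_group
)
print(effective)
```

Under this assumed precedence the caller's `timeout` is replaced by the group's value, while parameters the group does not define (here `size`) pass through unchanged. A "request wins" or "stricter value wins" policy would be equally easy to express and may be preferable for some settings.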

Possible settings to onboard:

  1. cancel_after_time_interval
    Ensures that long-running searches are automatically canceled after a fixed interval, preventing runaway queries from consuming cluster resources indefinitely and protecting other tenants from noisy neighbors.

  2. timeout
    Bounds how long each shard is allowed to spend executing a search (on a best-effort basis, with partial results returned if the timeout expires), helping keep latency predictable for a workload group and avoiding situations where slow or stalled queries tie up search threads.

  3. max_concurrent_shard_requests
    Limits how many shard-level requests a single search can execute in parallel on each node, reducing fan-out pressure on the cluster and preventing searches that span many shards from overwhelming CPU and thread pools.

  4. batched_reduce_size
    Controls how many shard results are reduced at a time during the reduce phase, helping to manage memory usage for large fan-out searches and reducing peak heap pressure in multi-tenant environments.

  5. phase_took
    Enables or disables detailed per-phase timing information in responses, allowing operators to balance observability needs against response size and overhead for specific workload groups.

  6. max_buckets
    Caps the number of aggregation buckets that a search can produce, protecting the cluster from excessive memory consumption caused by unbounded or poorly designed aggregations.
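If these settings are onboarded, a group definition would presumably need to be validated when it is created or updated. The Python sketch below shows one possible shape for that check. Everything in it is an assumption for illustration: the helper names, the type assigned to each setting, the restricted set of time units, and the choice to reject unknown keys are not part of the proposal.

```python
import re
from typing import Any, Dict, Mapping

# Hypothetical type table for the six candidate settings listed above.
TIME_SETTINGS = {"timeout", "cancel_after_time_interval"}
POSITIVE_INT_SETTINGS = {
    "max_concurrent_shard_requests",
    "batched_reduce_size",
    "max_buckets",
}
BOOL_SETTINGS = {"phase_took"}

# Only a subset of time units is handled here, to keep the sketch short.
UNIT_TO_MILLIS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
TIME_PATTERN = re.compile(r"(\d+)(ms|s|m|h)")


def parse_time_millis(value: str) -> int:
    """Convert a time value such as "500ms" or "5s" to milliseconds."""
    match = TIME_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_TO_MILLIS[unit]


def validate_search_settings(settings: Mapping[str, Any]) -> Dict[str, Any]:
    """Check each setting's type and return a normalized copy.

    Time values are normalized to integer milliseconds.
    """
    normalized: Dict[str, Any] = {}
    for name, value in settings.items():
        if name in TIME_SETTINGS:
            if not isinstance(value, str):
                raise ValueError(f"{name} must be a time string")
            normalized[name] = parse_time_millis(value)
        elif name in POSITIVE_INT_SETTINGS:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer")
            normalized[name] = value
        elif name in BOOL_SETTINGS:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be a boolean")
            normalized[name] = value
        else:
            raise ValueError(f"unsupported search setting: {name!r}")
    return normalized


checked = validate_search_settings(
    {
        "timeout": "500ms",
        "cancel_after_time_interval": "5s",
        "max_concurrent_shard_requests": 3,
    }
)
print(checked)
```

Rejecting unknown keys up front keeps a typo in a group definition from silently disabling a limit, which matters when the setting is meant to protect other tenants.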

Related component

Search:Resiliency

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

Search:Resiliency, enhancement, v3.6.0
