Description
Is your feature request related to a problem? Please describe
Workload management in OpenSearch lets administrators group and manage search traffic using workload groups and rules, so that resource policies can be applied consistently to a class of requests. Today, most search behavior is governed by cluster-level defaults, which means tenants generally operate under the same baseline conditions. Callers can override some behavior via request parameters, but that approach is hard to govern at scale and can allow requests to exceed their intended limits if not tightly controlled.
As OpenSearch deployments move toward multitenancy, the need to customize search behavior per tenant increases. Requiring every client to pass the right headers or parameters on every query is operationally difficult and error-prone. Per-workload-group overrides align with existing WLM patterns, where requests are associated with a workload group and policies are applied automatically and consistently.
Describe the solution you'd like
Allow WLM groups to optionally define search_settings that are applied to all search requests assigned to the group. For example, a WLM group object could look like:
{
  "name": "analytics",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.1,
    "memory": 0.1
  },
  "search_settings": { // New field in WLM group
    "timeout": "500ms",
    "cancel_after_time_interval": "5s",
    "max_concurrent_shard_requests": 3
  }
}
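To make the shape of the proposed field more concrete, here is a minimal sketch of how a search_settings block might be validated before a group is created or updated. Everything in it is an assumption for illustration: the function name, the allow-list, and the value rules are hypothetical and not part of the existing WLM API. The time-unit suffixes follow the units OpenSearch accepts for time values.

```python
import re

# Hypothetical allow-list built from the settings named in this issue.
# The value kinds are assumptions, not an existing WLM schema.
ALLOWED_SEARCH_SETTINGS = {
    "timeout": "time",
    "cancel_after_time_interval": "time",
    "max_concurrent_shard_requests": "positive_int",
    "batched_reduce_size": "positive_int",
    "phase_took": "bool",
    "max_buckets": "positive_int",
}

# Matches values such as "500ms" or "5s".
_TIME_VALUE = re.compile(r"^\d+(nanos|micros|ms|s|m|h|d)$")


def validate_search_settings(settings):
    """Return a list of error strings; an empty list means the block is valid."""
    errors = []
    for key, value in settings.items():
        kind = ALLOWED_SEARCH_SETTINGS.get(key)
        if kind is None:
            errors.append(f"unknown setting: {key}")
        elif kind == "time":
            if not (isinstance(value, str) and _TIME_VALUE.match(value)):
                errors.append(f"{key}: expected a time value such as '500ms'")
        elif kind == "positive_int":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                errors.append(f"{key}: expected a positive integer")
        elif kind == "bool":
            if not isinstance(value, bool):
                errors.append(f"{key}: expected true or false")
    return errors
```

Rejecting unknown keys at group creation time, rather than at search time, would keep a misconfigured group from silently affecting every request assigned to it.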
Possible settings to onboard:
- cancel_after_time_interval
  Ensures that long-running searches are automatically canceled after a fixed interval, preventing runaway queries from consuming cluster resources indefinitely and protecting other tenants from noisy neighbors.
- timeout
  Enforces an upper bound on how long a search is allowed to execute, helping keep latency predictable for a workload group and avoiding situations where slow or stalled queries tie up search threads.
- max_concurrent_shard_requests
  Limits how many shard-level requests a single search can execute in parallel on each node, reducing fan-out pressure on the cluster and preventing high fan-out queries from overwhelming CPU and thread pools.
- batched_reduce_size
  Controls how many shard results are reduced at a time during the reduce phase, helping to manage memory usage for large fan-out searches and reducing peak heap pressure in multi-tenant environments.
- phase_took
  Enables or disables detailed per-phase timing information in responses, allowing operators to balance observability needs against response size and overhead for specific workload groups.
- max_buckets
  Caps the number of aggregation buckets that a search can produce, protecting the cluster from excessive memory consumption caused by unbounded or poorly designed aggregations.
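How group-level settings should interact with values a caller passes on the request is not decided in this issue. As one possible reading, the sketch below assumes the group's search_settings always take precedence over request parameters. The function name and the precedence rule are both hypothetical.

```python
def apply_group_search_settings(request_params, group_search_settings):
    """Return the effective search parameters for a request.

    Assumption: values from the workload group override anything the
    caller supplied. Parameters the group does not define pass through.
    """
    effective = dict(request_params)         # copy, so the caller's dict is untouched
    effective.update(group_search_settings)  # group values win on conflict
    return effective


# Example: the caller asks for a longer timeout than the group allows.
request_params = {"timeout": "30s", "size": 10}
group_search_settings = {"timeout": "500ms", "max_concurrent_shard_requests": 3}
effective = apply_group_search_settings(request_params, group_search_settings)
```

An alternative design would treat group values as caps and honor a request value only when it is stricter. That would require parsing and comparing values per setting, so it is left out of this sketch.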
Related component
Search:Resiliency
Describe alternatives you've considered
No response
Additional context
No response