[Feature Request] WLM group-level search settings #20555

### Is your feature request related to a problem? Please describe

Workload management in OpenSearch lets administrators group and manage search traffic using workload groups and rules, so that resource policies can be applied consistently to a class of requests. Today, most search behavior is governed by cluster-level defaults, which means tenants generally operate under the same baseline conditions. Callers can override some behavior via request parameters, but that approach is hard to govern at scale and can allow requests to exceed their intended limits if not tightly controlled.

As OpenSearch deployments move toward multitenancy, the need to customize search behavior per tenant increases. Requiring every client to pass the right headers/parameters on every query is operationally difficult and error-prone. Enabling per-workload-group overrides aligns with existing WLM patterns, where requests are associated with a workload group and policies are applied automatically and consistently.

### Describe the solution you'd like

Allow WLM groups to optionally define `search_settings` that are applied to all search requests assigned to the group. For example, a WLM group object with the new `search_settings` field could look like:
```json
{
  "name": "analytics",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.1,
    "memory": 0.1
  },
  "search_settings": {
    "timeout": "500ms",
    "cancel_after_time_interval": "5s",
    "max_concurrent_shard_requests": 3
  }
}
```
Possible settings to onboard:
1. `cancel_after_time_interval`
Ensures that long-running searches are automatically canceled after a fixed interval, preventing runaway queries from consuming cluster resources indefinitely and protecting other tenants from noisy neighbors.

2. `timeout`
Sets a best-effort time limit on query execution: shards that run past it stop and return the results gathered so far, and the response is flagged as timed out. This helps keep latency predictable for a workload group and avoids situations where slow or stalled queries tie up search threads.

3. `max_concurrent_shard_requests`
Limits how many shard-level requests a single search executes concurrently on each node, reducing fan-out pressure on the cluster and preventing searches that span many shards from overwhelming CPU and thread pools.

4. `batched_reduce_size`
Controls how many shard results are reduced at a time during the reduce phase, helping to manage memory usage for large fan-out searches and reducing peak heap pressure in multi-tenant environments.

5. `phase_took`
Enables or disables detailed per-phase timing information in responses, allowing operators to balance observability needs against response size and overhead for specific workload groups.

6. `max_buckets`
Caps the number of aggregation buckets that a search can produce, protecting the cluster from excessive memory consumption caused by unbounded or poorly designed aggregations.
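The issue does not yet specify how a group's `search_settings` interact with parameters a caller passes on the request. One option consistent with the concern that requests can "exceed their intended limits" is to treat each group value as a ceiling: a request may tighten a setting but never relax it. The sketch below illustrates that policy for the numeric and time-based settings only; `GroupSearchSettingsResolver`, `SearchSettings`, and `resolve` are hypothetical names, not OpenSearch APIs, and the "stricter value wins" rule is an assumption. Boolean settings such as `phase_took` have no natural "stricter" ordering and are left out.

```java
import java.time.Duration;
import java.util.Optional;

/**
 * Hypothetical sketch (not an OpenSearch API): resolves the effective search
 * settings for a request that WLM has assigned to a workload group.
 * Assumed policy: the group value is a ceiling; a request may tighten it but not relax it.
 */
public final class GroupSearchSettingsResolver {

    /** Subset of the proposed `search_settings` fields; Optional.empty() means "not set". */
    public record SearchSettings(
            Optional<Duration> timeout,
            Optional<Duration> cancelAfterTimeInterval,
            Optional<Integer> maxConcurrentShardRequests,
            Optional<Integer> batchedReduceSize) {}

    private GroupSearchSettingsResolver() {}

    /** Returns the smaller (stricter) of two optional values, or whichever one is present. */
    private static <T extends Comparable<T>> Optional<T> stricter(Optional<T> request, Optional<T> group) {
        if (request.isEmpty()) return group;
        if (group.isEmpty()) return request;
        return request.get().compareTo(group.get()) <= 0 ? request : group;
    }

    /** Merges request-level parameters with the group's settings, field by field. */
    public static SearchSettings resolve(SearchSettings request, SearchSettings group) {
        return new SearchSettings(
                stricter(request.timeout(), group.timeout()),
                stricter(request.cancelAfterTimeInterval(), group.cancelAfterTimeInterval()),
                stricter(request.maxConcurrentShardRequests(), group.maxConcurrentShardRequests()),
                stricter(request.batchedReduceSize(), group.batchedReduceSize()));
    }

    public static void main(String[] args) {
        // Mirrors the "analytics" group example: timeout 500ms, cancel after 5s, 3 concurrent shard requests.
        SearchSettings group = new SearchSettings(
                Optional.of(Duration.ofMillis(500)),
                Optional.of(Duration.ofSeconds(5)),
                Optional.of(3),
                Optional.empty());
        // The caller asks for a looser timeout and more concurrency, plus a smaller reduce batch.
        SearchSettings request = new SearchSettings(
                Optional.of(Duration.ofSeconds(2)),
                Optional.empty(),
                Optional.of(10),
                Optional.of(64));
        System.out.println(resolve(request, group));
    }
}
```

With this policy the caller's looser `timeout` and `max_concurrent_shard_requests` are clamped to the group's values, while its tighter `batched_reduce_size` is kept. An alternative design would let group settings act only as defaults that request parameters can override; which behavior the feature should adopt is still open.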

### Related component

Search:Resiliency

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
