Merged
10 changes: 10 additions & 0 deletions _about/breaking-changes.md
@@ -175,6 +175,16 @@ Nodes that use searchable snapshots must have the `warm` node role. Key changes

For more information, see pull request [#17573](https://github.com/opensearch-project/OpenSearch/pull/17573).

### Query groups

Query groups have been renamed to **workload groups**. Key changes include the following:

- The `wlm/query_group` endpoint is now the `wlm/workload_group` endpoint.
- The API responds with a `workloadGroupID` instead of a `queryGroupID`.
- All workload management cluster settings now begin with the `wlm.workload_group` prefix instead of `wlm.query_group`.

For more information, see pull request [#17901](https://github.com/opensearch-project/OpenSearch/pull/17901).
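
The settings rename can be sketched in a few lines of Python. This is only an illustration: the helper name `rename_wlm_settings` and the sample values are hypothetical, and it assumes that every affected setting previously began with `wlm.query_group`.

```python
OLD_PREFIX = "wlm.query_group"
NEW_PREFIX = "wlm.workload_group"


def rename_wlm_settings(settings):
    """Return a copy of `settings` with old-prefix keys moved to the new prefix.

    Hypothetical helper for illustration; it is not part of OpenSearch.
    """
    renamed = {}
    for key, value in settings.items():
        # Match the prefix only on a dotted boundary so unrelated keys are untouched.
        if key == OLD_PREFIX or key.startswith(OLD_PREFIX + "."):
            key = NEW_PREFIX + key[len(OLD_PREFIX):]
        renamed[key] = value
    return renamed


# Sample (made-up) settings: one workload management key and one unrelated key.
old_settings = {"wlm.query_group.mode": "monitor_only", "cluster.name": "demo"}
new_settings = rename_wlm_settings(old_settings)
print(new_settings)
```

A helper like this could be run over stored cluster settings before an upgrade to see which keys need to change.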

### ML Commons plugin

- The `CatIndexTool` is removed in favor of the `ListIndexTool`.

This file was deleted.

@@ -26,30 +26,30 @@
```
{% include copy-curl.html %}

## Query groups
## Workload groups

A _query group_ is a logical grouping of tasks with defined resource limits. System administrators can dynamically manage query groups using the Workload Management APIs. These query groups can be used to create search requests with resource limits.
A _workload group_ is a logical grouping of tasks with defined resource limits. System administrators can dynamically manage workload groups using the Workload Management APIs. These workload groups can be used to create search requests with resource limits.

### Permissions

Only users with administrator-level permissions can create and update query groups using the Workload Management APIs.
Only users with administrator-level permissions can create and update workload groups using the Workload Management APIs.

### Operating modes

The following operating modes determine the operating level for a query group:
The following operating modes determine the operating level for a workload group:

- **Disabled mode**: Workload management is disabled.

- **Enabled mode**: Workload management is enabled and will cancel and reject queries once the query group's configured thresholds are reached.
- **Enabled mode**: Workload management is enabled and will cancel and reject queries once the workload group's configured thresholds are reached.

- **Monitor_only mode** (Default): Workload management will monitor tasks but will not cancel or reject any queries.

### Example request

The following example request adds a query group named `analytics`:
The following example request adds a workload group named `analytics`:

```json
PUT _wlm/query_group
PUT _wlm/workload_group
{
"name": "analytics",
"resiliency_mode": "enforced",
@@ -61,11 +61,11 @@
```
{% include copy-curl.html %}

When creating a query group, make sure that the sum of the resource limits for a single resource, such as `cpu` or `memory`, does not exceed `1`.
When creating a workload group, make sure that the sum of the resource limits for a single resource, such as `cpu` or `memory`, does not exceed `1`.
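
As a rough sketch of this constraint, the following Python check decides whether a new group's limits still fit. It assumes that the limit of `1` applies to the total across all workload groups for each resource, and that each group stores its limits under a `resource_limits` key; both are assumptions made for illustration, as are the function names and sample values.

```python
def remaining_capacity(existing_groups, resource):
    """Return how much of `resource` (as a fraction of 1) is still unallocated."""
    used = sum(
        group.get("resource_limits", {}).get(resource, 0.0)
        for group in existing_groups
    )
    return 1.0 - used


def can_add_group(existing_groups, new_limits, tolerance=1e-9):
    """Return True if every limit in `new_limits` fits in the remaining capacity."""
    return all(
        limit <= remaining_capacity(existing_groups, resource) + tolerance
        for resource, limit in new_limits.items()
    )


# Hypothetical existing groups that already use 0.9 of the CPU budget.
groups = [
    {"name": "analytics", "resource_limits": {"cpu": 0.4, "memory": 0.2}},
    {"name": "reporting", "resource_limits": {"cpu": 0.5}},
]
print(can_add_group(groups, {"cpu": 0.1}))
print(can_add_group(groups, {"cpu": 0.2}))
```

The small `tolerance` avoids rejecting a request only because of floating-point rounding when the limits add up to exactly `1`.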

### Example response

OpenSearch responds with the set resource limits and the `_id` for the query group:
OpenSearch responds with the set resource limits and the `_id` for the workload group:

```json
{
@@ -80,17 +80,17 @@
}
```

## Using `queryGroupID`
## Using `workloadGroupID`

You can associate a query request with a `queryGroupID` to manage and allocate resources within the limits defined by the query group. By using this ID, request routing and tracking are associated with the query group, ensuring resource quotas and task limits are maintained.
You can associate a query request with a `workloadGroupID` to manage and allocate resources within the limits defined by the workload group. By using this ID, request routing and tracking are associated with the workload group, ensuring resource quotas and task limits are maintained.

The following example query uses the `queryGroupId` to ensure that the query does not exceed that query group's resource limits:
The following example query uses the `workloadGroupId` header to ensure that the query does not exceed that workload group's resource limits:

```json
GET testindex/_search
Host: localhost:9200
Content-Type: application/json
queryGroupId: preXpc67RbKKeCyka72_Gw
workloadGroupId: preXpc67RbKKeCyka72_Gw
{
"query": {
"match": {
@@ -105,21 +105,21 @@

The following settings can be used to customize workload management using the `_cluster/settings` API.

| **Setting name** | **Description** |
| :--- | :--- |
| `wlm.query_group.duress_streak` | Determines the node duress threshold. Once the threshold is reached, the node is marked as `in duress`. |
| `wlm.query_group.enforcement_interval` | Defines the monitoring interval. |
| `wlm.query_group.mode` | Defines the [operating mode](#operating-modes). |
| `wlm.query_group.node.memory_rejection_threshold` | Defines the query group level `memory` threshold. When the threshold is reached, the request is rejected. |
| `wlm.query_group.node.cpu_rejection_threshold` | Defines the query group level `cpu` threshold. When the threshold is reached, the request is rejected. |
| `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered to be in duress when the `memory` threshold is reached. Requests routed to nodes in duress are canceled. |
| `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered to be in duress when the `cpu` threshold is reached. Requests routed to nodes in duress are canceled. |
| **Setting name** | **Description** |
|:-----------------------------------------------------------| :--- |
| `wlm.workload_group.duress_streak` | Determines the node duress threshold. Once the threshold is reached, the node is marked as `in duress`. |
| `wlm.workload_group.enforcement_interval` | Defines the monitoring interval. |
| `wlm.workload_group.mode` | Defines the [operating mode](#operating-modes). |
| `wlm.workload_group.node.memory_rejection_threshold` | Defines the workload group level `memory` threshold. When the threshold is reached, the request is rejected. |
| `wlm.workload_group.node.cpu_rejection_threshold` | Defines the workload group level `cpu` threshold. When the threshold is reached, the request is rejected. |
| `wlm.workload_group.node.memory_cancellation_threshold` | Controls whether the node is considered to be in duress when the `memory` threshold is reached. Requests routed to nodes in duress are canceled. |
| `wlm.workload_group.node.cpu_cancellation_threshold` | Controls whether the node is considered to be in duress when the `cpu` threshold is reached. Requests routed to nodes in duress are canceled. |

When setting rejection and cancellation thresholds, remember that the rejection threshold for a resource should always be lower than the cancellation threshold.
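
The ordering rule can be checked before settings are applied. The sketch below is a hypothetical helper rather than an OpenSearch API: it takes a flat dictionary keyed by the setting names from the table above and reports any resource whose rejection threshold is not lower than its cancellation threshold. The threshold values are made up.

```python
def misordered_thresholds(settings, prefix="wlm.workload_group.node"):
    """Return the resources whose rejection threshold is not below the cancellation threshold."""
    problems = []
    for resource in ("cpu", "memory"):
        rejection = settings.get(f"{prefix}.{resource}_rejection_threshold")
        cancellation = settings.get(f"{prefix}.{resource}_cancellation_threshold")
        # Skip resources for which one of the two thresholds is not being set.
        if rejection is None or cancellation is None:
            continue
        if rejection >= cancellation:
            problems.append(resource)
    return problems


# Hypothetical values: cpu is ordered correctly, memory is not.
candidate = {
    "wlm.workload_group.node.cpu_rejection_threshold": 0.8,
    "wlm.workload_group.node.cpu_cancellation_threshold": 0.9,
    "wlm.workload_group.node.memory_rejection_threshold": 0.9,
    "wlm.workload_group.node.memory_cancellation_threshold": 0.85,
}
print(misordered_thresholds(candidate))
```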

## Workload Management Stats API

The Workload Management Stats API returns workload management metrics for a query group, using the following method:
The Workload Management Stats API returns workload management metrics for a workload group, using the following method:

```json
GET _wlm/stats
@@ -137,7 +137,7 @@
},
"cluster_name": "XXXXXXYYYYYYYY",
"A3L9EfBIQf2anrrUhh_goA": {
"query_groups": {
"workload_groups": {
"16YGxFlPRdqIO7K4EACJlw": {
"total_completions": 33570,
"total_rejections": 0,
@@ -153,7 +153,7 @@
"rejections": 0
}
},
"DEFAULT_QUERY_GROUP": {
"DEFAULT_WORKLOAD_GROUP": {
"total_completions": 42572,
"total_rejections": 0,
"total_cancellations": 0,
@@ -176,19 +176,19 @@

### Response body fields

| Field name | Description |
| :--- | :--- |
| `total_completions` | The total number of request completions in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
| `total_rejections` | The total number request rejections in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
| `total_cancellations` | The total number of cancellations in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
| `cpu` | The `cpu` resource type statistics for the `query_group`. |
| `memory` | The `memory` resource type statistics for the `query_group`. |
| Field name | Description |
| :--- |:-------------------------------------------------------------------------------------------------------------------------------------------------|
| `total_completions` | The total number of request completions in the `workload_group` at the given node. This includes all shard-level and coordinator-level requests. |

Check failure on line 181 in _tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: ato. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Raw Output: {"message": "[OpenSearch.Spelling] Error: ato. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md", "range": {"start": {"line": 181, "column": 150}}}, "severity": "ERROR"}
| `total_rejections` | The total number of request rejections in the `workload_group` at the given node. This includes all shard-level and coordinator-level requests. |
| `total_cancellations` | The total number of cancellations in the `workload_group` at the given node. This includes all shard-level and coordinator-level requests. |
| `cpu` | The `cpu` resource type statistics for the `workload_group`. |
| `memory` | The `memory` resource type statistics for the `workload_group`. |

### Resource type statistics

| Field name | Description |
| :--- | :---- |
| `current_usage` |The resource usage for the `query_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.query_group.enforcement_interval`. |
| `cancellations` | The number of cancellations resulting from the cancellation threshold being reached. |
| `rejections` | The number of rejections resulting from the cancellation threshold being reached. |
| Field name | Description |
| :--- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `current_usage` | The resource usage for the `workload_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.workload_group.enforcement_interval`. |
| `cancellations` | The number of cancellations resulting from the cancellation threshold being reached. |
| `rejections` | The number of rejections resulting from the rejection threshold being reached. |
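
To show how these fields might be consumed, the following Python sketch totals `total_rejections` for each workload group across nodes. It assumes the response shape from the example above, where each node ID maps to an object containing `workload_groups`; the function name and the trimmed sample data are hypothetical.

```python
def total_rejections_by_group(stats):
    """Sum `total_rejections` for each workload group across all nodes in `stats`."""
    totals = {}
    for node_stats in stats.values():
        # Skip top-level entries that are not per-node objects, such as `cluster_name`.
        if not isinstance(node_stats, dict) or "workload_groups" not in node_stats:
            continue
        for group_id, group_stats in node_stats["workload_groups"].items():
            totals[group_id] = totals.get(group_id, 0) + group_stats.get("total_rejections", 0)
    return totals


# Trimmed, made-up sample with two nodes.
sample = {
    "cluster_name": "XXXXXXYYYYYYYY",
    "node-a": {
        "workload_groups": {
            "DEFAULT_WORKLOAD_GROUP": {"total_completions": 42572, "total_rejections": 0},
            "analytics-id": {"total_completions": 33570, "total_rejections": 2},
        }
    },
    "node-b": {
        "workload_groups": {
            "analytics-id": {"total_completions": 100, "total_rejections": 3},
        }
    },
}
print(total_rejections_by_group(sample))
```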
