diff --git a/_about/breaking-changes.md b/_about/breaking-changes.md
index 0812e03216f..1070a5f04be 100644
--- a/_about/breaking-changes.md
+++ b/_about/breaking-changes.md
@@ -175,6 +175,16 @@ Nodes that use searchable snapshots must have the `warm` node role. Key changes
 
 For more information, see pull request [#17573](https://github.com/opensearch-project/OpenSearch/pull/17573).
 
+### Query groups
+
+Query groups have been renamed to **workload groups**. Key changes include the following:
+
+- The `wlm/query_group` endpoint is now the `wlm/workload_group` endpoint.
+- The API responds with a `workloadGroupID` instead of a `queryGroupID`.
+- All workload management cluster settings are now prepended with `wlm.workload_group`.
+
+For more information, see pull request [#17901](https://github.com/opensearch-project/OpenSearch/pull/17901).
+
 ### ML Commons plugin
 
 - The `CatIndexTool` is removed in favor of the `ListIndexTool`.
diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/query-group-lifecycle-api.md b/_tuning-your-cluster/availability-and-recovery/workload-management/query-group-lifecycle-api.md
deleted file mode 100644
index 2ed40d07056..00000000000
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/query-group-lifecycle-api.md
+++ /dev/null
@@ -1,155 +0,0 @@
----
-layout: default
-title: Query Group Lifecycle API
-nav_order: 20
-parent: Workload management
-grand_parent: Availability and recovery
----
-
-# Query Group Lifecycle API
-
-The Query Group Lifecycle API creates, updates, retrieves, and deletes query groups. The API categorizes queries into specific groups, called _query groups_, based on desired resource limits.
-
-## Endpoints
-
-
-### Create a query group
-
-
-```json
-PUT /_wlm/query_group
-```
-
-
-### Update a query group
-
-
-```json
-PUT /_wlm/query_group
-```
-
-
-### Get a query group
-
-
-```json
-GET /_wlm/query_group
-GET /_wlm/query_group/{name}
-```
-
-
-### Delete a query group
-
-
-```json
-PUT /_wlm/query_group
-```
-
-
-
-## Request body fields
-
-| Field | Description |
-| :--- | :--- |
-| `_id` | The ID of the query group, which can be used to associate query requests with the group and enforce the group's resource limits. |
-| `name` | The name of the query group. |
-| `resiliency_mode` | The resiliency mode of the query group. Valid modes are `enforced`, `soft`, and `monitor`. For more information about resiliency modes, see [Operating modes]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview/#operating-modes). |
-| `resource_limits` | The resource limits for query requests in the query group. Valid resources are `cpu` and `memory`. |
-
-When creating a query group, make sure that the sum of the resource limits for a single resource, either `cpu` or `memory`, does not exceed 1.
-
-## Example requests
-
-The following example requests show how to use the Query Group Lifecycle API.
-
-### Create a query group
-
-```json
-PUT _wlm/query_group
-{
-  "name": "analytics",
-  "resiliency_mode": "enforced",
-  "resource_limits": {
-    "cpu": 0.4,
-    "memory": 0.2
-  }
-}
-```
-{% include copy-curl.html %}
-
-### Update a query group
-
-```json
-PUT _wlm/query_group/analytics
-{
-  "resiliency_mode": "monitor",
-  "resource_limits": {
-    "cpu": 0.41,
-    "memory": 0.21
-  }
-}
-```
-{% include copy-curl.html %}
-
-
-## Example responses
-
-OpenSearch returns responses similar to the following.
-
-### Creating a query group
-
-```json
-{
-  "_id":"preXpc67RbKKeCyka72_Gw",
-  "name":"analytics",
-  "resiliency_mode":"enforced",
-  "resource_limits":{
-    "cpu":0.4,
-    "memory":0.2
-  },
-  "updated_at":1726270184642
-}
-```
-
-### Updating a query group
-
-```json
-{
-  "_id":"preXpc67RbKKeCyka72_Gw",
-  "name":"analytics",
-  "resiliency_mode":"monitor",
-  "resource_limits":{
-    "cpu":0.41,
-    "memory":0.21
-  },
-  "updated_at":1726270333804
-}
-```
-
-## Response body fields
-
-| Field | Description |
-| :--- | :--- |
-| `_id` | The ID of the query group. |
-| `name` | The name of the query group. Required when creating a new query group. |
-| `resiliency_mode` | The resiliency mode of the query group. |
-| `resource_limits` | The resource limits of the query group. |
-| `updated_at` | The time at which the query group was last updated. |
-
-
diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index 956a01a7746..8d7968b9d41 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -26,30 +26,30 @@ To install workload management, use the following command:
 ```
 {% include copy-curl.html %}
 
-## Query groups
+## Workload groups
 
-A _query group_ is a logical grouping of tasks with defined resource limits. System administrators can dynamically manage query groups using the Workload Management APIs. These query groups can be used to create search requests with resource limits.
+A _workload group_ is a logical grouping of tasks with defined resource limits. System administrators can dynamically manage workload groups using the Workload Management APIs. These workload groups can be used to create search requests with resource limits.
 
 ### Permissions
 
-Only users with administrator-level permissions can create and update query groups using the Workload Management APIs.
+Only users with administrator-level permissions can create and update workload groups using the Workload Management APIs.
 
 ### Operating modes
 
-The following operating modes determine the operating level for a query group:
+The following operating modes determine the operating level for a workload group:
 
 - **Disabled mode**: Workload management is disabled.
 
-- **Enabled mode**: Workload management is enabled and will cancel and reject queries once the query group's configured thresholds are reached.
+- **Enabled mode**: Workload management is enabled and will cancel and reject queries once the workload group's configured thresholds are reached.
 
 - **Monitor_only mode** (Default): Workload management will monitor tasks but will not cancel or reject any queries.
 
 ### Example request
 
-The following example request adds a query group named `analytics`:
+The following example request adds a workload group named `analytics`:
 
 ```json
-PUT _wlm/query_group
+PUT _wlm/workload_group
 {
   “name”: “analytics”,
   “resiliency_mode”: “enforced”,
@@ -61,11 +61,11 @@ PUT _wlm/query_group
 ```
 {% include copy-curl.html %}
 
-When creating a query group, make sure that the sum of the resource limits for a single resource, such as `cpu` or `memory`, does not exceed `1`.
+When creating a workload group, make sure that the sum of the resource limits for a single resource, such as `cpu` or `memory`, does not exceed `1`.
 
 ### Example response
 
-OpenSearch responds with the set resource limits and the `_id` for the query group:
+OpenSearch responds with the set resource limits and the `_id` for the workload group:
 
 ```json
 {
@@ -80,17 +80,17 @@ OpenSearch responds with the set resource limits and the `_id` for the query gro
 }
 ```
 
-## Using `queryGroupID`
+## Using `workloadGroupID`
 
-You can associate a query request with a `queryGroupID` to manage and allocate resources within the limits defined by the query group. By using this ID, request routing and tracking are associated with the query group, ensuring resource quotas and task limits are maintained.
+You can associate a query request with a `workloadGroupID` to manage and allocate resources within the limits defined by the workload group. By using this ID, request routing and tracking are associated with the workload group, ensuring resource quotas and task limits are maintained.
 
-The following example query uses the `queryGroupId` to ensure that the query does not exceed that query group's resource limits:
+The following example query uses the `workloadGroupId` to ensure that the query does not exceed that workload group's resource limits:
 
 ```json
 GET testindex/_search
 Host: localhost:9200
 Content-Type: application/json
-queryGroupId: preXpc67RbKKeCyka72_Gw
+workloadGroupId: preXpc67RbKKeCyka72_Gw
 {
   "query": {
     "match": {
@@ -105,21 +105,21 @@ queryGroupId: preXpc67RbKKeCyka72_Gw
 
 The following settings can be used to customize workload management using the `_cluster/settings` API.
 
-| **Setting name** | **Description** |
-| :--- | :--- |
-| `wlm.query_group.duress_streak` | Determines the node duress threshold. Once the threshold is reached, the node is marked as `in duress`. |
-| `wlm.query_group.enforcement_interval` | Defines the monitoring interval. |
-| `wlm.query_group.mode` | Defines the [operating mode](#operating-modes). |
-| `wlm.query_group.node.memory_rejection_threshold` | Defines the query group level `memory` threshold. When the threshold is reached, the request is rejected. |
-| `wlm.query_group.node.cpu_rejection_threshold` | Defines the query group level `cpu` threshold. When the threshold is reached, the request is rejected. |
-| `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered to be in duress when the `memory` threshold is reached. Requests routed to nodes in duress are canceled. |
-| `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered to be in duress when the `cpu` threshold is reached. Requests routed to nodes in duress are canceled. |
+| **Setting name** | **Description** |
+|:-----------------------------------------------------------| :--- |
+| `wlm.workload_group.duress_streak` | Determines the node duress threshold. Once the threshold is reached, the node is marked as `in duress`. |
+| `wlm.workload_group.enforcement_interval` | Defines the monitoring interval. |
+| `wlm.workload_group.mode` | Defines the [operating mode](#operating-modes). |
+| `wlm.workload_group.node.memory_rejection_threshold` | Defines the workload group level `memory` threshold. When the threshold is reached, the request is rejected. |
+| `wlm.workload_group.node.cpu_rejection_threshold` | Defines the workload group level `cpu` threshold. When the threshold is reached, the request is rejected. |
+| `wlm.workload_group.node.memory_cancellation_threshold` | Controls whether the node is considered to be in duress when the `memory` threshold is reached. Requests routed to nodes in duress are canceled. |
+| `wlm.workload_group.node.cpu_cancellation_threshold` | Controls whether the node is considered to be in duress when the `cpu` threshold is reached. Requests routed to nodes in duress are canceled. |
 
 When setting rejection and cancellation thresholds, remember that the rejection threshold for a resource should always be lower than the cancellation threshold.
 
 ## Workload Management Stats API
 
-The Workload Management Stats API returns workload management metrics for a query group, using the following method:
+The Workload Management Stats API returns workload management metrics for a workload group, using the following method:
 
 ```json
 GET _wlm/stats
@@ -137,7 +137,7 @@ GET _wlm/stats
   },
   “cluster_name”: “XXXXXXYYYYYYYY”,
   “A3L9EfBIQf2anrrUhh_goA”: {
-    “query_groups”: {
+    “workload_groups”: {
       “16YGxFlPRdqIO7K4EACJlw”: {
         “total_completions”: 33570,
         “total_rejections”: 0,
@@ -153,7 +153,7 @@ GET _wlm/stats
           “rejections”: 0
         }
       },
-      “DEFAULT_QUERY_GROUP”: {
+      “DEFAULT_WORKLOAD_GROUP”: {
         “total_completions”: 42572,
         “total_rejections”: 0,
         “total_cancellations”: 0,
@@ -176,19 +176,19 @@ GET _wlm/stats
 
 ### Response body fields
 
-| Field name | Description |
-| :--- | :--- |
-| `total_completions` | The total number of request completions in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
-| `total_rejections` | The total number request rejections in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
-| `total_cancellations` | The total number of cancellations in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
-| `cpu` | The `cpu` resource type statistics for the `query_group`. |
-| `memory` | The `memory` resource type statistics for the `query_group`. |
+| Field name | Description |
+| :--- |:-------------------------------------------------------------------------------------------------------------------------------------------------|
+| `total_completions` | The total number of request completions in the `workload_group` at the given node. This includes all shard-level and coordinator-level requests. |
+| `total_rejections` | The total number of request rejections in the `workload_group` at the given node. This includes all shard-level and coordinator-level requests. |
+| `total_cancellations` | The total number of cancellations in the `workload_group` at the given node. This includes all shard-level and coordinator-level requests. |
+| `cpu` | The `cpu` resource type statistics for the `workload_group`. |
+| `memory` | The `memory` resource type statistics for the `workload_group`. |
 
 ### Resource type statistics
 
-| Field name | Description |
-| :--- | :---- |
-| `current_usage` |The resource usage for the `query_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.query_group.enforcement_interval`. |
-| `cancellations` | The number of cancellations resulting from the cancellation threshold being reached. |
-| `rejections` | The number of rejections resulting from the cancellation threshold being reached. |
+| Field name | Description |
+| :--- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `current_usage` | The resource usage for the `workload_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.workload_group.enforcement_interval`. |
+| `cancellations` | The number of cancellations resulting from the cancellation threshold being reached. |
+| `rejections` | The number of rejections resulting from the rejection threshold being reached. |
diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/workload-group-lifecycle-api.md b/_tuning-your-cluster/availability-and-recovery/workload-management/workload-group-lifecycle-api.md
new file mode 100644
index 00000000000..901c06f9853
--- /dev/null
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/workload-group-lifecycle-api.md
@@ -0,0 +1,155 @@
+---
+layout: default
+title: Workload Group Lifecycle API
+nav_order: 20
+parent: Workload management
+grand_parent: Availability and recovery
+---
+
+# Workload Group Lifecycle API
+
+The Workload Group Lifecycle API creates, updates, retrieves, and deletes workload groups. The API categorizes queries into specific groups, called _workload groups_, based on desired resource limits.
+
+## Endpoints
+
+
+### Create a workload group
+
+
+```json
+PUT /_wlm/workload_group
+```
+
+
+### Update a workload group
+
+
+```json
+PUT /_wlm/workload_group/{name}
+```
+
+
+### Get a workload group
+
+
+```json
+GET /_wlm/workload_group
+GET /_wlm/workload_group/{name}
+```
+
+
+### Delete a workload group
+
+
+```json
+DELETE /_wlm/workload_group/{name}
+```
+
+
+
+## Request body fields
+
+| Field | Description |
+| :--- | :--- |
+| `_id` | The ID of the workload group, which can be used to associate query requests with the group and enforce the group's resource limits. |
+| `name` | The name of the workload group. |
+| `resiliency_mode` | The resiliency mode of the workload group. Valid modes are `enforced`, `soft`, and `monitor`. For more information about resiliency modes, see [Operating modes]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview/#operating-modes). |
+| `resource_limits` | The resource limits for query requests in the workload group. Valid resources are `cpu` and `memory`. |
+
+When creating a workload group, make sure that the sum of the resource limits for a single resource, either `cpu` or `memory`, does not exceed 1.
+
+## Example requests
+
+The following example requests show how to use the Workload Group Lifecycle API.
+
+### Create a workload group
+
+```json
+PUT _wlm/workload_group
+{
+  "name": "analytics",
+  "resiliency_mode": "enforced",
+  "resource_limits": {
+    "cpu": 0.4,
+    "memory": 0.2
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Update a workload group
+
+```json
+PUT _wlm/workload_group/analytics
+{
+  "resiliency_mode": "monitor",
+  "resource_limits": {
+    "cpu": 0.41,
+    "memory": 0.21
+  }
+}
+```
+{% include copy-curl.html %}
+
+
+## Example responses
+
+OpenSearch returns responses similar to the following.
+
+### Creating a workload group
+
+```json
+{
+  "_id":"preXpc67RbKKeCyka72_Gw",
+  "name":"analytics",
+  "resiliency_mode":"enforced",
+  "resource_limits":{
+    "cpu":0.4,
+    "memory":0.2
+  },
+  "updated_at":1726270184642
+}
+```
+
+### Updating a workload group
+
+```json
+{
+  "_id":"preXpc67RbKKeCyka72_Gw",
+  "name":"analytics",
+  "resiliency_mode":"monitor",
+  "resource_limits":{
+    "cpu":0.41,
+    "memory":0.21
+  },
+  "updated_at":1726270333804
+}
+```
+
+## Response body fields
+
+| Field | Description |
+| :--- | :--- |
+| `_id` | The ID of the workload group. |
+| `name` | The name of the workload group. Required when creating a new workload group. |
+| `resiliency_mode` | The resiliency mode of the workload group. |
+| `resource_limits` | The resource limits of the workload group. |
+| `updated_at` | The time at which the workload group was last updated. |
+
+
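
The request body rules in the new `workload-group-lifecycle-api.md` (valid `resiliency_mode` values, valid resources, and the limit that per-resource sums must not exceed 1) can be checked client-side before a request is sent. The following is a minimal Python sketch, not part of this change. It assumes the limit applies to the sum across all workload groups, which the documentation does not state explicitly. The helper names `build_body` and `over_budget`, and the `reporting` group used in the usage note, are hypothetical.

```python
import math
from typing import Dict, Iterable, Mapping

# Values taken from the "Request body fields" table in this change.
VALID_MODES = {"enforced", "soft", "monitor"}
VALID_RESOURCES = {"cpu", "memory"}


def build_body(name: str, resiliency_mode: str,
               resource_limits: Mapping[str, float]) -> Dict:
    """Return a body for `PUT _wlm/workload_group`, or raise ValueError."""
    if resiliency_mode not in VALID_MODES:
        raise ValueError(f"unknown resiliency_mode: {resiliency_mode!r}")
    unknown = set(resource_limits) - VALID_RESOURCES
    if unknown:
        raise ValueError(f"unknown resources: {sorted(unknown)}")
    return {
        "name": name,
        "resiliency_mode": resiliency_mode,
        "resource_limits": dict(resource_limits),
    }


def over_budget(bodies: Iterable[Mapping]) -> Dict[str, float]:
    """Map each resource whose summed limits exceed 1 to its total.

    Assumption: the sum is taken across all workload groups passed in.
    """
    bodies = list(bodies)
    totals = {
        res: math.fsum(b["resource_limits"].get(res, 0.0) for b in bodies)
        for res in sorted(VALID_RESOURCES)
    }
    # A small tolerance avoids false positives from floating-point rounding.
    return {res: total for res, total in totals.items() if total > 1.0 + 1e-9}
```

For example, `build_body("analytics", "enforced", {"cpu": 0.4, "memory": 0.2})` reproduces the create request shown in the diff. Adding a second group such as `build_body("reporting", "soft", {"cpu": 0.7})` makes `over_budget` report `cpu`, because 0.4 + 0.7 exceeds 1.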