[BUG][Search Backpressure] High Heap Usage Cancellation Due to High Node-Level CPU Utilization #13295

@ticheng-aws

Describe the bug

With the current search backpressure cancellation logic, we've noticed that CPU-intensive search requests, such as multi-term aggregations, can trigger additional cancellations based on the task-level heap usage settings, even though the node still has sufficient heap memory to process the tasks.

Related component

Search:Resiliency

To Reproduce

Use the multi_term_agg operation in the http_logs workload, which is commonly treated as a high CPU usage search request.

  1. Set up an OpenSearch cluster and an OpenSearch Benchmark client
  2. Run a test with the multi_term_agg operation in the http_logs workload and gradually increase the number of search clients, using the sample command below
opensearch-benchmark execute-test --pipeline=benchmark-only --client-options='basic_auth_user:<USER>,basic_auth_password:<PASSWORD>,timeout:300' --target-hosts '<END_POINT>:443' --kill-running-processes --workload=http_logs --workload-param='target_throughput:none,number_of_replicas:0,number_of_shards:1,search_clients:2'
  3. Monitor the CPU utilization and JVM memory pressure of your OpenSearch cluster
  4. Retrieve the cancellation count with the GET _nodes/stats/search_backpressure REST API
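
To compare heap-driven and CPU-driven cancellations across runs, the JSON returned by the stats call in the last step can be summarized per resource tracker. The following is a minimal Python sketch. The field names (`search_backpressure`, `resource_tracker_stats`, `cancellation_count`) are assumed from the documented response shape and may differ between OpenSearch versions. The helper name `cancellations_by_tracker` and the sample payload are made up for illustration and are not measurements from this issue.

```python
from collections import Counter


def cancellations_by_tracker(stats):
    """Sum cancellation_count per resource tracker across all nodes and task types."""
    totals = Counter()
    for node in stats.get("nodes", {}).values():
        backpressure = node.get("search_backpressure", {})
        for task_stats in backpressure.values():
            # Skip scalar entries such as "mode".
            if not isinstance(task_stats, dict):
                continue
            trackers = task_stats.get("resource_tracker_stats", {})
            for name, tracker in trackers.items():
                totals[name] += tracker.get("cancellation_count", 0)
    return dict(totals)


# Illustrative payload only (assumed shape, invented numbers).
sample = {
    "nodes": {
        "node-1": {
            "search_backpressure": {
                "search_shard_task": {
                    "resource_tracker_stats": {
                        "heap_usage_tracker": {"cancellation_count": 5},
                        "cpu_usage_tracker": {"cancellation_count": 1},
                    }
                },
                "mode": "enforced",
            }
        }
    }
}

print(cancellations_by_tracker(sample))
```

A heap tracker count that keeps climbing while JVM memory pressure stays low would match the behavior described in this report.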

Expected behavior

We need to adjust the current search backpressure cancellation logic to cancel tasks based on measurements of node-level resources. For example, if a node is under duress due to high CPU utilization, we should only consider canceling tasks based on CPU settings, rather than heap or elapsed time settings at the task level.
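
The proposed behavior can be sketched roughly as follows. This is not the OpenSearch implementation; it is a small Python illustration that assumes each node-level resource maps to exactly one task-level tracker. The function names, the `resource_to_tracker` mapping, and the boolean inputs are all hypothetical.

```python
def eligible_trackers(node_in_duress):
    """Return the task-level trackers allowed to cancel, given node-level duress flags."""
    # Hypothetical mapping from a node-level resource to its task-level tracker.
    resource_to_tracker = {
        "cpu": "cpu_usage_tracker",
        "heap": "heap_usage_tracker",
    }
    return [
        tracker
        for resource, tracker in resource_to_tracker.items()
        if node_in_duress.get(resource, False)
    ]


def should_cancel(task_breaches, node_in_duress):
    """Cancel only if the task breaches a tracker tied to a resource that is in duress."""
    return any(
        task_breaches.get(tracker, False)
        for tracker in eligible_trackers(node_in_duress)
    )


# Node is under duress because of CPU only.
node = {"cpu": True, "heap": False}

# A task over its heap threshold is left alone; a task over its CPU threshold is not.
print(should_cancel({"heap_usage_tracker": True}, node))
print(should_cancel({"cpu_usage_tracker": True}, node))
```

Under this sketch, a task that exceeds only its heap threshold is not cancelled while the node is in duress solely because of CPU, which is the case this issue describes.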

Additional Details

Host/Environment (please complete the following information):

  • Version OS_1.3 +

Metadata

Labels

  • Search:Performance
  • Search:Resiliency
  • bug: Something isn't working
  • enhancement: Enhancement or improvement to existing feature or request
  • v2.15.0: Issues and PRs related to version 2.15.0

Projects

Status

✅ Done
