Description
Is your feature request related to a problem? Please describe
The current cardinality aggregator logic chooses between DirectCollector and OrdinalsCollector by comparing their relative memory usage: if the extra memory OrdinalsCollector would need, relative to DirectCollector, is too high, DirectCollector is selected. Because this check is relative, high-cardinality aggregation queries end up using DirectCollector, which is slower than OrdinalsCollector. This default leads to higher search latency even when the OpenSearch process has enough memory available to use OrdinalsCollector for faster query performance.
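To illustrate why a purely relative check penalizes high-cardinality fields, here is a minimal sketch. It assumes the rule compares the ordinals overhead against a fixed fraction of the direct collector's memory; the class name, method, and the 0.25 fraction are hypothetical and are not taken from OpenSearch's source. Note that heap size never enters the decision.

```java
// Hypothetical sketch of a *relative* collector-selection heuristic.
// Not OpenSearch code: names and the ratio parameter are illustrative only.
public final class RelativeCollectorChoice {

    enum Collector { ORDINALS, DIRECT }

    /**
     * Picks ORDINALS only when its extra memory stays below a fixed fraction
     * of what the DIRECT collector would use. Available heap is ignored.
     */
    static Collector choose(long ordinalsOverheadBytes, long directBytes, double maxRelativeOverhead) {
        if (ordinalsOverheadBytes < directBytes * maxRelativeOverhead) {
            return Collector.ORDINALS;
        }
        return Collector.DIRECT;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: the direct collector's footprint is small and fixed,
        // while the ordinals overhead grows with the field's number of ordinals.
        long directBytes = 16L * 1024;                  // 16 KiB
        long ordinalsOverheadBytes = 2L * 1024 * 1024;  // 2 MiB for a high-cardinality field
        // DIRECT is chosen here even if the JVM has tens of GiB of free heap.
        System.out.println(choose(ordinalsOverheadBytes, directBytes, 0.25));
    }
}
```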
Describe the solution you'd like
Ideally, the collector would be chosen based on available memory versus required memory: if the required memory is less than x% of available memory, use OrdinalsCollector. As far as I understand, OpenSearch does not have a metric for available heap after GC. Since available memory is not known, total memory can be used as a proxy: select OrdinalsCollector if the required memory is less than x% of total memory.
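A minimal sketch of the proposed absolute rule, assuming the required memory is estimated as a per-bucket ordinals cost times the bucket count, and using the JVM's `Runtime.getRuntime().maxMemory()` as the "total memory" proxy. The class name, method, per-bucket estimate, and the 5% threshold are hypothetical placeholders for the x% in the proposal, not existing OpenSearch settings.

```java
// Hypothetical sketch of the proposed *absolute* threshold rule.
// Names, the per-bucket estimate, and the fraction are illustrative only.
public final class AbsoluteCollectorChoice {

    enum Collector { ORDINALS, DIRECT }

    /**
     * Uses ORDINALS when the memory it needs for the whole query is below
     * maxHeapFraction of the total heap; otherwise falls back to DIRECT.
     */
    static Collector choose(long ordinalsBytesPerBucket, long bucketCount,
                            long totalHeapBytes, double maxHeapFraction) {
        long required;
        try {
            // Guard against overflow when the bucket count is very large.
            required = Math.multiplyExact(ordinalsBytesPerBucket, bucketCount);
        } catch (ArithmeticException overflow) {
            return Collector.DIRECT;
        }
        return required < totalHeapBytes * maxHeapFraction ? Collector.ORDINALS : Collector.DIRECT;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long perBucket = 2L * 1024 * 1024; // hypothetical 2 MiB ordinals cost per bucket

        // Single bucket on a 32 GiB heap: 2 MiB is far below 5% of heap.
        System.out.println(choose(perBucket, 1, 32 * gib, 0.05));
        // 65,535 buckets: about 128 GiB required, so fall back to DIRECT.
        System.out.println(choose(perBucket, 65_535, 32 * gib, 0.05));

        // maxMemory() is the JVM's max heap (Long.MAX_VALUE if unbounded),
        // standing in for "total memory" since post-GC free heap is not exposed.
        long totalHeap = Runtime.getRuntime().maxMemory();
        System.out.println(choose(perBucket, 1, totalHeap, 0.05));
    }
}
```

Because the required memory scales with the bucket count, the same rule also covers the high-bucket-count case that makes "always use OrdinalsCollector" infeasible.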
Related component
Search:Aggregations
Describe alternatives you've considered
As an alternative, an execution hint was added as an input parameter so that customers can ask for OrdinalsCollector explicitly. This has two disadvantages:
- The customer has to decide on the execution hint.
- Queries from the SQL plugin have no way to pass such a hint.
Another alternative is to always use OrdinalsCollector. That is not feasible when the number of buckets is very high: with a large bucket count (the default maximum is about 65k), the total memory required by the query can exceed the available memory.
Additional context
No response