Multi-terms Aggregation Performance Optimization #13120

@sandeshkr419

Starting this thread to discuss ideas for optimizing multi-terms aggregation.

Sample query:

{
  "size": 0,
  "aggs": {
    "important_terms": {
      "multi_terms": {
        "terms": [
          {
            "field": "process.name"
          },
          {
            "field": "cloud.region"
          }
        ]
      }
    }
  }
}

Current flow overview:
For each document, increment the count of the composite bucket, where the bucket key is formed from that document's values across the requested fields.
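
As a rough illustration of this document-driven flow, here is a minimal Python sketch. It is not the OpenSearch implementation: the function name `multi_terms_by_doc` and the in-memory list of dicts are hypothetical stand-ins for the aggregator and doc values, and it assumes that a document missing any of the fields contributes to no bucket.

```python
from collections import Counter
from itertools import product


def multi_terms_by_doc(docs, fields):
    """Count composite buckets by visiting every document once.

    Hypothetical helper for illustration only; `docs` is a list of dicts
    standing in for per-document field values.
    """
    counts = Counter()
    for doc in docs:
        value_sets = []
        for field in fields:
            value = doc.get(field)
            if value is None:
                # Assumption: a document missing any field is skipped.
                break
            # A field may hold a single value or a list of values.
            value_sets.append(set(value) if isinstance(value, list) else {value})
        else:
            # One composite key per combination of the document's values.
            for key in product(*value_sets):
                counts[key] += 1
    return counts


sample_docs = [
    {"process.name": "nginx", "cloud.region": "us-east-1"},
    {"process.name": "nginx", "cloud.region": "us-east-1"},
    {"process.name": "java", "cloud.region": "eu-west-1"},
    {"process.name": "java"},  # no cloud.region, so it is skipped
]
doc_counts = multi_terms_by_doc(sample_docs, ["process.name", "cloud.region"])
```

The work here is proportional to the number of matching documents, regardless of how few distinct buckets exist.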

Initial ideas for optimization:
I want to explore whether, in certain scenarios, it makes sense to start execution from the postings data instead. For example, we could enumerate the possible buckets and then intersect the postings of the terms that make up each bucket to find the documents that fall into it. Whether intersecting documents across fields gives any performance advantage over the current workflow is something we can determine by experiment.
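
To make the postings-driven idea concrete, here is a small Python sketch under simplifying assumptions. The names `build_postings` and `multi_terms_by_postings` are hypothetical, postings are modeled as plain Python sets of document IDs rather than Lucene postings lists, and every combination of terms is treated as a candidate bucket.

```python
from collections import defaultdict
from itertools import product


def build_postings(docs, field):
    """Map each term of `field` to the set of doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        value = doc.get(field)
        if value is None:
            continue
        for term in (value if isinstance(value, list) else [value]):
            postings[term].add(doc_id)
    return postings


def multi_terms_by_postings(docs, fields):
    """Count composite buckets by intersecting per-term postings.

    Illustrative only: candidate buckets are all term combinations, and a
    bucket's count is the size of the intersection of its terms' postings.
    """
    per_field = [build_postings(docs, field) for field in fields]
    counts = {}
    for key in product(*per_field):  # one term chosen from each field
        doc_ids = set.intersection(
            *(postings[term] for postings, term in zip(per_field, key))
        )
        if doc_ids:  # keep only buckets that contain at least one document
            counts[key] = len(doc_ids)
    return counts


sample_docs = [
    {"process.name": "nginx", "cloud.region": "us-east-1"},
    {"process.name": "nginx", "cloud.region": "us-east-1"},
    {"process.name": "java", "cloud.region": "eu-west-1"},
    {"process.name": "java"},
]
postings_counts = multi_terms_by_postings(
    sample_docs, ["process.name", "cloud.region"]
)
```

In this sketch the number of intersections equals the product of the distinct term counts of the fields, so it would plausibly only compete with the per-document flow when the fields have low cardinality relative to the number of documents. That trade-off is what the experiment would need to measure.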


Projects

Status

✅ Done

Status

Not In Plan
