Apply the date histogram rewrite optimization to range aggregation#13865
mch2 merged 42 commits into opensearch-project:main
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
❌ Gradle check result: FAILURE for commits c5d2175, ed79e02, 8ec5f58, 67c281c, c10c775, 783b14a (twice), cb8cbbf, 4b68d60, and 07a5293. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@bowenlan-amzn There may be a test failure here related to this change: https://build.ci.opensearch.org/job/gradle-check/41411/consoleText It reproduces for me, though not every time.
Description
Background
Previously, we optimized the date histogram aggregation by using the index structure of the date field (the BKD tree, i.e. Points) to provide the document counts of the date histogram buckets. We first build date ranges from the date histogram query input, for example a histogram with a daily interval. Then we "query" the index of the date field for how many documents fall into each of these date ranges.
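To make the counting step concrete, here is a minimal sketch. It is not the code in this PR: it assumes a sorted long array as a stand-in for the BKD tree, and the class and method names (RangeCountSketch, lowerBound, countInRanges) are hypothetical. It shows only the idea that bucket counts can come from the sorted index structure rather than from visiting every document.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the BKD-tree lookup: a sorted array of field values. */
final class RangeCountSketch {

    /** Index of the first element >= key in a sorted array (lower bound). */
    static int lowerBound(long[] sorted, long key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Counts the values that fall in each half-open range [from, to).
     * ranges[i] = {from, to}; the input values need not be pre-sorted.
     */
    static long[] countInRanges(long[] values, long[][] ranges) {
        long[] sorted = values.clone();
        Arrays.sort(sorted); // assumption: mimics the ordering the index already has
        long[] counts = new long[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            int start = lowerBound(sorted, ranges[i][0]);
            int end = lowerBound(sorted, ranges[i][1]);
            counts[i] = Math.max(0, end - start); // empty or inverted range counts as 0
        }
        return counts;
    }
}
```

Each range costs two binary searches in this sketch, so the work depends on the number of ranges rather than on the number of matching documents.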
Idea
A date is actually stored as numeric data of the long type. This leads to the idea that we can extend the optimization to the RangeAggregator, which also performs range aggregation on numeric data.
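As a small illustration of why a date range is just a numeric range, the sketch below turns one UTC day into a pair of long bounds. It assumes millisecond resolution, and the class and method names (DateAsLongSketch, dailyRange) are hypothetical and not taken from the PR.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class DateAsLongSketch {

    /**
     * Returns {from, to} in epoch milliseconds for the 24-hour window
     * starting at dayStart, i.e. the half-open numeric range [from, to).
     */
    static long[] dailyRange(Instant dayStart) {
        long from = dayStart.toEpochMilli();
        long to = dayStart.plus(1, ChronoUnit.DAYS).toEpochMilli();
        return new long[] {from, to};
    }
}
```

Once a bucket is expressed as two long values, it no longer matters to the counting logic whether those bounds came from a date interval or from a user-supplied numeric range.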
Implementation
The core optimization algorithm remains the same (details in #13317).
Note that the algorithm has a hidden precondition: the ranges must not overlap. Date histograms satisfy this automatically, because a date interval cannot produce overlapping date ranges.
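For user-supplied ranges this precondition has to be checked explicitly. The sketch below is one possible check, assuming half-open ranges [from, to) with from <= to. The names (RangeOverlapCheckSketch, isNonOverlapping) are hypothetical, and how this PR actually detects or handles overlapping ranges is not described here.

```java
import java.util.Arrays;
import java.util.Comparator;

final class RangeOverlapCheckSketch {

    /**
     * Returns true when no two half-open ranges [from, to) overlap.
     * ranges[i] = {from, to}; the input order is left untouched.
     */
    static boolean isNonOverlapping(long[][] ranges) {
        long[][] sorted = ranges.clone(); // shallow copy, so the caller's order is preserved
        Arrays.sort(sorted, Comparator.comparingLong((long[] r) -> r[0]));
        for (int i = 1; i < sorted.length; i++) {
            // Adjacent ranges may touch (previous to == next from) but must not overlap.
            if (sorted[i][0] < sorted[i - 1][1]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could use such a check to decide whether the optimized path applies and otherwise fall back to the default collection path.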
Another difference is that the range aggregation's buildRange method won't be used on the segment-level match-all path. The ranges are provided directly by the user, whereas date histogram needs to check the boundaries of the shard or segment, or accommodate a top-level range query (details in #12073).
Changes
long[]
Possible Follow Ups
Benchmarks
However, since this is already around 10 ms, we can look into the reason later as a follow-up. The other workloads (big5, nyc, http) are all even faster.
big5
noaa
Related Issues
Resolves #13531
Check List
New functionality has been documented.
API changes companion pull request created.
Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.