Skip to content

Apply the date histogram rewrite optimization to range aggregation#13865

Merged
mch2 merged 42 commits intoopensearch-project:mainfrom
bowenlan-amzn:13531-range-agg
Jun 19, 2024
Merged

Apply the date histogram rewrite optimization to range aggregation#13865
mch2 merged 42 commits intoopensearch-project:mainfrom
bowenlan-amzn:13531-range-agg

Conversation

@bowenlan-amzn
Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn commented May 29, 2024

Description

Background

Previously we optimized date histogram aggregation by utilizing the index structure of date field — BKD tree/Points to provide the documents count of date histogram buckets results. We first build date ranges from the date histogram query input like daily interval histogram. Then we "query" the index of date for how many documents are in these date ranges.

Idea

Date is actually saved as numeric data of long data type. This leads to the idea that we can extend the optimization to the RangeAggregator which also perform ranges aggregation on numeric data.

Implementation

The core optimization algorithm remains the same. (Details #13317)
Note the algorithm has a hidden pre-condition: all the ranges are not overlapping (because date interval cannot produce overlapping date ranges)
Another difference is that range aggregation buildRange method won't be built from segment level match all path. The ranges are provided by user directly, not like date histogram needs to check the boundaries of shard or segment or accommodate top level range query. (Details #12073)

Changes
  • Covering all numeric types
  • buildRanges method now return a generic Ranges type instead of long[]
  • The ranges built now will be same as what provided by customer, meaning it's lower inclusive, upper exclusive. The multi range traversal logic is refactored to fit that. The refactor also include some readability improvements.
  • tryFastFilterAggregation common logic is extracted, and AggregationType should implement its own special logic.
Possible Follow Ups
  • Geo distance range query
  • Overlapping ranges
  • We notice scaled float doesn't seem to belong in the module, as it's not experimental anymore. So better to just move it along with NumberType in the server.
  • Currently the rewrite helper class is becoming huge, also the workflow to do the optimization (isRewriteable, buildRanges, tryxxxRewrite...) needs refactor. Tracking task

Benchmarks

|                                                        Metric |                          Task |    Baseline |   Contender |     Diff |   Unit |
|                                       50th percentile latency | articles_monthly_agg_uncached |      7.2067 |     9.99465 |  2.78795 |     ms |
|                                       90th percentile latency | articles_monthly_agg_uncached |      7.9746 |     11.4022 |  3.42756 |     ms |
|                                       99th percentile latency | articles_monthly_agg_uncached |      12.822 |     16.0743 |  3.25229 |     ms |
|                                      100th percentile latency | articles_monthly_agg_uncached |     24.4897 |      36.373 |  11.8833 |     ms |

However, considering this is already ~10ms. We can look into the reason later as a follow up. For the other workloads, big5, nyc, http, they are all even faster.

  • Performance gain on the range aggregation.
    big5
|                                                        Metric |                          Task |    Baseline |   Contender |     Diff |   Unit |
|                                       50th percentile latency | range-agg-1 | 1.23376e+06 |     4.74459 | -1.23376e+06 |     ms |
|                                       90th percentile latency | range-agg-1 | 1.42868e+06 |     5.14212 | -1.42867e+06 |     ms |
|                                       99th percentile latency | range-agg-1 | 1.47254e+06 |     5.32877 | -1.47253e+06 |     ms |
|                                      100th percentile latency | range-agg-1 | 1.47741e+06 |     5.58408 |  -1.4774e+06 |     ms |
|                                  50th percentile service time | range-agg-1 |     5420.99 |     3.49232 |      -5417.5 |     ms |
|                                  90th percentile service time | range-agg-1 |     5433.47 |     3.63754 |     -5429.83 |     ms |
|                                  99th percentile service time | range-agg-1 |     5440.81 |     3.73967 |     -5437.07 |     ms |
|                                 100th percentile service time | range-agg-1 |     5443.22 |     4.11546 |      -5439.1 |     ms |
|                                                    error rate | range-agg-1 |           0 |           0 |            0 |      % |

|                                       50th percentile latency | range-agg-2 | 1.20783e+06 |     6.01016 | -1.20782e+06 |     ms |
|                                       90th percentile latency | range-agg-2 | 1.39873e+06 |     6.42305 | -1.39872e+06 |     ms |
|                                       99th percentile latency | range-agg-2 | 1.44164e+06 |     6.69281 | -1.44163e+06 |     ms |
|                                      100th percentile latency | range-agg-2 | 1.44642e+06 |     6.69403 | -1.44641e+06 |     ms |
|                                  50th percentile service time | range-agg-2 |     5317.48 |     4.86202 |     -5312.62 |     ms |
|                                  90th percentile service time | range-agg-2 |     5333.16 |     4.94407 |     -5328.21 |     ms |
|                                  99th percentile service time | range-agg-2 |     5367.61 |     5.09159 |     -5362.52 |     ms |
|                                 100th percentile service time | range-agg-2 |     5417.87 |     5.11181 |     -5412.76 |     ms |
|                                                    error rate | range-agg-2 |           0 |           0 |            0 |      % |

noaa

|                                       50th percentile latency | range-aggregation |     1484.37 |     10.8027 | -1473.56 |     ms |
|                                       90th percentile latency | range-aggregation |     1489.36 |     11.3409 | -1478.02 |     ms |
|                                      100th percentile latency | range-aggregation |     1493.02 |     14.9022 | -1478.12 |     ms |
|                                  50th percentile service time | range-aggregation |     1481.79 |     6.98101 | -1474.81 |     ms |
|                                  90th percentile service time | range-aggregation |     1486.58 |     7.40678 | -1479.17 |     ms |
|                                 100th percentile service time | range-aggregation |     1490.47 |      10.768 |  -1479.7 |     ms |
|                                                    error rate | range-aggregation |           0 |           0 |        0 |      % |

Related Issues

Resolves #13531

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@github-actions github-actions bot added Search:Aggregations v2.15.0 Issues and PRs related to version 2.15.0 labels May 29, 2024
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn added the backport 2.x Backport to 2.x branch label May 29, 2024
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for c5d2175: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for ed79e02: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
support all numeric types
minus one on the upper range

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 8ec5f58: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 67c281c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for c10c775: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jun 2, 2024

❌ Gradle check result for 783b14a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jun 3, 2024

❌ Gradle check result for 783b14a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jun 3, 2024

✅ Gradle check result for 783b14a: SUCCESS

@bowenlan-amzn bowenlan-amzn marked this pull request as ready for review June 3, 2024 18:38
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cb8cbbf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn
Copy link
Copy Markdown
Member Author

@github-actions commented on Jun 18, 2024, 3:53 PM PDT:

❌ Gradle check result for cb8cbbf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Originally posted by @github-actions[bot] in #13865 (comment)

#9589 (comment)

#13828

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4b68d60: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 07a5293: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bowenlan-amzn
Copy link
Copy Markdown
Member Author

@github-actions commented on Jun 18, 2024, 11:17 PM PDT:

❌ Gradle check result for 07a5293: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Originally posted by @github-actions[bot] in #13865 (comment)

#14292

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 9764b23: SUCCESS

@andrross
Copy link
Copy Markdown
Member

@bowenlan-amzn Maybe a test failure here related to this change: https://build.ci.opensearch.org/job/gradle-check/41411/consoleText

./gradlew ':rest-api-spec:yamlRestTest' --tests "org.opensearch.test.rest.ClientYamlTestSuiteIT" -Dtests.method="test {p0=search.aggregation/40_range/Double range profiler shows filter rewrite info}" -Dtests.seed=2F3C46DA8CEB59E9 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hu-HU -Dtests.timezone=America/Montreal -Druntime.java=21 

That will reproduce for me, though not every time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.x Backport to 2.x branch Search:Aggregations v2.16.0 Issues and PRs related to version 2.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Apply the fast filter optimization to range aggregation

5 participants