Refactor streaming agg query phase planning#20471
Refactor streaming agg query phase planning#20471rishabhmaurya merged 19 commits intoopensearch-project:mainfrom
Conversation
📝 WalkthroughWalkthroughThis PR refactors the streaming aggregation cost estimation architecture from a runtime collector-tree evaluation model to an early-stage aggregation-factory estimation model, introducing a new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
❌ Gradle check result for bb004fb: ABORTED Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
bb004fb to
012735b
Compare
|
❌ Gradle check result for 012735b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
012735b to
13f2252
Compare
|
❌ Gradle check result for 13f2252: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
13f2252 to
ddf0f03
Compare
|
❌ Gradle check result for 889a7c0: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
889a7c0 to
e03943d
Compare
…stant across segments, not per aggregator Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
|
❌ Gradle check result for b217ab9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…ator Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
|
❌ Gradle check result for 6d8c16c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
- segmentTopN for term agg Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
- rest layer filter streaming Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
|
❌ Gradle check result for 597b0cc: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
|
❌ Gradle check result for 213a6ab: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for ab931dd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
60c253d to
6ddb9e1
Compare
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
server/src/main/java/org/opensearch/search/streaming/FlushModeResolver.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for dbe69b6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20471 +/- ##
============================================
+ Coverage 73.19% 73.35% +0.15%
- Complexity 71975 72112 +137
============================================
Files 5796 5795 -1
Lines 329539 329462 -77
Branches 47465 47458 -7
============================================
+ Hits 241220 241675 +455
+ Misses 69005 68452 -553
- Partials 19314 19335 +21 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
* Refactor streaming agg query phase planning Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor null out from estimation recursive loop Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor out the segment size from streaming cost metric, as it's constant across segments, not per aggregator Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Consolidate duplicate ordinals estimation logic in StreamingCostEstimator Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * - only depend on topN to decide streaming - segmentTopN for term agg Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * - refactor metrics to only provide topN - rest layer filter streaming Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix card IT Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * segmentTopN Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add change log Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor segment size setting Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove not supported metric, avg, valuecount, looking into sum for now, if not supported, will remove Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * support sum with integration test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor sub agg it Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * only allow 2 terms nested, not more Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment, comment out flaky test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
* Refactor streaming agg query phase planning Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor null out from estimation recursive loop Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor out the segment size from streaming cost metric, as it's constant across segments, not per aggregator Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Consolidate duplicate ordinals estimation logic in StreamingCostEstimator Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * - only depend on topN to decide streaming - segmentTopN for term agg Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * - refactor metrics to only provide topN - rest layer filter streaming Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix card IT Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * segmentTopN Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add change log Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor segment size setting Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove not supported metric, avg, valuecount, looking into sum for now, if not supported, will remove Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * support sum with integration test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor sub agg it Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * only allow 2 terms nested, not more Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment, comment out flaky test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Description
Streaming aggregation feature supports an agg defined with
If multiple aggs fit above constraints, it should also be good.
Segment streaming topN heuristic set to
shard_sizeAlso user has a index level setting to provide the segment topN size.
The final topN is the larger one of these 2
The topN is compared with
search.aggregations.streaming.max_estimated_bucket_countto decide whether streaming should be used or not. If exceed, streaming is considered to stream back too many buckets per segment so we will fall back to default aggregation path which send response per shard.Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.