Fix cardinality agg pruning optimization by self collecting#19473
Fix cardinality agg pruning optimization by self collecting#19473jainankitk merged 2 commits intoopensearch-project:mainfrom
Conversation
* changes in DenseConjunctionBulkCollector is matching too many documents due to scoreWindow logic * will raise an issue in Lucene to track this * for now collect ourself * this optimization only runs on card agg without sub, and only within the pruning optimization * worst case we can disable this optimization via index setting Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
|
Tested on nyc_taxis |
|
@asimmahmood1 looks like a good fix. Tbh, i don't mind keeping this logic long term here as dynamic pruning collection logic shouldn't care about any lucene optimizations in future. Should we still create a lucene issue for possible regression if it is applicable in other queries? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19473 +/- ##
============================================
- Coverage 73.03% 73.02% -0.01%
+ Complexity 70152 70087 -65
============================================
Files 5683 5683
Lines 321533 321547 +14
Branches 46503 46508 +5
============================================
- Hits 234821 234800 -21
Misses 67790 67790
- Partials 18922 18957 +35 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
{"run-benchmark-test": "id_1"} |
|
{"run-benchmark-test": "id_3"} |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/4563/ . Final results will be published once the job is completed. |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/4564/ . Final results will be published once the job is completed. |
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/4564/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/165/
|
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/4563/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/166/
|
|
Benchmark confirms the fix:
|
…ch-project#19473) Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
+1 Not the first time this optimization gets affected by Lucene change. It's better to collect in this way, directly reuse the related logic from DefaultBulkScorer. |
…ch-project#19473) Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Description
Related Issues
Fixes [#19367]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.