Refactor the filter rewrite optimization#14464
Merged
mch2 merged 24 commits intoopensearch-project:mainfrom Aug 9, 2024
Merged
Refactor the filter rewrite optimization#14464mch2 merged 24 commits intoopensearch-project:mainfrom
mch2 merged 24 commits intoopensearch-project:mainfrom
Conversation
Contributor
|
❌ Gradle check result for 1a067ba: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Contributor
9040f6f to
e896927
Compare
Contributor
mch2
reviewed
Aug 7, 2024
...rg/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteOptimizationContext.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/AggregatorBridge.java
Show resolved
Hide resolved
...rg/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteOptimizationContext.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
Show resolved
Hide resolved
Contributor
- remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Contributor
Contributor
|
❕ Gradle check result for 86cacab: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
mch2
approved these changes
Aug 9, 2024
Member
mch2
left a comment
There was a problem hiding this comment.
Thanks for these changes @bowenlan-amzn I think this is much easier to follow than the original helper class. I think we can keep going with some cleanup but my major concern re concurrent search appears resolved.
.../src/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/AggregatorBridge.java
Show resolved
Hide resolved
opensearch-trigger-bot bot
pushed a commit
that referenced
this pull request
Aug 9, 2024
* Refactor Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor extract segment match all logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor inline class Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix a bug Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * prepareFromSegment now doesn't return Ranges Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * how it looks like when introduce interfaces Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove interface, clean up Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve doc Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * move multirangetraversal logic to helper Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve the refactor package name -> filterrewrite move tree traversal logic to new class add documentation for important abstract methods add sub class for composite aggregation bridge Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address Marc's comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address concurrent segment search concern To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove circular dependency Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address comment - remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> (cherry picked from commit 170ea27) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
3 tasks
Member
|
@mch2 @bowenlan-amzn We shouldn't skip changelog for these changes. |
mch2
pushed a commit
that referenced
this pull request
Aug 19, 2024
* Refactor Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general * Refactor refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic * Refactor refactor the data provider and try optimize logic * Refactor * Refactor extract segment match all logic * Refactor * Refactor inline class * Fix a bug * address comment * prepareFromSegment now doesn't return Ranges * how it looks like when introduce interfaces * remove interface, clean up * improve doc * move multirangetraversal logic to helper * improve the refactor package name -> filterrewrite move tree traversal logic to new class add documentation for important abstract methods add sub class for composite aggregation bridge * Address Marc's comments * Address concurrent segment search concern To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path * remove circular dependency * Address comment - remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info --------- (cherry picked from commit 170ea27) Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
harshavamsi
pushed a commit
to harshavamsi/OpenSearch
that referenced
this pull request
Aug 20, 2024
* Refactor Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor extract segment match all logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor inline class Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix a bug Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * prepareFromSegment now doesn't return Ranges Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * how it looks like when introduce interfaces Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove interface, clean up Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve doc Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * move multirangetraversal logic to helper Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve the refactor package name -> filterrewrite move tree traversal logic to new class add documentation for important abstract methods add sub class for composite aggregation bridge Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address Marc's comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address concurrent segment search concern To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove circular dependency Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address comment - remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
wdongyu
pushed a commit
to wdongyu/OpenSearch
that referenced
this pull request
Aug 22, 2024
* Refactor Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor extract segment match all logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor inline class Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix a bug Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * prepareFromSegment now doesn't return Ranges Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * how it looks like when introduce interfaces Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove interface, clean up Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve doc Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * move multirangetraversal logic to helper Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve the refactor package name -> filterrewrite move tree traversal logic to new class add documentation for important abstract methods add sub class for composite aggregation bridge Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address Marc's comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address concurrent segment search concern To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove circular dependency Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address comment - remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
akolarkunnu
pushed a commit
to akolarkunnu/OpenSearch
that referenced
this pull request
Sep 10, 2024
* Refactor Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor extract segment match all logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor inline class Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix a bug Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * prepareFromSegment now doesn't return Ranges Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * how it looks like when introduce interfaces Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove interface, clean up Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve doc Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * move multirangetraversal logic to helper Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * improve the refactor package name -> filterrewrite move tree traversal logic to new class add documentation for important abstract methods add sub class for composite aggregation bridge Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address Marc's comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address concurrent segment search concern To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * remove circular dependency Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Address comment - remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
This was referenced Sep 11, 2024
Open
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
As more code coming into the filter rewrite optimization, it starts to become harder to understand.
Not only making the code review slower and painful, it also will slow down the new contributors into this area. So here comes the refactoring work.
Idea
The refactoring shouldn't change any business logic.
After the refactor, reader can easily find all the important information by just reading the class doc and checking the public methods of all classes.
Refactoring
Why the name —
filter rewrite optimization?Filter in OpenSearch world has similar meaning as query, while it indicates no relavance scoring calculated.
Rewrite in OpenSearch world can mean transform OpenSearch query into lucene query, or transform a query to perform better.
Generally speaking, the optimization rewrites the aggregation into certain filters to improve performance. Aggregation execution is plain and simple iteration and collection on all matches, while filters can take advantage of the Lucene index to get expected results in log or even constant time.
Benchmark
Using the new tool to trigger benchmark from PR #14464 (comment)
Related Issues
Resolves #14435
Check List
[ ] Functionality includes testing.[ ] API changes companion pull request created, if applicable.[ ] Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.