Adds search implementation of context aware segments#19558
Adds search implementation of context aware segments#19558shatejas wants to merge 9 commits intoopensearch-project:mainfrom
Conversation
|
❌ Gradle check result for 0ece83d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
0ece83d to
e5db2d9
Compare
|
❌ Gradle check result for e5db2d9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
e5db2d9 to
9b887d8
Compare
|
❌ Gradle check result for 9b887d8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| IndexShard shard = indexService.getShard(request.shardId().id()); | ||
| Engine.SearcherSupplier reader = shard.acquireSearcherSupplier(); | ||
| Set<String> groupingCriterias = Collections.emptySet(); | ||
| if (request.source() != null) { |
There was a problem hiding this comment.
If I understand correctly, this will make criteria extraction happen at every data node.
Since it's only dependent on the QueryBuilder we can compute this once at the Coordinator and pass to every transport search task.
There was a problem hiding this comment.
@expani Thats right it happens on every data node. Can you point me to the right code pointer so I can explore this to compute on coordinator node?
There was a problem hiding this comment.
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/SearchService.java#L1519 Maybe during parseSource which will happen at Coordinator
There was a problem hiding this comment.
parseSource also executes on data node.
I looked around and there doesn't seem to be a good place to execute on coordinator - From what I understand, iterates through the shards and executes the search on each shard, there are no operations on query builder that happen on co-ordinator in current state
| * | ||
| * @opensearch.internal | ||
| */ | ||
| public interface WithFilterQueryBuilder { |
There was a problem hiding this comment.
I don't see this being instantiated anywhere, Can you elaborate on when this will be used ?
There was a problem hiding this comment.
There are plugin queries, like knn query, which can have filters, If a plugin query implements this interface - it will be able to leverage pruning in context aware cases
There was a problem hiding this comment.
Added documentation around this
There was a problem hiding this comment.
The name is a little misleading, consider CriteriaAwareFilterQueryBuilder/FilterAwareQueryBuilder. Also wondering should it also bind with a filter criteria or you want to keep this generic if this finds common use
There was a problem hiding this comment.
Also wondering should it also bind with a filter criteria or you want to keep this generic if this finds common use
The intent was to keep it generic, but let me know if you wouldn't want it generic
The name is a little misleading, consider CriteriaAwareFilterQueryBuilder/FilterAwareQueryBuilder
Will go with FilterAwareQueryBuilder
| * {@link org.opensearch.indices.IndicesRequestCache} depends on cache key generated in OpenSearchDirectoryReader | ||
| * This cache is used to be able to keep the cacheKey of OpensearchDirectoryReader consistent in cases of multiple criteria | ||
| */ | ||
| private Cache<Set<String>, OpenSearchDirectoryReader> criteriaCache; |
There was a problem hiding this comment.
Based on the extractCriteria implementation, the key Set<String> will be ordered based on how they appear in the query if there are multiple criterias ( BooleanQuery )
I think we need to preserve the order for the cache lookup to work if query is same but order of criteria fields are different.
There was a problem hiding this comment.
Sure I will sort the criterias in extractCriteria
| // Do not do this warm up if its CriteriaBasedReader as criteriaBasedReaders are already formed | ||
| if (!(in instanceof CriteriaBasedReader)) { | ||
| warmUpCriteriaBasedReader(in.leaves()); | ||
| criteriaCache = CacheBuilder.<Set<String>, OpenSearchDirectoryReader>builder().setMaximumWeight(1000).build(); |
There was a problem hiding this comment.
nit: Cluster setting index.context_aware.search.criteria_cache is not used here
There was a problem hiding this comment.
Good catch, will add it
There was a problem hiding this comment.
I looked around an now I remember that there is no easy way of being able to extract IndexSettings in reader and thats the reason for hard coding the value. I will find a good way to pass the value
9b887d8 to
b6eb84e
Compare
|
❌ Gradle check result for b6eb84e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
b6eb84e to
74ac3dc
Compare
|
❌ Gradle check result for 74ac3dc: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
74ac3dc to
ab40f31
Compare
|
❌ Gradle check result for ab40f31: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
ab40f31 to
71b42b4
Compare
|
❌ Gradle check result for 71b42b4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
71b42b4 to
2b4cb4c
Compare
|
❌ Gradle check result for 2b4cb4c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| public SearcherSupplier acquireSearcherSupplier( | ||
| Function<Searcher, Searcher> wrapper, | ||
| SearcherScope scope, | ||
| @Nullable Set<String> criteria |
There was a problem hiding this comment.
nit: Use Optional instead?
|
|
||
| private OpenSearchDirectoryReader createChildDirectoryReader(final List<LeafReader> leafReaders, Set<String> criteria) | ||
| throws IOException { | ||
| final String uniqCacheKey = UUID.nameUUIDFromBytes(String.join(",", criteria).getBytes(StandardCharsets.UTF_8)).toString(); |
There was a problem hiding this comment.
What happens if we have multiple criterion can the set have out of order but same criteria causing multiple keys(unless you are using a SortedSet)
There was a problem hiding this comment.
sorting it before building a cachekey
3612534 to
cf2c41b
Compare
|
❌ Gradle check result for cf2c41b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for bf66db7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
bf66db7 to
d0af31d
Compare
|
❌ Gradle check result for 20e445d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
The implementation aims at pruning search space based on context awareness when context-aware-grouping mapper is presents. It does a best attempt extraction of grouping criteria from the query, if no grouping criteria is found it will search all segements Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
20e445d to
f80670a
Compare
|
❌ Gradle check result for f80670a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
|
❌ Gradle check result for 51dccbe: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Tejas Shah <shatejas@amazon.com>
|
❌ Gradle check result for 95a175e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for ae09e32: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Tejas Shah <shatejas@amazon.com>
ae09e32 to
87f267b
Compare
|
❌ Gradle check result for 87f267b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Tejas Shah <shatejas@amazon.com>
|
❕ Gradle check result for 05d8611: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #19558 +/- ##
============================================
+ Coverage 73.25% 73.28% +0.03%
- Complexity 71979 72066 +87
============================================
Files 5796 5797 +1
Lines 329287 329463 +176
Branches 47419 47456 +37
============================================
+ Hits 241203 241459 +256
+ Misses 68759 68654 -105
- Partials 19325 19350 +25 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Bukhtawar
left a comment
There was a problem hiding this comment.
Thanks for the changes, mostly looks good. Will need a second pass, meanwhile can we add integ tests to cover query filters
The implementation aims at pruning search space based on context awareness when context-aware-grouping mapper is presents. It does a best attempt extraction of grouping criteria from the query, if no grouping criteria is found it will search all segements
Dependent on Indexing PR
Related Issues
Resolves #[19093]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Release Notes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.