Skip to content

Adds search implementation of context aware segments#19558

Open
shatejas wants to merge 9 commits intoopensearch-project:mainfrom
shatejas:search-draft
Open

Adds search implementation of context aware segments#19558
shatejas wants to merge 9 commits intoopensearch-project:mainfrom
shatejas:search-draft

Conversation

@shatejas
Copy link
Copy Markdown
Contributor

@shatejas shatejas commented Oct 7, 2025

The implementation aims at pruning search space based on context awareness when context-aware-grouping mapper is presents. It does a best attempt extraction of grouping criteria from the query, if no grouping criteria is found it will search all segements

Dependent on Indexing PR

Related Issues

Resolves #[19093]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced context-aware search filtering capabilities that enable segment-level query optimizations based on context criteria.
    • Added automatic criteria extraction from queries to support context-aware filtering without manual configuration.
    • Enhanced script validation to support stored scripts in context-aware grouping configurations.
  • Tests

    • Expanded test coverage for criteria-based filtering and context-aware query extraction functionality.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions github-actions bot added discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request lucene Search Search query, autocomplete ...etc labels Oct 7, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Oct 7, 2025

❌ Gradle check result for 0ece83d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for e5db2d9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 9b887d8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

IndexShard shard = indexService.getShard(request.shardId().id());
Engine.SearcherSupplier reader = shard.acquireSearcherSupplier();
Set<String> groupingCriterias = Collections.emptySet();
if (request.source() != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this will make criteria extraction happen at every data node.

Since it's only dependent on the QueryBuilder we can compute this once at the Coordinator and pass to every transport search task.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@expani Thats right it happens on every data node. Can you point me to the right code pointer so I can explore this to compute on coordinator node?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseSource also executes on data node.

I looked around and there doesn't seem to be a good place to execute on coordinator - From what I understand, iterates through the shards and executes the search on each shard, there are no operations on query builder that happen on co-ordinator in current state

*
* @opensearch.internal
*/
public interface WithFilterQueryBuilder {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being instantiated anywhere, Can you elaborate on when this will be used ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are plugin queries, like knn query, which can have filters, If a plugin query implements this interface - it will be able to leverage pruning in context aware cases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation around this

Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a little misleading, consider CriteriaAwareFilterQueryBuilder/FilterAwareQueryBuilder. Also wondering should it also bind with a filter criteria or you want to keep this generic if this finds common use

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondering should it also bind with a filter criteria or you want to keep this generic if this finds common use

The intent was to keep it generic, but let me know if you wouldn't want it generic

The name is a little misleading, consider CriteriaAwareFilterQueryBuilder/FilterAwareQueryBuilder

Will go with FilterAwareQueryBuilder

* {@link org.opensearch.indices.IndicesRequestCache} depends on cache key generated in OpenSearchDirectoryReader
* This cache is used to be able to keep the cacheKey of OpensearchDirectoryReader consistent in cases of multiple criteria
*/
private Cache<Set<String>, OpenSearchDirectoryReader> criteriaCache;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the extractCriteria implementation, the key Set<String> will be ordered based on how they appear in the query if there are multiple criterias ( BooleanQuery )

I think we need to preserve the order for the cache lookup to work if query is same but order of criteria fields are different.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I will sort the criterias in extractCriteria

// Do not do this warm up if its CriteriaBasedReader as criteriaBasedReaders are already formed
if (!(in instanceof CriteriaBasedReader)) {
warmUpCriteriaBasedReader(in.leaves());
criteriaCache = CacheBuilder.<Set<String>, OpenSearchDirectoryReader>builder().setMaximumWeight(1000).build();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Cluster setting index.context_aware.search.criteria_cache is not used here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will add it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around an now I remember that there is no easy way of being able to extract IndexSettings in reader and thats the reason for hard coding the value. I will find a good way to pass the value

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for b6eb84e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 74ac3dc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for ab40f31: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 71b42b4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 2b4cb4c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

public SearcherSupplier acquireSearcherSupplier(
Function<Searcher, Searcher> wrapper,
SearcherScope scope,
@Nullable Set<String> criteria
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use Optional instead?


private OpenSearchDirectoryReader createChildDirectoryReader(final List<LeafReader> leafReaders, Set<String> criteria)
throws IOException {
final String uniqCacheKey = UUID.nameUUIDFromBytes(String.join(",", criteria).getBytes(StandardCharsets.UTF_8)).toString();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we have multiple criterion can the set have out of order but same criteria causing multiple keys(unless you are using a SortedSet)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorting it before building a cachekey

@shatejas shatejas marked this pull request as ready for review December 7, 2025 07:14
@shatejas shatejas force-pushed the search-draft branch 2 times, most recently from 3612534 to cf2c41b Compare January 24, 2026 03:19
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cf2c41b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for bf66db7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 20e445d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

The implementation aims at pruning search space based on context
awareness when context-aware-grouping mapper is presents. It does a best
attempt extraction of grouping criteria from the query, if no grouping
criteria is found it will search all segements

Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for f80670a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 51dccbe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Tejas Shah <shatejas@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 95a175e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for ae09e32: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Tejas Shah <shatejas@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 87f267b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Tejas Shah <shatejas@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for 05d8611: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 86.20690% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.28%. Comparing base (672039d) to head (e82547b).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
...common/lucene/index/OpenSearchDirectoryReader.java 84.09% 6 Missing and 1 partial ⚠️
...textaware/ContextAwareCriteriaQueryExtraction.java 92.40% 2 Missing and 4 partials ⚠️
.../main/java/org/opensearch/index/engine/Engine.java 50.00% 2 Missing and 2 partials ⚠️
...main/java/org/opensearch/search/SearchService.java 66.66% 1 Missing and 1 partial ⚠️
.../index/mapper/ContextAwareGroupingFieldMapper.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19558      +/-   ##
============================================
+ Coverage     73.25%   73.28%   +0.03%     
- Complexity    71979    72066      +87     
============================================
  Files          5796     5797       +1     
  Lines        329287   329463     +176     
  Branches      47419    47456      +37     
============================================
+ Hits         241203   241459     +256     
+ Misses        68759    68654     -105     
- Partials      19325    19350      +25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Tejas Shah <shatejas@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for e82547b: SUCCESS

Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, mostly looks good. Will need a second pass, meanwhile can we add integ tests to cover query filters

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request lucene Search Search query, autocomplete ...etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] Search for context aware segments

3 participants