Term query I/O concurrency optimization#20415
Term query I/O concurrency optimization#20415musabafzal wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
📝 WalkthroughWalkthroughModified TopDocsCollectorContext to introduce deferred shortcut total hit count calculation. Added helper methods for computing hit count differently when filter collectors or deletions exist, implemented query wrapper removal logic, and replaced direct TotalHits construction with a supplier-based approach for lazy evaluation. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ❌ 3❌ Failed checks (3 warnings)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
@server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java:
- Around line 519-532: The current shortcutTotalHitCountSupplier eagerly calls
shortcutTotalHitCount(...) during construction; change it so the term-counting
I/O runs inside the returned Supplier to truly defer I/O: inside
shortcutTotalHitCountSupplier return a Supplier that computes
shortcutTotalHitCount(reader, query) when invoked (not before), wrap the
computation with a CachedSupplier (or equivalent) to cache the result across
invocations, and convert any IOException to an UncheckedIOException so the
Supplier signature remains Supplier<TotalHits>; keep creating the TotalHits from
the computed long and preserve the deferShortcutTotalHitCount/computedHitCount
branch logic inside the supplier.
- Around line 834-851: The removeWrappersFromQuery method currently unwraps
ConstantScoreQuery, BoostQuery, and ApproximateScoreQuery but misses
IndexOrDocValuesQuery; add an else-if branch in removeWrappersFromQuery to
detect when query is an IndexOrDocValuesQuery (use the IndexOrDocValuesQuery
symbol) and set query = indexOrDocValuesQuery.getIndexQuery() so wrapped
index-based queries are unwrapped for optimization detection.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (4)
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java (4)
644-644: LGTM — supplier consumption pattern is correct.Using
totalHitsSupplier.get()here properly supports the deferred evaluation pattern once the supplier creation is fixed.
787-788: LGTM — centralizing wrapper removal improves maintainability.Calling
removeWrappersFromQueryat the start ensures the shortcut optimization works regardless of scoring wrappers likeConstantScoreQueryorBoostQuery.
826-832: LGTM — deferral eligibility logic is correct.The conditions correctly identify when
shortcutTotalHitCountwill succeed forTermQuery: no filter collector, no deletions, and aTermQueryafter unwrapping. Deferring onlyTermQuerymakes sense since its per-leafdocFreqcalls are more I/O intensive thanMatchAllDocsQuery.numDocs().
471-486: Integration logic is sound, pending the supplier fix.The flow correctly routes deferred cases through
shortcutTotalHitCountSupplier. Once the supplier is fixed to evaluate lazily (see earlier comment), this will achieve the intended I/O concurrency optimization.
| private Supplier<TotalHits> shortcutTotalHitCountSupplier( | ||
| boolean deferShortcutTotalHitCount, | ||
| int computedHitCount, | ||
| IndexReader reader, | ||
| Query query | ||
| ) throws IOException { | ||
| long shortcutTotalHitCnt; | ||
| if (deferShortcutTotalHitCount) { | ||
| shortcutTotalHitCnt = shortcutTotalHitCount(reader, query); | ||
| } else { | ||
| shortcutTotalHitCnt = computedHitCount; | ||
| } | ||
| return () -> (new TotalHits(shortcutTotalHitCnt, TotalHits.Relation.EQUAL_TO)); | ||
| } |
There was a problem hiding this comment.
Deferral does not actually defer I/O — computation happens eagerly.
The shortcutTotalHitCount(reader, query) call at line 527 executes immediately during object construction, not when the supplier is invoked. This defeats the I/O concurrency optimization since the term counting I/O still blocks the constructor.
For true deferral, move the computation inside the supplier. You'll need to handle the IOException — consider wrapping with CachedSupplier and converting to UncheckedIOException:
Proposed fix for lazy evaluation
private Supplier<TotalHits> shortcutTotalHitCountSupplier(
boolean deferShortcutTotalHitCount,
int computedHitCount,
IndexReader reader,
Query query
-) throws IOException {
- long shortcutTotalHitCnt;
+) {
if (deferShortcutTotalHitCount) {
- shortcutTotalHitCnt = shortcutTotalHitCount(reader, query);
+ return new CachedSupplier<>(() -> {
+ try {
+ return new TotalHits(shortcutTotalHitCount(reader, query), TotalHits.Relation.EQUAL_TO);
+ } catch (IOException e) {
+ throw new UncheckedIOException(e);
+ }
+ });
} else {
- shortcutTotalHitCnt = computedHitCount;
+ return () -> new TotalHits(computedHitCount, TotalHits.Relation.EQUAL_TO);
}
- return () -> (new TotalHits(shortcutTotalHitCnt, TotalHits.Relation.EQUAL_TO));
}Based on learnings, avoid evaluating a supplier upfront during object construction; resolve at use time to preserve lazy behavior.
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
around lines 519 - 532, The current shortcutTotalHitCountSupplier eagerly calls
shortcutTotalHitCount(...) during construction; change it so the term-counting
I/O runs inside the returned Supplier to truly defer I/O: inside
shortcutTotalHitCountSupplier return a Supplier that computes
shortcutTotalHitCount(reader, query) when invoked (not before), wrap the
computation with a CachedSupplier (or equivalent) to cache the result across
invocations, and convert any IOException to an UncheckedIOException so the
Supplier signature remains Supplier<TotalHits>; keep creating the TotalHits from
the computed long and preserve the deferShortcutTotalHitCount/computedHitCount
branch logic inside the supplier.
| static Query removeWrappersFromQuery(Query query) { | ||
| while (true) { | ||
| // remove wrappers that don't matter for counts | ||
| // this is necessary so that we don't only optimize match_all | ||
| // queries but also match_all queries that are nested in | ||
| // a constant_score query | ||
| if (query instanceof ConstantScoreQuery constantScoreQuery) { | ||
| query = constantScoreQuery.getQuery(); | ||
| } else if (query instanceof BoostQuery boostQuery) { | ||
| query = boostQuery.getQuery(); | ||
| } else if (query instanceof ApproximateScoreQuery approximateScoreQuery) { | ||
| query = approximateScoreQuery.getOriginalQuery(); | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
| return query; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find classes that extend Query and have a getQuery() method
cd server/src/main/java || cd .
rg -t java 'class \w+.*extends.*Query' -A 20 | grep -E '(^[^:]*class|getQuery\(\)|private.*Query|public.*Query)' | head -100Repository: opensearch-project/OpenSearch
Length of output: 13497
🏁 Script executed:
# More targeted search for Query wrapper types
find . -name "*.java" -type f | xargs grep -l 'extends.*Query' | head -20Repository: opensearch-project/OpenSearch
Length of output: 1945
🏁 Script executed:
# Check what wrapper types are actually used in TopDocsCollectorContext
rg -B 5 -A 5 'instanceof.*Query' server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 1295
🏁 Script executed:
# Check if IndexOrDocValuesQuery is used in codebase
rg -i 'IndexOrDocValuesQuery' --type javaRepository: opensearch-project/OpenSearch
Length of output: 33093
🏁 Script executed:
# Find all usages of removeWrappersFromQuery to understand its purpose
rg 'removeWrappersFromQuery' -B 3 -A 3 server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 848
🏁 Script executed:
# Check what query wrapper types extend Query and have getter methods
rg -t java 'class \w+.*Query.*extends.*Query' -A 5 | grep -E '(class.*Query|getQuery\(|get.*Query\(|Original)' | head -50Repository: opensearch-project/OpenSearch
Length of output: 7940
🏁 Script executed:
# Confirm the method context and impact
rg -B 10 'shortcutTotalHitCount' server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java | head -30Repository: opensearch-project/OpenSearch
Length of output: 2188
Add handling for IndexOrDocValuesQuery wrapper type.
The removeWrappersFromQuery method should also unwrap IndexOrDocValuesQuery to properly detect optimizable queries. This wrapper type is already handled similarly in other parts of the codebase (e.g., Helper.java, CustomQueryScorer.java, NestedHelper.java). Without unwrapping it, count optimizations may not trigger for queries wrapped in IndexOrDocValuesQuery.
Add an else-if branch:
} else if (query instanceof IndexOrDocValuesQuery indexOrDocValuesQuery) {
query = indexOrDocValuesQuery.getIndexQuery();🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
around lines 834 - 851, The removeWrappersFromQuery method currently unwraps
ConstantScoreQuery, BoostQuery, and ApproximateScoreQuery but misses
IndexOrDocValuesQuery; add an else-if branch in removeWrappersFromQuery to
detect when query is an IndexOrDocValuesQuery (use the IndexOrDocValuesQuery
symbol) and set query = indexOrDocValuesQuery.getIndexQuery() so wrapped
index-based queries are unwrapped for optimization detection.
|
❌ Gradle check result for 7413e1d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.