Skip to content

Term query I/O concurrency optimization#20415

Closed
musabafzal wants to merge 1 commit intoopensearch-project:mainfrom
musabafzal:term-io-concurrency-optimizaton
Closed

Term query I/O concurrency optimization#20415
musabafzal wants to merge 1 commit intoopensearch-project:mainfrom
musabafzal:term-io-concurrency-optimizaton

Conversation

@musabafzal
Copy link
Copy Markdown

@musabafzal musabafzal commented Jan 13, 2026

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Refactor
    • Enhanced internal search query evaluation logic for improved performance and code maintainability.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 13, 2026

📝 Walkthrough

Walkthrough

Modified TopDocsCollectorContext to introduce deferred shortcut total hit count calculation. Added helper methods for computing hit count differently when filter collectors or deletions exist, implemented query wrapper removal logic, and replaced direct TotalHits construction with a supplier-based approach for lazy evaluation.

Changes

Cohort / File(s) Change Summary
Total Hit Count Deferment Logic
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
Added deferShortcutTotalHitCount() static method to determine when to defer hit count calculation based on filter collectors and deletions. Added removeWrappersFromQuery() static method to strip ConstantScoreQuery, BoostQuery, and ApproximateScoreQuery wrappers. Added shortcutTotalHitCountSupplier() private method to materialize TotalHits lazily. Modified control flow to use supplier-based approach instead of direct TotalHits construction when hitCount is available.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop through logic, a twist and turn,
Query wrappers stripped—new patterns to learn,
Deferred and supplied, when the moment calls,
Efficient hit counts across search halls!
~Twitchy Nose, the Code Gardener

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title refers to 'term query I/O concurrency optimization' but the actual changes involve deferment logic for shortcut total hit count calculation in TopDocsCollectorContext, which is unrelated to term query optimization. Update the title to accurately reflect the main change, such as 'Defer shortcut total hit count calculation when filter collector or deletions present' or similar.
Description check ⚠️ Warning The PR description is entirely a template with unfilled placeholders and unchecked checkboxes; it lacks any actual description of what the changes achieve or related issues. Fill in the Description section explaining the deferment pathway changes, resolve any related GitHub issues, and verify the testing and documentation checklist items.
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
@server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java:
- Around line 519-532: The current shortcutTotalHitCountSupplier eagerly calls
shortcutTotalHitCount(...) during construction; change it so the term-counting
I/O runs inside the returned Supplier to truly defer I/O: inside
shortcutTotalHitCountSupplier return a Supplier that computes
shortcutTotalHitCount(reader, query) when invoked (not before), wrap the
computation with a CachedSupplier (or equivalent) to cache the result across
invocations, and convert any IOException to an UncheckedIOException so the
Supplier signature remains Supplier<TotalHits>; keep creating the TotalHits from
the computed long and preserve the deferShortcutTotalHitCount/computedHitCount
branch logic inside the supplier.
- Around line 834-851: The removeWrappersFromQuery method currently unwraps
ConstantScoreQuery, BoostQuery, and ApproximateScoreQuery but misses
IndexOrDocValuesQuery; add an else-if branch in removeWrappersFromQuery to
detect when query is an IndexOrDocValuesQuery (use the IndexOrDocValuesQuery
symbol) and set query = indexOrDocValuesQuery.getIndexQuery() so wrapped
index-based queries are unwrapped for optimization detection.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba420b3 and 7413e1d.

📒 Files selected for processing (1)
  • server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (4)
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java (4)

644-644: LGTM — supplier consumption pattern is correct.

Using totalHitsSupplier.get() here properly supports the deferred evaluation pattern once the supplier creation is fixed.


787-788: LGTM — centralizing wrapper removal improves maintainability.

Calling removeWrappersFromQuery at the start ensures the shortcut optimization works regardless of scoring wrappers like ConstantScoreQuery or BoostQuery.


826-832: LGTM — deferral eligibility logic is correct.

The conditions correctly identify when shortcutTotalHitCount will succeed for TermQuery: no filter collector, no deletions, and a TermQuery after unwrapping. Deferring only TermQuery makes sense since its per-leaf docFreq calls are more I/O intensive than MatchAllDocsQuery.numDocs().


471-486: Integration logic is sound, pending the supplier fix.

The flow correctly routes deferred cases through shortcutTotalHitCountSupplier. Once the supplier is fixed to evaluate lazily (see earlier comment), this will achieve the intended I/O concurrency optimization.

Comment on lines +519 to +532
private Supplier<TotalHits> shortcutTotalHitCountSupplier(
boolean deferShortcutTotalHitCount,
int computedHitCount,
IndexReader reader,
Query query
) throws IOException {
long shortcutTotalHitCnt;
if (deferShortcutTotalHitCount) {
shortcutTotalHitCnt = shortcutTotalHitCount(reader, query);
} else {
shortcutTotalHitCnt = computedHitCount;
}
return () -> (new TotalHits(shortcutTotalHitCnt, TotalHits.Relation.EQUAL_TO));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Deferral does not actually defer I/O — computation happens eagerly.

The shortcutTotalHitCount(reader, query) call at line 527 executes immediately during object construction, not when the supplier is invoked. This defeats the I/O concurrency optimization since the term counting I/O still blocks the constructor.

For true deferral, move the computation inside the supplier. You'll need to handle the IOException — consider wrapping with CachedSupplier and converting to UncheckedIOException:

Proposed fix for lazy evaluation
 private Supplier<TotalHits> shortcutTotalHitCountSupplier(
     boolean deferShortcutTotalHitCount,
     int computedHitCount,
     IndexReader reader,
     Query query
-) throws IOException {
-    long shortcutTotalHitCnt;
+) {
     if (deferShortcutTotalHitCount) {
-        shortcutTotalHitCnt = shortcutTotalHitCount(reader, query);
+        return new CachedSupplier<>(() -> {
+            try {
+                return new TotalHits(shortcutTotalHitCount(reader, query), TotalHits.Relation.EQUAL_TO);
+            } catch (IOException e) {
+                throw new UncheckedIOException(e);
+            }
+        });
     } else {
-        shortcutTotalHitCnt = computedHitCount;
+        return () -> new TotalHits(computedHitCount, TotalHits.Relation.EQUAL_TO);
     }
-    return () -> (new TotalHits(shortcutTotalHitCnt, TotalHits.Relation.EQUAL_TO));
 }

Based on learnings, avoid evaluating a supplier upfront during object construction; resolve at use time to preserve lazy behavior.

🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
around lines 519 - 532, The current shortcutTotalHitCountSupplier eagerly calls
shortcutTotalHitCount(...) during construction; change it so the term-counting
I/O runs inside the returned Supplier to truly defer I/O: inside
shortcutTotalHitCountSupplier return a Supplier that computes
shortcutTotalHitCount(reader, query) when invoked (not before), wrap the
computation with a CachedSupplier (or equivalent) to cache the result across
invocations, and convert any IOException to an UncheckedIOException so the
Supplier signature remains Supplier<TotalHits>; keep creating the TotalHits from
the computed long and preserve the deferShortcutTotalHitCount/computedHitCount
branch logic inside the supplier.

Comment on lines +834 to +851
static Query removeWrappersFromQuery(Query query) {
while (true) {
// remove wrappers that don't matter for counts
// this is necessary so that we don't only optimize match_all
// queries but also match_all queries that are nested in
// a constant_score query
if (query instanceof ConstantScoreQuery constantScoreQuery) {
query = constantScoreQuery.getQuery();
} else if (query instanceof BoostQuery boostQuery) {
query = boostQuery.getQuery();
} else if (query instanceof ApproximateScoreQuery approximateScoreQuery) {
query = approximateScoreQuery.getOriginalQuery();
} else {
break;
}
}
return query;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find classes that extend Query and have a getQuery() method
cd server/src/main/java || cd . 
rg -t java 'class \w+.*extends.*Query' -A 20 | grep -E '(^[^:]*class|getQuery\(\)|private.*Query|public.*Query)' | head -100

Repository: opensearch-project/OpenSearch

Length of output: 13497


🏁 Script executed:

# More targeted search for Query wrapper types
find . -name "*.java" -type f | xargs grep -l 'extends.*Query' | head -20

Repository: opensearch-project/OpenSearch

Length of output: 1945


🏁 Script executed:

# Check what wrapper types are actually used in TopDocsCollectorContext
rg -B 5 -A 5 'instanceof.*Query' server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 1295


🏁 Script executed:

# Check if IndexOrDocValuesQuery is used in codebase
rg -i 'IndexOrDocValuesQuery' --type java

Repository: opensearch-project/OpenSearch

Length of output: 33093


🏁 Script executed:

# Find all usages of removeWrappersFromQuery to understand its purpose
rg 'removeWrappersFromQuery' -B 3 -A 3 server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 848


🏁 Script executed:

# Check what query wrapper types extend Query and have getter methods
rg -t java 'class \w+.*Query.*extends.*Query' -A 5 | grep -E '(class.*Query|getQuery\(|get.*Query\(|Original)' | head -50

Repository: opensearch-project/OpenSearch

Length of output: 7940


🏁 Script executed:

# Confirm the method context and impact
rg -B 10 'shortcutTotalHitCount' server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java | head -30

Repository: opensearch-project/OpenSearch

Length of output: 2188


Add handling for IndexOrDocValuesQuery wrapper type.

The removeWrappersFromQuery method should also unwrap IndexOrDocValuesQuery to properly detect optimizable queries. This wrapper type is already handled similarly in other parts of the codebase (e.g., Helper.java, CustomQueryScorer.java, NestedHelper.java). Without unwrapping it, count optimizations may not trigger for queries wrapped in IndexOrDocValuesQuery.

Add an else-if branch:

} else if (query instanceof IndexOrDocValuesQuery indexOrDocValuesQuery) {
    query = indexOrDocValuesQuery.getIndexQuery();
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
around lines 834 - 851, The removeWrappersFromQuery method currently unwraps
ConstantScoreQuery, BoostQuery, and ApproximateScoreQuery but misses
IndexOrDocValuesQuery; add an else-if branch in removeWrappersFromQuery to
detect when query is an IndexOrDocValuesQuery (use the IndexOrDocValuesQuery
symbol) and set query = indexOrDocValuesQuery.getIndexQuery() so wrapped
index-based queries are unwrapped for optimization detection.

@musabafzal musabafzal closed this Jan 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 7413e1d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant