Skip to content

Added ability to prefetch docs in fetch phase#20176

Open
sruti1312 wants to merge 1 commit intoopensearch-project:mainfrom
sruti1312:feature/fetch
Open

Added ability to prefetch docs in fetch phase#20176
sruti1312 wants to merge 1 commit intoopensearch-project:mainfrom
sruti1312:feature/fetch

Conversation

@sruti1312
Copy link
Copy Markdown
Contributor

@sruti1312 sruti1312 commented Dec 6, 2025

Description

Adding ability to prefetch docs in fetch phase

Related Issues

Resolves #18780

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

Performance result for

Size With Prefetch P50 (ms) Without Prefetch P50 (ms) With Prefetch P90 (ms) Without Prefetch P90 (ms) With Prefetch P100 (ms) Without Prefetch P100 (ms)
Default 450.869 482.155 502.351 493.867 513.384 499.374
100 505.812 652.855 530.202 727.217 563.502 743.535
1000 1914.27 4127.89 1966.09 4298.37 1972.7 4676.91

Fetch Phase Cost (Arthas) for size as 10:

Configuration P50 (ms) P90 (ms) P99 (ms) P100 (ms)
With Prefetch 14.703235 22.755569 94.606265 106.321523
Without Prefetch 29.532539 36.979982 98.615315 112.293080

For different queries,

Query With Prefetch P50 (ms) Without Prefetch P50 (ms) With Prefetch P90 (ms) Without Prefetch P90 (ms) With Prefetch P100 (ms) Without Prefetch P100 (ms)
term 468.926 474.929 476.166 525.064 511.542 677.766
asc_sort_timestamp 590.812 603.017 612.373 659.616 623.102 717.948
desc_sort_timestamp 578.757 643.036 592.766 701.633 605.739 747.248

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Added an experimental, dynamic index-level setting (enabled by default) to control document prefetching during the fetch phase; can be toggled at runtime per index.
  • Tests

    • Added unit tests covering default behavior, explicit configuration, dynamic updates, prefetch thresholds, sequencing, nested-doc handling, loop bounds, and reader/doc calculations.
  • Documentation

    • CHANGELOG updated to note the new fetch-phase prefetch capability.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions github-actions bot added discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request lucene Search Search query, autocomplete ...etc labels Dec 6, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 6, 2025

📝 Walkthrough

Walkthrough

Adds an experimental dynamic index setting to control prefetching of stored fields during the fetch phase, exposes the flag via SearchContext/DefaultSearchContext, integrates a prefetch loop into FetchPhase that interacts with leaf readers, registers the setting as built-in, and adds unit tests plus a changelog entry.

Changes

Cohort / File(s) Summary
Settings
server/src/main/java/org/opensearch/index/IndexSettings.java, server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
Add PREFETCH_DOCS_DURING_FETCH_ENABLED (index-scoped, dynamic, @ExperimentalApi, default=true); cache + getter + dynamic update consumer in IndexSettings; register setting in BUILT_IN_INDEX_SETTINGS.
Search Contexts
server/src/main/java/org/opensearch/search/internal/SearchContext.java, server/src/main/java/org/opensearch/search/DefaultSearchContext.java
Add isPrefetchDocsEnabled() to SearchContext (default true, ExperimentalApi); DefaultSearchContext caches index-level value and exposes evaluatePrefetchDocsEnabled().
Fetch Phase
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
Integrate prefetch orchestration before per-doc fetch: compute reader indices/relative doc IDs, determine prefetch count/threshold, skip nested docs, choose sequential vs segment stored-fields reader paths and invoke prefetch for first N docs; preserve existing per-doc processing and warning-on-error behavior.
Tests
server/src/test/java/org/opensearch/index/IndexSettingsTests.java, server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java, server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
Add unit tests for default/explicit/dynamic setting behavior and multiple fetch-prefetch scenarios (thresholds, sequential detection, nested-doc skipping, reader index changes, loop bounds); minor formatting cleanup in an existing test.
Changelog
CHANGELOG.md
Add Unreleased entry: "Add ability to prefetch docs in fetch phase".

Sequence Diagram(s)

sequenceDiagram
    participant IS as IndexSettings
    participant SC as SearchContext
    participant FP as FetchPhase
    participant LR as LeafReader

    IS->>SC: publish PREFETCH_DOCS_DURING_FETCH_ENABLED (dynamic)
    SC->>SC: cache prefetch flag

    FP->>SC: isPrefetchDocsEnabled()?
    SC-->>FP: true / false

    alt Prefetch enabled
        FP->>FP: compute readerIndex, docBases, numberToPrefetch
        FP->>LR: prefetch(docIdList) (sequential or segment path)
        LR-->>FP: success / error
        FP->>FP: log & continue on error
    else Prefetch disabled
        FP->>FP: skip prefetch
    end

    FP->>FP: proceed with per-document stored-field fetch loop
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

Search:Performance

Suggested reviewers

  • msfroh
  • andrross
  • dbwiddis
  • cwperks
  • sachinpkale
  • shwetathareja
  • mch2
  • reta
  • jed326
  • anasalkouz

Poem

🐇 I hop to the index, nibble bytes ahead,
I scent the docs where fields are shed.
I skip the nests and fetch the rest,
I warm the reads to do my best.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding prefetch capability to the fetch phase, which aligns with the primary objective and all file changes.
Description check ✅ Passed The description adequately addresses the template requirements with a clear description of changes, linked issue reference, completed checklist items, and performance metrics demonstrating the feature's impact.
Linked Issues check ✅ Passed The PR implements the core objective from #18780 by adding prefetch capability to the fetch phase with configurable settings and performance improvements, though posix fadvise optimization and concurrent prefetch strategies remain unimplemented.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing document prefetch in the fetch phase: new index settings, fetch phase logic, search context integration, and comprehensive tests. A single trailing whitespace removal in FetchPhaseTests is negligible formatting.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0449ce8 and e4fc4d6.

📒 Files selected for processing (5)
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (4 hunks)
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java (3 hunks)
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2 hunks)
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: Analyze (java)
🔇 Additional comments (4)
server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)

597-604: New prefetch hook on SearchContext is well-scoped and backwards compatible

Providing a concrete, default-false isPrefetchDocsEnabled() on the abstract base cleanly adds the extension point without breaking existing subclasses; the Javadoc makes the contract clear. No changes needed here.

server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)

296-303: Wiring prefetch setting into BUILT_IN_INDEX_SETTINGS looks correct

Adding IndexSettings.PREFETCH_DOCS_DURING_FETCH_ENABLED to BUILT_IN_INDEX_SETTINGS properly exposes the index-level toggle through the standard settings machinery and matches the surrounding pattern.

server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)

225-226: Prefetch flag plumbed from IndexSettings into DefaultSearchContext correctly

The dedicated prefetchDocsForFetchPhaseEnabled field, initialized from indexService.getIndexSettings().isPrefetchDocsEnabled() and surfaced via isPrefetchDocsEnabled(), cleanly ties the index setting into the fetch phase. Given that search contexts are short-lived per request, snapshotting the value in the constructor is reasonable and keeps the override trivial.

Also applies to: 295-296, 772-775

server/src/main/java/org/opensearch/index/IndexSettings.java (1)

869-874: Index-level prefetch setting is defined and wired consistently with other dynamic settings

The new PREFETCH_DOCS_DURING_FETCH_ENABLED setting, cached prefetchDocsEnabled field, constructor initialization, and isPrefetchDocsEnabled/setPrefetchDocsEnabled pair all follow the established pattern for dynamic index settings (volatile cache + IndexScopedSettings update consumer). Defaulting to true with IndexScope and Dynamic is appropriate for a tunable performance feature.

Also applies to: 1024-1028, 1196-1202, 2298-2304

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 6, 2025

❌ Gradle check result for e4fc4d6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

LeafReader leafReader = FilterLeafReader.unwrap(filterLeafReader);
if (leafReader instanceof SegmentReader segmentReader) {
segmentReader.getFieldsReader().prefetch(relativeDocId);
}
Copy link
Copy Markdown

@abiesps abiesps Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some upper bound on how many prefetch calls we make.
We could also batch documents by segments and call prefetch only for one docId per page size for each segment. Let me know your thoughts on this ?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also wondering, if there is a way to know this 'prefetching distance' (When to call prefetch before actual IO) and what is its relationship with IO latency (given there is enough memory and compute). Like in this case, you were able to see around 50% improvement in fetch phase execution time by iterating over docId twice.
Do we know why the time hasn't improved further, if there is still IO during the loop of actually fetching doc Ids.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the granularity of prefetched blocks cache is at clone level of IndexInput (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L624) . Can we validate if this translate to per request scope or per thread.
If its per request, there is a scope of removing redundant prefetch call by having another cache at higher granular level (than request scope).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the granularity of prefetched blocks cache is at clone level of IndexInput (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L624) . Can we validate if this translate to per request scope or per thread.

Nevermind its at segment reader level.

@bowenlan-amzn
Copy link
Copy Markdown
Member

Seems to me the scenario is mainly for a cold start on a slow storage, where page cache is empty, using prefetch API could warm up quicker because of parallel IO.

I see the numbers in the description which is nice! Could you also note down the machine, workload and probably some steps to clear page cache before benchmark.

Some metrics could also help to show the impact of this change, and useful for operation readiness.

iostat could show

  • io wait should decrease
  • read wait should decrease
  • read requests merged should increase (prefetch may batch?)

page cache

  • cache hits and misses should be higher and lowere for prefetch respectively

int prefetchReaderIndex = -1;
LeafReaderContext prefetchReaderContext = null;
boolean hasSequentialDocs = hasSequentialDocs(docs);
boolean useSequentialPrefetch = hasSequentialDocs && docs.length >= 10;
Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought here on docs.length, when it's very large, probably we also shouldn't do this prefetch, or do this in a batched manner.
I'm little worried early prefetched pages may be evicted from page cache before they are read.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.md (1)

15-15: Align changelog entry verb form with existing entries.

The entry uses "Adding ability" while most other entries in the changelog use either the imperative form ("Add") or past tense ("Added"). For consistency, consider changing to "Add support for prefetching docs in fetch phase" or "Added support for prefetching docs in fetch phase".

✍️ Suggested style alignment
-- Adding ability to prefetch docs in fetch phase ([#20176](https://github.com/opensearch-project/OpenSearch/pull/20176))
+- Add support for prefetching docs in fetch phase ([#20176](https://github.com/opensearch-project/OpenSearch/pull/20176))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79fe841 and 4cd56d4.

📒 Files selected for processing (1)
  • CHANGELOG.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.

Applied to files:

  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)
  • GitHub Check: Mend Security Check

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4cd56d4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sruti1312 sruti1312 changed the title Adding ability to prefetch docs in fetch phase Added ability to prefetch docs in fetch phase Jan 12, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @CHANGELOG.md:
- Line 46: The changelog entry "Adding ability to prefetch docs in fetch phase
([#20176](https://github.com/opensearch-project/OpenSearch/pull/20176))" is
missing the required list marker; update the line in the "Added" section by
prefixing it with "- " so it matches other entries (i.e., make the entry start
with a dash and a space).
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (1)

245-247: Redundant docs.length >= 10 check.

The docs.length >= 10 condition is already incorporated into hasSequentialDocs at line 180, making the explicit check at line 247 redundant.

♻️ Suggested simplification
-                    if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf
-                        && hasSequentialDocs
-                        && docs.length >= 10) {
+                    if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf
+                        && hasSequentialDocs) {
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cd56d4 and aede52b.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (4)
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.

Applied to files:

  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • CHANGELOG.md
🪛 LanguageTool
CHANGELOG.md

[grammar] ~19-~19: Use a hyphen to join words.
Context: ...tRangeQuery`. - Add a mapper for context aware segments grouping criteria ([#1923...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~21-~21: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...568)) - Add support for repository with Server side encryption enabled and client side encr...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~21-~21: Use a hyphen to join words.
Context: ...erver side encryption enabled and client side encryption as well based on a flag....

(QB_NEW_EN_HYPHEN)


[uncategorized] ~38-~38: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...enSearch/pull/19878)) - Add support for context aware segments ([#19098](https://github.com/o...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Mend Security Check
🔇 Additional comments (7)
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)

301-301: LGTM!

The new PREFETCH_DOCS_DURING_FETCH_ENABLED setting is correctly registered in BUILT_IN_INDEX_SETTINGS, enabling it to be recognized as a valid index-scoped setting.

server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)

38-42: LGTM!

The new Lucene imports are required for the prefetch logic to unwrap readers and access the StoredFieldsReader.


180-229: Prefetch logic implementation looks sound.

The prefetch implementation correctly:

  • Skips scroll queries to avoid interfering with cached sequential readers
  • Handles cancellation checks
  • Skips nested documents which require additional source filtering
  • Prefetches only the first doc per segment for sequential access patterns
  • Uses defensive error handling to avoid breaking the fetch phase

A few observations:

  1. The currentReaderIndex != readerIndex check at line 207 is intentionally performed before updating currentReaderIndex at line 220, ensuring only the first document per segment triggers prefetch for sequential access - this is correct.

  2. The exception handling at lines 221-223 is appropriately defensive for a performance optimization.

server/src/main/java/org/opensearch/index/IndexSettings.java (3)

890-896: LGTM!

The new experimental setting is well-defined:

  • @ExperimentalApi annotation appropriately marks this as experimental
  • Default true enables the optimization by default for improved cold-start latency
  • Dynamic property allows runtime configuration changes
  • Setting key index.prefetch_docs.fetch_phase.enabled follows naming conventions

1047-1050: LGTM!

The field is correctly declared as volatile for thread-safe access with dynamic setting updates. The initialization and update consumer registration follow the established pattern used by other dynamic settings in this class.

Also applies to: 1224-1225


2328-2335: LGTM!

The getter/setter pair follows the established pattern for dynamic settings. The public getter provides access to search context, while the private setter is correctly used only by the settings update consumer.

server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (1)

17-178: Good unit test coverage for prefetch logic components.

The tests effectively validate the individual decision points in the prefetch implementation:

  • Threshold calculations with various doc array sizes
  • Enablement conditions including scroll query handling
  • Sequential docs detection via FetchPhase.hasSequentialDocs()
  • Loop bounds verification
  • Nested document skip logic
  • Reader index change detection
  • Relative doc ID calculations

The workaround for avoiding mocking ScrollContext (lines 49-55) is reasonable given it's a final class.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 205-219: The else-branch in FetchPhase incorrectly prefetches for
SequentialStoredFieldsLeafReader when currentReaderIndex == readerIndex,
defeating the "only prefetch the first doc per segment" optimization; update the
guard so the else block is skipped for sequential docs in the same segment by
adding a condition preventing execution when currentReaderContext.reader() is a
SequentialStoredFieldsLeafReader and hasSequentialDocs and currentReaderIndex ==
readerIndex (or equivalently extend the original if to cover this case),
ensuring only the first doc per segment triggers prefetch.
🧹 Nitpick comments (4)
server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java (1)

173-183: Tests are tautological and don't verify actual behavior.

These tests mock isPrefetchDocsEnabled() to return a specific value and then assert that same value is returned. This only verifies that Mockito stubbing works, not the actual feature behavior.

To meaningfully test the prefetch setting:

  1. Test SearchContext's default implementation returns true without mocking
  2. Test DefaultSearchContext with real IndexSettings to verify it respects the index-level setting

Consider refactoring to test actual behavior rather than mock stubbing.

♻️ Suggested approach for meaningful tests
 public void testPrefetchDocsEnabledByDefault() {
-    SearchContext context = mock(SearchContext.class);
-    when(context.isPrefetchDocsEnabled()).thenReturn(true);
-    assertTrue(context.isPrefetchDocsEnabled());
+    // Test the actual default implementation in SearchContext base class
+    // This would require a concrete test implementation or testing DefaultSearchContext
+    // with properly configured IndexSettings where the setting defaults to true
 }

Alternatively, test using a spy on a real instance or use TestSearchContext configured with appropriate settings.

server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)

1300-1305: Fallback value differs from base class default.

When indexService or indexSettings is null, this returns false, but the base SearchContext.isPrefetchDocsEnabled() returns true. While this defensive null-check follows the pattern of similar methods (e.g., evaluateKeywordIndexOrDocValuesEnabled), the inconsistency could cause unexpected behavior if the fallback is ever triggered.

Consider whether returning true (matching the base class default) would be more appropriate for consistency, or document why false is the correct fallback here.

server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)

180-186: Consider making prefetchDocsThreshold configurable.

The threshold is hardcoded to 1000. Since the prefetch feature itself is controlled by an index-level setting (PREFETCH_DOCS_DURING_FETCH_ENABLED), consider exposing the threshold as a configurable setting as well to allow tuning based on storage characteristics and workload patterns.

Additionally, docs.length >= 10 is checked here in hasSequentialDocs and again at line 247 in the main loop—the latter check is redundant.


245-247: Redundant check: docs.length >= 10 is already included in hasSequentialDocs.

At line 180, hasSequentialDocs is defined as hasSequentialDocs(docs) && docs.length >= 10. The additional && docs.length >= 10 check at line 247 is therefore redundant and can be removed.

Suggested simplification
                 if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf
-                    && hasSequentialDocs
-                    && docs.length >= 10) {
+                    && hasSequentialDocs) {
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aede52b and a59789f.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.

Applied to files:

  • CHANGELOG.md
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: Mend Security Check
🔇 Additional comments (7)
CHANGELOG.md (1)

15-15: LGTM!

The changelog entry is properly formatted and correctly placed in the "Added" section for the unreleased 3.x version.

server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)

602-610: LGTM!

The new experimental API method follows the established pattern in SearchContext for optional override methods (similar to includeNamedQueriesScore() and shouldUseConcurrentSearch()). The default true value aligns with the feature being enabled by default, and the @ExperimentalApi annotation is appropriate for a new feature.

server/src/main/java/org/opensearch/search/DefaultSearchContext.java (2)

229-229: LGTM!

The field declaration and initialization follow the established pattern used by other cached settings in this class (e.g., keywordIndexOrDocValuesEnabled, maxAggRewriteFilters).

Also applies to: 300-300


777-780: LGTM!

Clean override that returns the cached value, consistent with the class's pattern for similar settings.

server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)

301-301: LGTM!

The new setting is correctly registered in BUILT_IN_INDEX_SETTINGS, making it available as a recognized index-level setting.

server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)

38-42: LGTM!

The new Lucene imports are necessary for the prefetch logic to unwrap reader wrappers and access the underlying SegmentReader for stored field prefetching.


228-230: LGTM!

Correctly resetting currentReaderIndex and currentReaderContext before the main processing loop ensures the prefetch loop's state doesn't interfere with the actual document fetching logic.

@sruti1312 sruti1312 force-pushed the feature/fetch branch 2 times, most recently from e38ec54 to 939a973 Compare January 12, 2026 23:52
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 206-218: The code incorrectly prefetches when currentReaderIndex
== readerIndex and hasSequentialDocs is true; change the branching so that when
we're in the same segment (currentReaderIndex == readerIndex) AND
hasSequentialDocs is true we skip prefetch entirely. Concretely, update the
condition around SequentialStoredFieldsLeafReader/FilterLeafReader so it only
attempts prefetch when either the reader index differs (currentReaderIndex !=
readerIndex) OR there are NOT sequential docs (hasSequentialDocs == false);
leave the existing calls to
SequentialStoredFieldsLeafReader.getSequentialStoredFieldsReader().prefetch(relativeDocId)
and SegmentReader.getFieldsReader().prefetch(relativeDocId) unchanged.
🧹 Nitpick comments (5)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)

180-186: Consider extracting magic numbers as named constants.

The thresholds 10 (minimum docs for sequential optimization) and 1000 (prefetch threshold) are hardcoded. Extracting them as named constants improves readability and maintainability.

Suggested refactor
+    private static final int SEQUENTIAL_DOCS_MIN_COUNT = 10;
+    private static final int PREFETCH_DOCS_THRESHOLD = 1000;
+
     public void execute(SearchContext context, String profileDescription) {
         // ...
-        boolean hasSequentialDocs = hasSequentialDocs(docs) && docs.length >= 10;
+        boolean hasSequentialDocs = hasSequentialDocs(docs) && docs.length >= SEQUENTIAL_DOCS_MIN_COUNT;
 
-        int prefetchDocsThreshold = 1000;
-        int numberOfDocsToPrefetch = Math.min(docs.length, prefetchDocsThreshold);
+        int numberOfDocsToPrefetch = Math.min(docs.length, PREFETCH_DOCS_THRESHOLD);

221-223: Catching generic Exception may mask programming errors.

While swallowing prefetch failures is acceptable since it's an optimization, catching all Exception types could hide bugs like NullPointerException or IllegalStateException. Consider catching more specific exceptions like IOException.

Suggested refactor
-                    } catch (Exception e) {
+                    } catch (IOException e) {
                         LOGGER.warn("Failed to prefetch the doc, will continue to prefetch the other docs", e);
                     }
server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (3)

19-37: Tests duplicate Java's Math.min behavior rather than FetchPhase logic.

This test validates Math.min rather than any FetchPhase-specific code. Consider either removing it or replacing it with a test that exercises the actual prefetch threshold logic within FetchPhase.execute().


101-128: Test verifies for-loop behavior rather than FetchPhase implementation.

This test counts iterations in a standalone loop, which doesn't exercise FetchPhase code. Consider replacing with a test that verifies the prefetch behavior through FetchPhase or marks for removal.


17-178: Consider adding integration tests for prefetch behavior.

The current tests validate individual arithmetic/comparison operations but don't exercise the actual prefetch logic with mocked readers. Integration tests covering scenarios like:

  • Prefetch called on first doc of each segment with sequential docs
  • Prefetch not called for nested documents
  • Exception handling during prefetch doesn't break fetch phase
  • Behavior when SequentialStoredFieldsLeafReader vs SegmentReader is used

would provide more meaningful coverage.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a59789f and e38ec54.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.

Applied to files:

  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)
🔇 Additional comments (14)
CHANGELOG.md (1)

15-15: LGTM!

The changelog entry is correctly formatted, follows the existing pattern, and is appropriately placed under the "Added" section for unreleased 3.x changes.

server/src/main/java/org/opensearch/index/IndexSettings.java (5)

39-39: LGTM!

Import correctly added for the @ExperimentalApi annotation used on the new setting.


890-896: LGTM!

The experimental setting is well-defined:

  • Appropriate @ExperimentalApi annotation for the new feature.
  • Default true enables the optimization by default as intended.
  • Dynamic property allows operators to toggle at runtime without index recreation.

1047-1050: LGTM!

The field declaration is correct:

  • volatile modifier ensures visibility across threads for dynamic updates.
  • Follows the established pattern for dynamic index settings in this class.

1224-1225: LGTM!

Initialization and dynamic update registration follow the established pattern for index settings.


2328-2335: LGTM!

Getter and setter implementation is correct:

  • Public getter for external access.
  • Private setter restricts modification to the settings update consumer.
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)

301-301: LGTM!

The setting is correctly registered in BUILT_IN_INDEX_SETTINGS, enabling it to be recognized and validated by the index settings infrastructure.

server/src/test/java/org/opensearch/index/IndexSettingsTests.java (3)

1176-1181: LGTM!

Test correctly validates the default value of true for the prefetch docs setting.


1183-1190: LGTM!

Test correctly validates that explicit configuration of false overrides the default.


1192-1207: LGTM!

Test comprehensively validates dynamic update behavior:

  • Verifies default value.
  • Tests disabling via dynamic update.
  • Tests re-enabling via dynamic update.

This follows the established pattern for dynamic setting tests in this file.

server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)

602-610: LGTM!

The base implementation with default true is appropriate:

  • Aligns with the setting's default value.
  • @ExperimentalApi annotation matches the setting definition.
  • DefaultSearchContext correctly overrides this method to return prefetchDocsForFetchPhaseEnabled, which respects the index-level setting.
  • The method is properly used in FetchPhase to control prefetching behavior during the fetch phase.
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)

228-229: LGTM!

Correctly resetting reader state before the main processing loop ensures the prefetch loop's state doesn't leak into the fetch logic.


245-246: LGTM!

The condition now consistently uses the pre-computed hasSequentialDocs flag, aligning with the prefetch logic.

server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (1)

64-99: Good coverage of hasSequentialDocs method.

This test properly exercises the public FetchPhase.hasSequentialDocs() method with various inputs including the combined condition with the length threshold.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 212-219: The non-sequential prefetch branch in FetchPhase
currently only handles a FilterLeafReader and misses the case where the reader
is already a SegmentReader; update the logic to defensively obtain a
SegmentReader (either by unwrapping a FilterLeafReader via
FilterLeafReader.unwrap(filterLeafReader) or by using the
Lucene.segmentReader(currentReaderContext.reader()) helper) and, if non-null,
call segmentReader.getFieldsReader().prefetch(relativeDocId) so both wrapped and
unwrapped SegmentReader instances get the prefetch call.
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (1)

182-186: Consider making the prefetch threshold configurable in the future.

The hardcoded threshold of 1000 is reasonable for most use cases, but consider extracting this to a configurable setting for tuning in production environments with different workload characteristics.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e38ec54 and 939a973.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (6)
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • CHANGELOG.md
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: Analyze (java)
🔇 Additional comments (4)
server/src/test/java/org/opensearch/index/IndexSettingsTests.java (1)

1177-1207: LGTM!

The tests for the prefetch docs setting are well-structured and follow established patterns in this test file. Coverage includes:

  • Default value verification
  • Explicit value override
  • Dynamic update behavior (both enable and disable paths)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (3)

180-180: LGTM!

Computing hasSequentialDocs once at the top level and reusing it for both the prefetch loop and the main processing loop is a good optimization. The threshold of 10 documents is reasonable to avoid prefetch overhead for small result sets.


222-224: Exception handling is appropriately defensive for an optimization.

Catching broad Exception and logging a warning ensures prefetch failures don't break the fetch phase. This is reasonable since prefetch is purely an optimization. Consider whether IOException would be more specific for I/O-related prefetch failures, though the broader catch provides more resilience.


188-231: Well-structured prefetch integration with appropriate safeguards.

The prefetch implementation includes:

  • Setting-based enablement check
  • Scroll query exclusion (documented)
  • Cancellation awareness in the loop
  • Nested document skip logic
  • Clean state reset before main processing

The overall structure is correct and the optimization is properly isolated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/index/IndexSettings.java:
- Around line 1047-1051: The new public accessor isPrefetchDocsEnabled() in
IndexSettings exposes an unannotated API; mark it as experimental (or add
explicit stability docs) to avoid implying permanence. Update the IndexSettings
class by annotating isPrefetchDocsEnabled() (and the analogous accessor
referenced later around the other prefetch docs getter) with your project's
experimental annotation (or add Javadoc stating "Experimental — may change") and
ensure the field prefetchDocsEnabled remains private/volatile while the
Javadoc/annotation clearly documents intended stability.

In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 180-227: The prefetch branch skips direct SegmentReader instances
because it only checks "instanceof FilterLeafReader" before unwrapping; fix by
unwrapping any FilterLeafReader layers from currentReaderContext.reader() (use
FilterLeafReader.unwrap) into a LeafReader variable and then check "instanceof
SegmentReader" on that unwrapped reader (then call
segmentReader.getFieldsReader().prefetch(relativeDocId)); update the logic
inside the for-loop in FetchPhase so both wrapped and unwrapped SegmentReader
cases execute the same prefetch path while keeping the existing
SequentialStoredFieldsLeafReader check for sequential-prefetch.
🧹 Nitpick comments (5)
server/src/main/java/org/opensearch/search/DefaultSearchContext.java (2)

229-301: Snapshotting the dynamic setting per-request is fine, but consider documenting it.

prefetchDocsForFetchPhaseEnabled is computed once in the constructor (Line 300), so dynamic updates apply to new searches only. If that’s the intent, a short comment near initialization would prevent “why didn’t my update apply?” confusion.


1300-1305: Make evaluatePrefetchDocsEnabled() private (and consider tightening null checks).

This reads like a local helper; keeping it public increases surface area inside the org.opensearch.search package for no clear benefit.

Proposed diff
-    public boolean evaluatePrefetchDocsEnabled() {
-        if (indexService != null && indexService.getIndexSettings() != null) {
-            return indexService.getIndexSettings().isPrefetchDocsEnabled();
-        }
-        return true;
-    }
+    private boolean evaluatePrefetchDocsEnabled() {
+        // Default to enabled if settings are unavailable (e.g., certain test contexts).
+        if (indexService != null && indexService.getIndexSettings() != null) {
+            return indexService.getIndexSettings().isPrefetchDocsEnabled();
+        }
+        return true;
+    }
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (3)

180-227: Prefetch should be gated on stored fields actually being needed (avoid wasted work).

createStoredFieldsVisitor(...) can return null (stored fields disabled). In that case, prefetching stored fields is pure overhead.

Proposed diff (simple gate)
-        if (context.isPrefetchDocsEnabled()) {
+        if (context.isPrefetchDocsEnabled() && fieldsVisitor != null) {
             // skipping prefetch for scroll queries
             if (context.scrollContext() == null) {
                 for (int index = 0; index < numberOfDocsToPrefetch; index++) {

221-223: Avoid WARN-per-doc (can get very noisy); consider DEBUG or rate-limit.

At minimum, include docId/segment info in the message and log once per request (or once per segment) after counting failures.


180-180: docs.length >= 10 cutoff for “sequential” mode: OK, but consider making it a named constant.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 939a973 and 34e08a5.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (4)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.

Applied to files:

  • server/src/main/java/org/opensearch/index/IndexSettings.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: Mend Security Check
🔇 Additional comments (5)
server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)

777-780: LGTM: override is simple and avoids repeated settings lookups.

server/src/test/java/org/opensearch/index/IndexSettingsTests.java (1)

1177-1207: LGTM: good minimal coverage for default, explicit, and dynamic updates.

server/src/main/java/org/opensearch/index/IndexSettings.java (2)

1224-1226: LGTM: volatile cache + settings update consumer is the right pattern here.


890-896: The setting naming index.prefetch_docs.fetch_phase.enabled is appropriate and follows OpenSearch's established pattern of feature-specific namespaces. Similar retrieval/execution features like searchable snapshots, segment replication, and soft deletes each maintain their own namespace rather than being consolidated under index.search.*. This approach scales better if additional prefetch settings are added in the future.

server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (1)

38-43: Lucene API compatibility verified — no issues found.

Both StoredFieldsReader#prefetch(int) and SegmentReader#getFieldsReader() are public methods in Lucene 10.3.2 (used by OpenSearch main) and are accessible from application code. The code at lines 207-217 correctly uses these stable public APIs.

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)

1300-1305: Consider reducing visibility of evaluatePrefetchDocsEnabled().

This method follows the pattern of other evaluate* methods in this class (e.g., evaluateKeywordIndexOrDocValuesEnabled), but unlike some of them, this one appears to only be called from the constructor. Consider making it private for better encapsulation, unless there's a specific reason to keep it public.

♻️ Suggested change
-    public boolean evaluatePrefetchDocsEnabled() {
+    private boolean evaluatePrefetchDocsEnabled() {
server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (3)

19-128: Tests replicate logic locally instead of testing production code.

testPrefetchThresholdCalculation and testPrefetchLoopBounds test Math.min and loop iteration behavior locally rather than calling actual production methods. These tests won't catch regressions if the production code in FetchPhase changes.

Consider either:

  1. Exposing a production method that encapsulates this logic and testing that, or
  2. Converting to an integration test that exercises the full prefetch flow, or
  3. Removing these if the logic is already covered by integration tests elsewhere.

49-55: Test segment doesn't exercise production code.

Lines 52-55 test local boolean variables instead of production behavior. This provides no regression protection for actual scroll query handling in the prefetch logic.

If ScrollContext is difficult to mock, consider:

  • Using a test subclass or factory that creates a testable scroll context
  • Using Mockito.mockConstruction() for final classes
  • Extracting a method in production code that takes the boolean condition and testing that

130-177: Consider consolidating or removing trivial logic tests.

These three tests validate basic Java operations (integer comparison, subtraction) rather than production code. They document the expected behavior via test names but won't catch production regressions.

A more effective approach would be an integration test that exercises FetchPhase.execute() with various document configurations (nested docs, multiple reader indices, etc.) and verifies the prefetch behavior end-to-end.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34e08a5 and eada73c.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/main/java/org/opensearch/search/DefaultSearchContext.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • server/src/test/java/org/opensearch/index/IndexSettingsTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.

Applied to files:

  • server/src/main/java/org/opensearch/search/internal/SearchContext.java
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
🔇 Additional comments (12)
CHANGELOG.md (1)

15-15: LGTM!

The changelog entry follows the established format and is correctly placed in the "Added" section.

server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)

602-610: LGTM!

The new experimental API method is well-documented, provides a sensible default, and follows the established pattern for feature flags in SearchContext. The @ExperimentalApi annotation appropriately signals the provisional nature of this feature.

server/src/main/java/org/opensearch/search/DefaultSearchContext.java (3)

229-229: LGTM!

The new field follows the established pattern for configuration flags in this class.


300-300: LGTM!

The field initialization via evaluatePrefetchDocsEnabled() is consistent with how other configuration flags are initialized in the constructor.


777-780: LGTM!

The override correctly returns the cached configuration value.

server/src/test/java/org/opensearch/index/IndexSettingsTests.java (1)

1177-1207: LGTM!

The test coverage is comprehensive, covering the three essential scenarios: default value, explicit configuration, and dynamic updates. The tests follow the established patterns in this test class.

server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (4)

38-42: LGTM!

New imports are required for the prefetch implementation using Lucene's reader APIs.


180-227: Prefetch implementation looks good with a minor suggestion.

The prefetch logic is well-structured:

  • Properly guards with isPrefetchDocsEnabled() and scroll context check
  • Handles cancellation correctly
  • Gracefully degrades on errors with warning log
  • Correctly skips nested docs
  • Optimizes sequential doc access by prefetching only the first doc per segment

One consideration: The prefetchDocsThreshold (1000) is hardcoded. For an experimental feature this is acceptable, but you may want to consider making this configurable in the future if users report different optimal values for their workloads.


226-227: LGTM!

Correctly resets the reader state after the prefetch loop to ensure the main fetch loop starts with a clean state.


243-243: LGTM!

Efficiently reuses the top-level hasSequentialDocs calculation instead of recalculating.

server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (2)

1-17: LGTM!

Standard test class setup with appropriate imports and base class.


64-99: Good unit test coverage for sequential docs detection.

This test properly exercises FetchPhase.hasSequentialDocs() production code with various scenarios: sequential, non-sequential, and the combined condition with length threshold.

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for eada73c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 82.50000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.25%. Comparing base (6b50fa4) to head (eada73c).
⚠️ Report is 189 commits behind head on main.

Files with missing lines Patch % Lines
...n/java/org/opensearch/search/fetch/FetchPhase.java 78.57% 3 Missing and 3 partials ⚠️
...va/org/opensearch/search/DefaultSearchContext.java 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20176      +/-   ##
============================================
- Coverage     73.29%   73.25%   -0.04%     
- Complexity    71816    71821       +5     
============================================
  Files          5793     5793              
  Lines        328644   328682      +38     
  Branches      47313    47323      +10     
============================================
- Hits         240890   240787     -103     
- Misses        68404    68575     +171     
+ Partials      19350    19320      -30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

int rootId = findRootDocumentIfNested(context, currentReaderContext, relativeDocId);
// skipping prefetch for nested docs as it needs additional processing like
// loading the document source and filtering it based on the nested document ID
if (rootId == -1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my Mac (with a 16KB page size), when retrieving documents, nested documents and their parent document are quite likely to be contained within the same page. I’m therefore wondering if this check is really required.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkewwei
Default page size is 4KB for Linux
Having the safety check is good for extreme cases anyway.


if (context.isPrefetchDocsEnabled()) {
// skipping prefetch for scroll queries
if (context.scrollContext() == null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do scroll queries not require preloading? Is it because they only run a single time? Logically, scroll queries handle a much more continuous stream of documents, so preloading ought to yield better results.

// loading the document source and filtering it based on the nested document ID
if (rootId == -1) {
// for sequential docs within the same segment we only prefetch the first doc
if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf && hasSequentialDocs) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When consecutive documents stretch beyond the scope of a single page_size, loading only one page will result in truncated output. Maybe we don’t have to worry about repeated loading operations, as @abiesps pointed out, Lucene will efficiently handle the quick reloading of the same data block on our behalf.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sruti1312 It might be good to have a test for this scenario ( I think your current tests might already be covering this implicitly)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request lucene Search Search query, autocomplete ...etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] AsyncIO and IO concurrency in Fetch phase

6 participants