Added ability to prefetch docs in fetch phase#20176
Added ability to prefetch docs in fetch phase#20176sruti1312 wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
📝 WalkthroughWalkthroughAdds an experimental dynamic index setting to control prefetching of stored fields during the fetch phase, exposes the flag via SearchContext/DefaultSearchContext, integrates a prefetch loop into FetchPhase that interacts with leaf readers, registers the setting as built-in, and adds unit tests plus a changelog entry. Changes
Sequence Diagram(s)sequenceDiagram
participant IS as IndexSettings
participant SC as SearchContext
participant FP as FetchPhase
participant LR as LeafReader
IS->>SC: publish PREFETCH_DOCS_DURING_FETCH_ENABLED (dynamic)
SC->>SC: cache prefetch flag
FP->>SC: isPrefetchDocsEnabled()?
SC-->>FP: true / false
alt Prefetch enabled
FP->>FP: compute readerIndex, docBases, numberToPrefetch
FP->>LR: prefetch(docIdList) (sequential or segment path)
LR-->>FP: success / error
FP->>FP: log & continue on error
else Prefetch disabled
FP->>FP: skip prefetch
end
FP->>FP: proceed with per-document stored-field fetch loop
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/search/DefaultSearchContext.java(3 hunks)server/src/main/java/org/opensearch/search/fetch/FetchPhase.java(2 hunks)server/src/main/java/org/opensearch/search/internal/SearchContext.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: Analyze (java)
🔇 Additional comments (4)
server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)
597-604: New prefetch hook onSearchContextis well-scoped and backwards compatibleProviding a concrete, default-false
isPrefetchDocsEnabled()on the abstract base cleanly adds the extension point without breaking existing subclasses; the Javadoc makes the contract clear. No changes needed here.server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)
296-303: Wiring prefetch setting intoBUILT_IN_INDEX_SETTINGSlooks correctAdding
IndexSettings.PREFETCH_DOCS_DURING_FETCH_ENABLEDtoBUILT_IN_INDEX_SETTINGSproperly exposes the index-level toggle through the standard settings machinery and matches the surrounding pattern.server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)
225-226: Prefetch flag plumbed fromIndexSettingsintoDefaultSearchContextcorrectlyThe dedicated
prefetchDocsForFetchPhaseEnabledfield, initialized fromindexService.getIndexSettings().isPrefetchDocsEnabled()and surfaced viaisPrefetchDocsEnabled(), cleanly ties the index setting into the fetch phase. Given that search contexts are short-lived per request, snapshotting the value in the constructor is reasonable and keeps the override trivial.Also applies to: 295-296, 772-775
server/src/main/java/org/opensearch/index/IndexSettings.java (1)
869-874: Index-level prefetch setting is defined and wired consistently with other dynamic settingsThe new
PREFETCH_DOCS_DURING_FETCH_ENABLEDsetting, cachedprefetchDocsEnabledfield, constructor initialization, andisPrefetchDocsEnabled/setPrefetchDocsEnabledpair all follow the established pattern for dynamic index settings (volatile cache +IndexScopedSettingsupdate consumer). Defaulting totruewithIndexScopeandDynamicis appropriate for a tunable performance feature.Also applies to: 1024-1028, 1196-1202, 2298-2304
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for e4fc4d6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| LeafReader leafReader = FilterLeafReader.unwrap(filterLeafReader); | ||
| if (leafReader instanceof SegmentReader segmentReader) { | ||
| segmentReader.getFieldsReader().prefetch(relativeDocId); | ||
| } |
There was a problem hiding this comment.
Should we have some upper bound on how many prefetch calls we make.
We could also batch documents by segments and call prefetch only for one docId per page size for each segment. Let me know your thoughts on this ?
There was a problem hiding this comment.
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L624 - I guess, this code is already taking care of it.
There was a problem hiding this comment.
I am also wondering, if there is a way to know this 'prefetching distance' (When to call prefetch before actual IO) and what is its relationship with IO latency (given there is enough memory and compute). Like in this case, you were able to see around 50% improvement in fetch phase execution time by iterating over docId twice.
Do we know why the time hasn't improved further, if there is still IO during the loop of actually fetching doc Ids.
There was a problem hiding this comment.
Actually the granularity of prefetched blocks cache is at clone level of IndexInput (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L624) . Can we validate if this translate to per request scope or per thread.
If its per request, there is a scope of removing redundant prefetch call by having another cache at higher granular level (than request scope).
There was a problem hiding this comment.
Actually the granularity of prefetched blocks cache is at clone level of IndexInput (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L624) . Can we validate if this translate to per request scope or per thread.
Nevermind its at segment reader level.
|
Seems to me the scenario is mainly for a cold start on a slow storage, where page cache is empty, using prefetch API could warm up quicker because of parallel IO. I see the numbers in the description which is nice! Could you also note down the machine, workload and probably some steps to clear page cache before benchmark. Some metrics could also help to show the impact of this change, and useful for operation readiness. iostat could show
page cache
|
| int prefetchReaderIndex = -1; | ||
| LeafReaderContext prefetchReaderContext = null; | ||
| boolean hasSequentialDocs = hasSequentialDocs(docs); | ||
| boolean useSequentialPrefetch = hasSequentialDocs && docs.length >= 10; |
There was a problem hiding this comment.
A thought here on docs.length, when it's very large, probably we also shouldn't do this prefetch, or do this in a batched manner.
I'm little worried early prefetched pages may be evicted from page cache before they are read.
fb7af6a to
4cd56d4
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
CHANGELOG.md (1)
15-15: Align changelog entry verb form with existing entries.The entry uses "Adding ability" while most other entries in the changelog use either the imperative form ("Add") or past tense ("Added"). For consistency, consider changing to "Add support for prefetching docs in fetch phase" or "Added support for prefetching docs in fetch phase".
✍️ Suggested style alignment
-- Adding ability to prefetch docs in fetch phase ([#20176](https://github.com/opensearch-project/OpenSearch/pull/20176)) +- Add support for prefetching docs in fetch phase ([#20176](https://github.com/opensearch-project/OpenSearch/pull/20176))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.
Applied to files:
CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: Analyze (java)
- GitHub Check: Mend Security Check
|
❌ Gradle check result for 4cd56d4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
4cd56d4 to
aede52b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @CHANGELOG.md:
- Line 46: The changelog entry "Adding ability to prefetch docs in fetch phase
([#20176](https://github.com/opensearch-project/OpenSearch/pull/20176))" is
missing the required list marker; update the line in the "Added" section by
prefixing it with "- " so it matches other entries (i.e., make the entry start
with a dash and a space).
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (1)
245-247: Redundantdocs.length >= 10check.The
docs.length >= 10condition is already incorporated intohasSequentialDocsat line 180, making the explicit check at line 247 redundant.♻️ Suggested simplification
- if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf - && hasSequentialDocs - && docs.length >= 10) { + if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf + && hasSequentialDocs) {
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/IndexScopedSettings.javaserver/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/DefaultSearchContext.javaserver/src/main/java/org/opensearch/search/fetch/FetchPhase.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaserver/src/test/java/org/opensearch/index/IndexSettingsTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (4)
- server/src/main/java/org/opensearch/search/DefaultSearchContext.java
- server/src/main/java/org/opensearch/search/internal/SearchContext.java
- server/src/test/java/org/opensearch/index/IndexSettingsTests.java
- server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.
Applied to files:
server/src/main/java/org/opensearch/index/IndexSettings.javaCHANGELOG.md
🪛 LanguageTool
CHANGELOG.md
[grammar] ~19-~19: Use a hyphen to join words.
Context: ...tRangeQuery`. - Add a mapper for context aware segments grouping criteria ([#1923...
(QB_NEW_EN_HYPHEN)
[uncategorized] ~21-~21: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...568)) - Add support for repository with Server side encryption enabled and client side encr...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[grammar] ~21-~21: Use a hyphen to join words.
Context: ...erver side encryption enabled and client side encryption as well based on a flag....
(QB_NEW_EN_HYPHEN)
[uncategorized] ~38-~38: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...enSearch/pull/19878)) - Add support for context aware segments ([#19098](https://github.com/o...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
- GitHub Check: Mend Security Check
🔇 Additional comments (7)
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)
301-301: LGTM!The new
PREFETCH_DOCS_DURING_FETCH_ENABLEDsetting is correctly registered inBUILT_IN_INDEX_SETTINGS, enabling it to be recognized as a valid index-scoped setting.server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)
38-42: LGTM!The new Lucene imports are required for the prefetch logic to unwrap readers and access the
StoredFieldsReader.
180-229: Prefetch logic implementation looks sound.The prefetch implementation correctly:
- Skips scroll queries to avoid interfering with cached sequential readers
- Handles cancellation checks
- Skips nested documents which require additional source filtering
- Prefetches only the first doc per segment for sequential access patterns
- Uses defensive error handling to avoid breaking the fetch phase
A few observations:
The
currentReaderIndex != readerIndexcheck at line 207 is intentionally performed before updatingcurrentReaderIndexat line 220, ensuring only the first document per segment triggers prefetch for sequential access - this is correct.The exception handling at lines 221-223 is appropriately defensive for a performance optimization.
server/src/main/java/org/opensearch/index/IndexSettings.java (3)
890-896: LGTM!The new experimental setting is well-defined:
@ExperimentalApiannotation appropriately marks this as experimental- Default
trueenables the optimization by default for improved cold-start latencyDynamicproperty allows runtime configuration changes- Setting key
index.prefetch_docs.fetch_phase.enabledfollows naming conventions
1047-1050: LGTM!The field is correctly declared as
volatilefor thread-safe access with dynamic setting updates. The initialization and update consumer registration follow the established pattern used by other dynamic settings in this class.Also applies to: 1224-1225
2328-2335: LGTM!The getter/setter pair follows the established pattern for dynamic settings. The public getter provides access to search context, while the private setter is correctly used only by the settings update consumer.
server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (1)
17-178: Good unit test coverage for prefetch logic components.The tests effectively validate the individual decision points in the prefetch implementation:
- Threshold calculations with various doc array sizes
- Enablement conditions including scroll query handling
- Sequential docs detection via
FetchPhase.hasSequentialDocs()- Loop bounds verification
- Nested document skip logic
- Reader index change detection
- Relative doc ID calculations
The workaround for avoiding mocking
ScrollContext(lines 49-55) is reasonable given it's a final class.
aede52b to
a59789f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 205-219: The else-branch in FetchPhase incorrectly prefetches for
SequentialStoredFieldsLeafReader when currentReaderIndex == readerIndex,
defeating the "only prefetch the first doc per segment" optimization; update the
guard so the else block is skipped for sequential docs in the same segment by
adding a condition preventing execution when currentReaderContext.reader() is a
SequentialStoredFieldsLeafReader and hasSequentialDocs and currentReaderIndex ==
readerIndex (or equivalently extend the original if to cover this case),
ensuring only the first doc per segment triggers prefetch.
🧹 Nitpick comments (4)
server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java (1)
173-183: Tests are tautological and don't verify actual behavior.These tests mock
isPrefetchDocsEnabled()to return a specific value and then assert that same value is returned. This only verifies that Mockito stubbing works, not the actual feature behavior.To meaningfully test the prefetch setting:
- Test
SearchContext's default implementation returnstruewithout mocking- Test
DefaultSearchContextwith realIndexSettingsto verify it respects the index-level settingConsider refactoring to test actual behavior rather than mock stubbing.
♻️ Suggested approach for meaningful tests
public void testPrefetchDocsEnabledByDefault() { - SearchContext context = mock(SearchContext.class); - when(context.isPrefetchDocsEnabled()).thenReturn(true); - assertTrue(context.isPrefetchDocsEnabled()); + // Test the actual default implementation in SearchContext base class + // This would require a concrete test implementation or testing DefaultSearchContext + // with properly configured IndexSettings where the setting defaults to true }Alternatively, test using a spy on a real instance or use
TestSearchContextconfigured with appropriate settings.server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)
1300-1305: Fallback value differs from base class default.When
indexServiceorindexSettingsis null, this returnsfalse, but the baseSearchContext.isPrefetchDocsEnabled()returnstrue. While this defensive null-check follows the pattern of similar methods (e.g.,evaluateKeywordIndexOrDocValuesEnabled), the inconsistency could cause unexpected behavior if the fallback is ever triggered.Consider whether returning
true(matching the base class default) would be more appropriate for consistency, or document whyfalseis the correct fallback here.server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)
180-186: Consider makingprefetchDocsThresholdconfigurable.The threshold is hardcoded to 1000. Since the prefetch feature itself is controlled by an index-level setting (
PREFETCH_DOCS_DURING_FETCH_ENABLED), consider exposing the threshold as a configurable setting as well to allow tuning based on storage characteristics and workload patterns.Additionally,
docs.length >= 10is checked here inhasSequentialDocsand again at line 247 in the main loop—the latter check is redundant.
245-247: Redundant check:docs.length >= 10is already included inhasSequentialDocs.At line 180,
hasSequentialDocsis defined ashasSequentialDocs(docs) && docs.length >= 10. The additional&& docs.length >= 10check at line 247 is therefore redundant and can be removed.Suggested simplification
if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf - && hasSequentialDocs - && docs.length >= 10) { + && hasSequentialDocs) {
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/IndexScopedSettings.javaserver/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/DefaultSearchContext.javaserver/src/main/java/org/opensearch/search/fetch/FetchPhase.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaserver/src/test/java/org/opensearch/index/IndexSettingsTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (3)
- server/src/test/java/org/opensearch/index/IndexSettingsTests.java
- server/src/main/java/org/opensearch/index/IndexSettings.java
- server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.
Applied to files:
CHANGELOG.mdserver/src/main/java/org/opensearch/search/internal/SearchContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: detect-breaking-change
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: Mend Security Check
🔇 Additional comments (7)
CHANGELOG.md (1)
15-15: LGTM!The changelog entry is properly formatted and correctly placed in the "Added" section for the unreleased 3.x version.
server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)
602-610: LGTM!The new experimental API method follows the established pattern in
SearchContextfor optional override methods (similar toincludeNamedQueriesScore()andshouldUseConcurrentSearch()). The defaulttruevalue aligns with the feature being enabled by default, and the@ExperimentalApiannotation is appropriate for a new feature.server/src/main/java/org/opensearch/search/DefaultSearchContext.java (2)
229-229: LGTM!The field declaration and initialization follow the established pattern used by other cached settings in this class (e.g.,
keywordIndexOrDocValuesEnabled,maxAggRewriteFilters).Also applies to: 300-300
777-780: LGTM!Clean override that returns the cached value, consistent with the class's pattern for similar settings.
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)
301-301: LGTM!The new setting is correctly registered in
BUILT_IN_INDEX_SETTINGS, making it available as a recognized index-level setting.server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)
38-42: LGTM!The new Lucene imports are necessary for the prefetch logic to unwrap reader wrappers and access the underlying
SegmentReaderfor stored field prefetching.
228-230: LGTM!Correctly resetting
currentReaderIndexandcurrentReaderContextbefore the main processing loop ensures the prefetch loop's state doesn't interfere with the actual document fetching logic.
e38ec54 to
939a973
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 206-218: The code incorrectly prefetches when currentReaderIndex
== readerIndex and hasSequentialDocs is true; change the branching so that when
we're in the same segment (currentReaderIndex == readerIndex) AND
hasSequentialDocs is true we skip prefetch entirely. Concretely, update the
condition around SequentialStoredFieldsLeafReader/FilterLeafReader so it only
attempts prefetch when either the reader index differs (currentReaderIndex !=
readerIndex) OR there are NOT sequential docs (hasSequentialDocs == false);
leave the existing calls to
SequentialStoredFieldsLeafReader.getSequentialStoredFieldsReader().prefetch(relativeDocId)
and SegmentReader.getFieldsReader().prefetch(relativeDocId) unchanged.
🧹 Nitpick comments (5)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)
180-186: Consider extracting magic numbers as named constants.The thresholds
10(minimum docs for sequential optimization) and1000(prefetch threshold) are hardcoded. Extracting them as named constants improves readability and maintainability.Suggested refactor
+ private static final int SEQUENTIAL_DOCS_MIN_COUNT = 10; + private static final int PREFETCH_DOCS_THRESHOLD = 1000; + public void execute(SearchContext context, String profileDescription) { // ... - boolean hasSequentialDocs = hasSequentialDocs(docs) && docs.length >= 10; + boolean hasSequentialDocs = hasSequentialDocs(docs) && docs.length >= SEQUENTIAL_DOCS_MIN_COUNT; - int prefetchDocsThreshold = 1000; - int numberOfDocsToPrefetch = Math.min(docs.length, prefetchDocsThreshold); + int numberOfDocsToPrefetch = Math.min(docs.length, PREFETCH_DOCS_THRESHOLD);
221-223: Catching generic Exception may mask programming errors.While swallowing prefetch failures is acceptable since it's an optimization, catching all
Exceptiontypes could hide bugs likeNullPointerExceptionorIllegalStateException. Consider catching more specific exceptions likeIOException.Suggested refactor
- } catch (Exception e) { + } catch (IOException e) { LOGGER.warn("Failed to prefetch the doc, will continue to prefetch the other docs", e); }server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (3)
19-37: Tests duplicate Java's Math.min behavior rather than FetchPhase logic.This test validates
Math.minrather than any FetchPhase-specific code. Consider either removing it or replacing it with a test that exercises the actual prefetch threshold logic withinFetchPhase.execute().
101-128: Test verifies for-loop behavior rather than FetchPhase implementation.This test counts iterations in a standalone loop, which doesn't exercise FetchPhase code. Consider replacing with a test that verifies the prefetch behavior through FetchPhase or marks for removal.
17-178: Consider adding integration tests for prefetch behavior.The current tests validate individual arithmetic/comparison operations but don't exercise the actual prefetch logic with mocked readers. Integration tests covering scenarios like:
- Prefetch called on first doc of each segment with sequential docs
- Prefetch not called for nested documents
- Exception handling during prefetch doesn't break fetch phase
- Behavior when SequentialStoredFieldsLeafReader vs SegmentReader is used
would provide more meaningful coverage.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/IndexScopedSettings.javaserver/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/DefaultSearchContext.javaserver/src/main/java/org/opensearch/search/fetch/FetchPhase.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaserver/src/test/java/org/opensearch/index/IndexSettingsTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (1)
- server/src/main/java/org/opensearch/search/DefaultSearchContext.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.
Applied to files:
server/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaCHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: Mend Security Check
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: Analyze (java)
🔇 Additional comments (14)
CHANGELOG.md (1)
15-15: LGTM!The changelog entry is correctly formatted, follows the existing pattern, and is appropriately placed under the "Added" section for unreleased 3.x changes.
server/src/main/java/org/opensearch/index/IndexSettings.java (5)
39-39: LGTM!Import correctly added for the
@ExperimentalApiannotation used on the new setting.
890-896: LGTM!The experimental setting is well-defined:
- Appropriate
@ExperimentalApiannotation for the new feature.- Default
trueenables the optimization by default as intended.Dynamicproperty allows operators to toggle at runtime without index recreation.
1047-1050: LGTM!The field declaration is correct:
volatilemodifier ensures visibility across threads for dynamic updates.- Follows the established pattern for dynamic index settings in this class.
1224-1225: LGTM!Initialization and dynamic update registration follow the established pattern for index settings.
2328-2335: LGTM!Getter and setter implementation is correct:
- Public getter for external access.
- Private setter restricts modification to the settings update consumer.
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (1)
301-301: LGTM!The setting is correctly registered in
BUILT_IN_INDEX_SETTINGS, enabling it to be recognized and validated by the index settings infrastructure.server/src/test/java/org/opensearch/index/IndexSettingsTests.java (3)
1176-1181: LGTM!Test correctly validates the default value of
truefor the prefetch docs setting.
1183-1190: LGTM!Test correctly validates that explicit configuration of
falseoverrides the default.
1192-1207: LGTM!Test comprehensively validates dynamic update behavior:
- Verifies default value.
- Tests disabling via dynamic update.
- Tests re-enabling via dynamic update.
This follows the established pattern for dynamic setting tests in this file.
server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)
602-610: LGTM!The base implementation with default
trueis appropriate:
- Aligns with the setting's default value.
@ExperimentalApiannotation matches the setting definition.DefaultSearchContextcorrectly overrides this method to returnprefetchDocsForFetchPhaseEnabled, which respects the index-level setting.- The method is properly used in
FetchPhaseto control prefetching behavior during the fetch phase.server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (2)
228-229: LGTM!Correctly resetting reader state before the main processing loop ensures the prefetch loop's state doesn't leak into the fetch logic.
245-246: LGTM!The condition now consistently uses the pre-computed
hasSequentialDocsflag, aligning with the prefetch logic.server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (1)
64-99: Good coverage of hasSequentialDocs method.This test properly exercises the public
FetchPhase.hasSequentialDocs()method with various inputs including the combined condition with the length threshold.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 212-219: The non-sequential prefetch branch in FetchPhase
currently only handles a FilterLeafReader and misses the case where the reader
is already a SegmentReader; update the logic to defensively obtain a
SegmentReader (either by unwrapping a FilterLeafReader via
FilterLeafReader.unwrap(filterLeafReader) or by using the
Lucene.segmentReader(currentReaderContext.reader()) helper) and, if non-null,
call segmentReader.getFieldsReader().prefetch(relativeDocId) so both wrapped and
unwrapped SegmentReader instances get the prefetch call.
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (1)
182-186: Consider making the prefetch threshold configurable in the future.The hardcoded threshold of 1000 is reasonable for most use cases, but consider extracting this to a configurable setting for tuning in production environments with different workload characteristics.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/IndexScopedSettings.javaserver/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/DefaultSearchContext.javaserver/src/main/java/org/opensearch/search/fetch/FetchPhase.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaserver/src/test/java/org/opensearch/index/IndexSettingsTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (6)
- server/src/main/java/org/opensearch/index/IndexSettings.java
- CHANGELOG.md
- server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
- server/src/main/java/org/opensearch/search/DefaultSearchContext.java
- server/src/main/java/org/opensearch/search/internal/SearchContext.java
- server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: gradle-check
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: Analyze (java)
🔇 Additional comments (4)
server/src/test/java/org/opensearch/index/IndexSettingsTests.java (1)
1177-1207: LGTM!The tests for the prefetch docs setting are well-structured and follow established patterns in this test file. Coverage includes:
- Default value verification
- Explicit value override
- Dynamic update behavior (both enable and disable paths)
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (3)
180-180: LGTM!Computing
hasSequentialDocsonce at the top level and reusing it for both the prefetch loop and the main processing loop is a good optimization. The threshold of 10 documents is reasonable to avoid prefetch overhead for small result sets.
222-224: Exception handling is appropriately defensive for an optimization.Catching broad
Exceptionand logging a warning ensures prefetch failures don't break the fetch phase. This is reasonable since prefetch is purely an optimization. Consider whetherIOExceptionwould be more specific for I/O-related prefetch failures, though the broader catch provides more resilience.
188-231: Well-structured prefetch integration with appropriate safeguards.The prefetch implementation includes:
- Setting-based enablement check
- Scroll query exclusion (documented)
- Cancellation awareness in the loop
- Nested document skip logic
- Clean state reset before main processing
The overall structure is correct and the optimization is properly isolated.
939a973 to
34e08a5
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @server/src/main/java/org/opensearch/index/IndexSettings.java:
- Around line 1047-1051: The new public accessor isPrefetchDocsEnabled() in
IndexSettings exposes an unannotated API; mark it as experimental (or add
explicit stability docs) to avoid implying permanence. Update the IndexSettings
class by annotating isPrefetchDocsEnabled() (and the analogous accessor
referenced later around the other prefetch docs getter) with your project's
experimental annotation (or add Javadoc stating "Experimental — may change") and
ensure the field prefetchDocsEnabled remains private/volatile while the
Javadoc/annotation clearly documents intended stability.
In @server/src/main/java/org/opensearch/search/fetch/FetchPhase.java:
- Around line 180-227: The prefetch branch skips direct SegmentReader instances
because it only checks "instanceof FilterLeafReader" before unwrapping; fix by
unwrapping any FilterLeafReader layers from currentReaderContext.reader() (use
FilterLeafReader.unwrap) into a LeafReader variable and then check "instanceof
SegmentReader" on that unwrapped reader (then call
segmentReader.getFieldsReader().prefetch(relativeDocId)); update the logic
inside the for-loop in FetchPhase so both wrapped and unwrapped SegmentReader
cases execute the same prefetch path while keeping the existing
SequentialStoredFieldsLeafReader check for sequential-prefetch.
🧹 Nitpick comments (5)
server/src/main/java/org/opensearch/search/DefaultSearchContext.java (2)
229-301: Snapshotting the dynamic setting per-request is fine, but consider documenting it.
prefetchDocsForFetchPhaseEnabledis computed once in the constructor (Line 300), so dynamic updates apply to new searches only. If that’s the intent, a short comment near initialization would prevent “why didn’t my update apply?” confusion.
1300-1305: MakeevaluatePrefetchDocsEnabled()private (and consider tightening null checks).This reads like a local helper; keeping it
publicincreases surface area inside theorg.opensearch.searchpackage for no clear benefit.Proposed diff
- public boolean evaluatePrefetchDocsEnabled() { - if (indexService != null && indexService.getIndexSettings() != null) { - return indexService.getIndexSettings().isPrefetchDocsEnabled(); - } - return true; - } + private boolean evaluatePrefetchDocsEnabled() { + // Default to enabled if settings are unavailable (e.g., certain test contexts). + if (indexService != null && indexService.getIndexSettings() != null) { + return indexService.getIndexSettings().isPrefetchDocsEnabled(); + } + return true; + }server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (3)
180-227: Prefetch should be gated on stored fields actually being needed (avoid wasted work).
createStoredFieldsVisitor(...)can returnnull(stored fields disabled). In that case, prefetching stored fields is pure overhead.Proposed diff (simple gate)
- if (context.isPrefetchDocsEnabled()) { + if (context.isPrefetchDocsEnabled() && fieldsVisitor != null) { // skipping prefetch for scroll queries if (context.scrollContext() == null) { for (int index = 0; index < numberOfDocsToPrefetch; index++) {
221-223: Avoid WARN-per-doc (can get very noisy); consider DEBUG or rate-limit.At minimum, include
docId/segment info in the message and log once per request (or once per segment) after counting failures.
180-180:docs.length >= 10cutoff for “sequential” mode: OK, but consider making it a named constant.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/IndexScopedSettings.javaserver/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/DefaultSearchContext.javaserver/src/main/java/org/opensearch/search/fetch/FetchPhase.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaserver/src/test/java/org/opensearch/index/IndexSettingsTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (4)
- CHANGELOG.md
- server/src/main/java/org/opensearch/search/internal/SearchContext.java
- server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
- server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.
Applied to files:
server/src/main/java/org/opensearch/index/IndexSettings.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: Mend Security Check
🔇 Additional comments (5)
server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)
777-780: LGTM: override is simple and avoids repeated settings lookups.server/src/test/java/org/opensearch/index/IndexSettingsTests.java (1)
1177-1207: LGTM: good minimal coverage for default, explicit, and dynamic updates.server/src/main/java/org/opensearch/index/IndexSettings.java (2)
1224-1226: LGTM: volatile cache + settings update consumer is the right pattern here.
890-896: The setting namingindex.prefetch_docs.fetch_phase.enabledis appropriate and follows OpenSearch's established pattern of feature-specific namespaces. Similar retrieval/execution features like searchable snapshots, segment replication, and soft deletes each maintain their own namespace rather than being consolidated underindex.search.*. This approach scales better if additional prefetch settings are added in the future.server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (1)
38-43: Lucene API compatibility verified — no issues found.Both
StoredFieldsReader#prefetch(int)andSegmentReader#getFieldsReader()are public methods in Lucene 10.3.2 (used by OpenSearch main) and are accessible from application code. The code at lines 207-217 correctly uses these stable public APIs.
Signed-off-by: Sruti Parthiban <partsrut@amazon.com>
34e08a5 to
eada73c
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
server/src/main/java/org/opensearch/search/DefaultSearchContext.java (1)
1300-1305: Consider reducing visibility ofevaluatePrefetchDocsEnabled().This method follows the pattern of other
evaluate*methods in this class (e.g.,evaluateKeywordIndexOrDocValuesEnabled), but unlike some of them, this one appears to only be called from the constructor. Consider making itprivatefor better encapsulation, unless there's a specific reason to keep itpublic.♻️ Suggested change
- public boolean evaluatePrefetchDocsEnabled() { + private boolean evaluatePrefetchDocsEnabled() {server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (3)
19-128: Tests replicate logic locally instead of testing production code.
testPrefetchThresholdCalculationandtestPrefetchLoopBoundstestMath.minand loop iteration behavior locally rather than calling actual production methods. These tests won't catch regressions if the production code inFetchPhasechanges.Consider either:
- Exposing a production method that encapsulates this logic and testing that, or
- Converting to an integration test that exercises the full prefetch flow, or
- Removing these if the logic is already covered by integration tests elsewhere.
49-55: Test segment doesn't exercise production code.Lines 52-55 test local boolean variables instead of production behavior. This provides no regression protection for actual scroll query handling in the prefetch logic.
If
ScrollContextis difficult to mock, consider:
- Using a test subclass or factory that creates a testable scroll context
- Using
Mockito.mockConstruction()for final classes- Extracting a method in production code that takes the boolean condition and testing that
130-177: Consider consolidating or removing trivial logic tests.These three tests validate basic Java operations (integer comparison, subtraction) rather than production code. They document the expected behavior via test names but won't catch production regressions.
A more effective approach would be an integration test that exercises
FetchPhase.execute()with various document configurations (nested docs, multiple reader indices, etc.) and verifies the prefetch behavior end-to-end.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/IndexScopedSettings.javaserver/src/main/java/org/opensearch/index/IndexSettings.javaserver/src/main/java/org/opensearch/search/DefaultSearchContext.javaserver/src/main/java/org/opensearch/search/fetch/FetchPhase.javaserver/src/main/java/org/opensearch/search/internal/SearchContext.javaserver/src/test/java/org/opensearch/index/IndexSettingsTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.javaserver/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/search/fetch/FetchPhaseTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
- server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
- server/src/main/java/org/opensearch/index/IndexSettings.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T19:23:29.698Z
Learnt from: karenyrx
Repo: opensearch-project/OpenSearch PR: 20335
File: modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/search/SearchSourceBuilderProtoUtils.java:155-172
Timestamp: 2026-01-02T19:23:29.698Z
Learning: The gRPC search API in OpenSearch is marked as "experimental" in official documentation, so changes to proto schemas that remove previously unsupported fields (those throwing UnsupportedOperationException) are not considered breaking changes.
Applied to files:
server/src/main/java/org/opensearch/search/internal/SearchContext.javaCHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: Mend Security Check
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
🔇 Additional comments (12)
CHANGELOG.md (1)
15-15: LGTM!The changelog entry follows the established format and is correctly placed in the "Added" section.
server/src/main/java/org/opensearch/search/internal/SearchContext.java (1)
602-610: LGTM!The new experimental API method is well-documented, provides a sensible default, and follows the established pattern for feature flags in
SearchContext. The@ExperimentalApiannotation appropriately signals the provisional nature of this feature.server/src/main/java/org/opensearch/search/DefaultSearchContext.java (3)
229-229: LGTM!The new field follows the established pattern for configuration flags in this class.
300-300: LGTM!The field initialization via
evaluatePrefetchDocsEnabled()is consistent with how other configuration flags are initialized in the constructor.
777-780: LGTM!The override correctly returns the cached configuration value.
server/src/test/java/org/opensearch/index/IndexSettingsTests.java (1)
1177-1207: LGTM!The test coverage is comprehensive, covering the three essential scenarios: default value, explicit configuration, and dynamic updates. The tests follow the established patterns in this test class.
server/src/main/java/org/opensearch/search/fetch/FetchPhase.java (4)
38-42: LGTM!New imports are required for the prefetch implementation using Lucene's reader APIs.
180-227: Prefetch implementation looks good with a minor suggestion.The prefetch logic is well-structured:
- Properly guards with
isPrefetchDocsEnabled()and scroll context check- Handles cancellation correctly
- Gracefully degrades on errors with warning log
- Correctly skips nested docs
- Optimizes sequential doc access by prefetching only the first doc per segment
One consideration: The
prefetchDocsThreshold(1000) is hardcoded. For an experimental feature this is acceptable, but you may want to consider making this configurable in the future if users report different optimal values for their workloads.
226-227: LGTM!Correctly resets the reader state after the prefetch loop to ensure the main fetch loop starts with a clean state.
243-243: LGTM!Efficiently reuses the top-level
hasSequentialDocscalculation instead of recalculating.server/src/test/java/org/opensearch/search/fetch/FetchPhasePrefetchTests.java (2)
1-17: LGTM!Standard test class setup with appropriate imports and base class.
64-99: Good unit test coverage for sequential docs detection.This test properly exercises
FetchPhase.hasSequentialDocs()production code with various scenarios: sequential, non-sequential, and the combined condition with length threshold.
|
❕ Gradle check result for eada73c: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20176 +/- ##
============================================
- Coverage 73.29% 73.25% -0.04%
- Complexity 71816 71821 +5
============================================
Files 5793 5793
Lines 328644 328682 +38
Branches 47313 47323 +10
============================================
- Hits 240890 240787 -103
- Misses 68404 68575 +171
+ Partials 19350 19320 -30 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| int rootId = findRootDocumentIfNested(context, currentReaderContext, relativeDocId); | ||
| // skipping prefetch for nested docs as it needs additional processing like | ||
| // loading the document source and filtering it based on the nested document ID | ||
| if (rootId == -1) { |
There was a problem hiding this comment.
On my Mac (with a 16KB page size), when retrieving documents, nested documents and their parent document are quite likely to be contained within the same page. I’m therefore wondering if this check is really required.
There was a problem hiding this comment.
@kkewwei
Default page size is 4KB for Linux
Having the safety check is good for extreme cases anyway.
|
|
||
| if (context.isPrefetchDocsEnabled()) { | ||
| // skipping prefetch for scroll queries | ||
| if (context.scrollContext() == null) { |
There was a problem hiding this comment.
Why do scroll queries not require preloading? Is it because they only run a single time? Logically, scroll queries handle a much more continuous stream of documents, so preloading ought to yield better results.
| // loading the document source and filtering it based on the nested document ID | ||
| if (rootId == -1) { | ||
| // for sequential docs within the same segment we only prefetch the first doc | ||
| if (currentReaderContext.reader() instanceof SequentialStoredFieldsLeafReader lf && hasSequentialDocs) { |
There was a problem hiding this comment.
When consecutive documents stretch beyond the scope of a single page_size, loading only one page will result in truncated output. Maybe we don’t have to worry about repeated loading operations, as @abiesps pointed out, Lucene will efficiently handle the quick reloading of the same data block on our behalf.
There was a problem hiding this comment.
@sruti1312 It might be good to have a test for this scenario ( I think your current tests might already be covering this implicitly)
Description
Adding ability to prefetch docs in fetch phase
Related Issues
Resolves #18780
Check List
Performance result for
Fetch Phase Cost (Arthas) for size as 10:
For different queries,
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.