Skip to content

Lazy init stored field reader in SourceLookup#20827

Merged
rishabhmaurya merged 3 commits intoopensearch-project:mainfrom
bowenlan-amzn:lazy-get-merge-instance
Mar 23, 2026
Merged

Lazy init stored field reader in SourceLookup#20827
rishabhmaurya merged 3 commits intoopensearch-project:mainfrom
bowenlan-amzn:lazy-get-merge-instance

Conversation

@bowenlan-amzn
Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn commented Mar 10, 2026

Description

Defers getMergeInstance() from setSegmentAndDocument() to loadSourceIfNeeded(). Scripts that don't read _source never trigger the getMergeInstance() path:

// Before: eager — triggers madvise on every segment transition
if (this.reader != context.reader()) {
    fieldReader = lf.getSequentialStoredFieldsReader()::document;
    // → getMergeInstance() → madvise(SEQUENTIAL) — WRITE lock
}

// After: lazy — only triggers when _source is actually read
if (this.reader != context.reader()) {
    this.reader = context.reader();
    this.fieldReader = null;  // deferred to loadSourceIfNeeded()
}

Related Issues

Resolves #20933

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 10, 2026

PR Reviewer Guide 🔍

(Review updated until commit c575d0a)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Resource Leak

When reader is a SequentialStoredFieldsLeafReader, getSequentialStoredFieldsReader() returns a new reader instance (a merge instance) that may hold resources. In the old code, this was called once per segment transition. In the new lazy code, if setSegmentAndDocument is called multiple times with the same reader but different documents (e.g., same segment, multiple docs), fieldReader is initialized once and reused — that part is fine. However, the merge instance obtained via getSequentialStoredFieldsReader() is never explicitly closed. If the underlying implementation allocates I/O resources or file handles for the merge reader, they will not be released until GC. This was also true before, but worth confirming the lifecycle of the object returned by getSequentialStoredFieldsReader() is safe to leave unclosed.

if (fieldReader == null) {
    if (reader instanceof SequentialStoredFieldsLeafReader lf) {
        fieldReader = lf.getSequentialStoredFieldsReader()::document;
    } else {
        fieldReader = reader.storedFields()::document;
    }
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 10, 2026

PR Code Suggestions ✨

Latest suggestions up to c575d0a

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null reader before lazy init

When reader is null (e.g., if setSegmentAndDocument was never called), accessing
reader here will throw a NullPointerException. A null-check or guard should be added
to produce a clearer error message, or the code should ensure reader is always set
before loadSourceIfNeeded() is called.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [107-113]

 if (fieldReader == null) {
+    if (reader == null) {
+        throw new IllegalStateException("setSegmentAndDocument must be called before loadSourceIfNeeded");
+    }
     if (reader instanceof SequentialStoredFieldsLeafReader lf) {
         fieldReader = lf.getSequentialStoredFieldsReader()::document;
     } else {
         fieldReader = reader.storedFields()::document;
     }
 }
Suggestion importance[1-10]: 3

__

Why: While technically valid, reader being null here would only occur if loadSourceIfNeeded() is called without first calling setSegmentAndDocument(), which is a misuse of the API. The existing code would already throw a NullPointerException making the issue apparent, so this guard adds minimal practical value.

Low
General
Clarify reuse intent of sequential field reader

The fieldReader is only reset when the reader changes, but not when the reader stays
the same and only the docId changes. For SequentialStoredFieldsLeafReader, the
sequential reader obtained via getSequentialStoredFieldsReader() may maintain
internal state tied to document ordering. If the same reader is reused across
multiple documents, the cached fieldReader from a previous document access will be
reused, which is the intended behavior — but it's worth verifying that
getSequentialStoredFieldsReader() returns a reader that is safe to reuse across
multiple document() calls on the same leaf.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [145-150]

 if (this.reader != context.reader()) {
     this.reader = context.reader();
     // Lazily initialize fieldReader in loadSourceIfNeeded() to avoid
     // unnecessary work when _source is never accessed.
+    // Note: fieldReader is intentionally reused across docId changes within
+    // the same reader, as SequentialStoredFieldsReader is stateful and
+    // designed to be reused for sequential access within a segment.
     this.fieldReader = null;
 }
Suggestion importance[1-10]: 1

__

Why: This suggestion only adds a comment to the existing code without any functional change — the existing_code and improved_code are functionally identical. It's a documentation-only suggestion with minimal impact.

Low

Previous suggestions

Suggestions up to commit 7c09502
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null reader before lazy init

The getSequentialStoredFieldsReader() call can throw an IOException, but it is not
handled here. In the original code, this was wrapped in a try-catch. Since
loadSourceIfNeeded() already declares throws IOException, the exception will
propagate correctly, but getSequentialStoredFieldsReader() specifically may need
explicit handling or documentation to ensure callers are aware.
Additionally, if
reader is null (e.g., setSegmentAndDocument was never called), this block will throw
a NullPointerException. A null-check or guard should be added.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [107-113]

 if (fieldReader == null) {
+    if (reader == null) {
+        throw new IllegalStateException("setSegmentAndDocument must be called before loadSourceIfNeeded");
+    }
     if (reader instanceof SequentialStoredFieldsLeafReader lf) {
         fieldReader = lf.getSequentialStoredFieldsReader()::document;
     } else {
         fieldReader = reader.storedFields()::document;
     }
 }
Suggestion importance[1-10]: 3

__

Why: While the null check suggestion is technically valid, in practice reader cannot be null here because loadSourceIfNeeded() is only called after setSegmentAndDocument() which sets reader. The getSequentialStoredFieldsReader() IOException concern is already handled since the method declares throws IOException. The suggestion adds defensive code that's unlikely to be needed in practice.

Low
General
Strengthen caching assertion in test

This test verifies that the source is cached when called with the same reader and
document. However, it does not verify that fieldReader is only initialized once
(i.e., that the lazy init is not re-triggered). Consider also asserting that the
fieldReader is not re-created on repeated calls to ensure the lazy initialization is
truly idempotent and not causing resource leaks (e.g., multiple StoredFieldsReader
instances opened).

server/src/test/java/org/opensearch/search/lookup/SourceLookupTests.java [43-45]

-// Same reader+doc — source should remain cached
+// Same reader+doc — source should remain cached and fieldReader not re-initialized
 lookup.setSegmentAndDocument(ctx, 0);
-assertSame(source, lookup.loadSourceIfNeeded());
+Map<String, Object> cachedSource = lookup.loadSourceIfNeeded();
+assertSame(source, cachedSource);
+// Call again to ensure no additional reader is opened
+assertSame(cachedSource, lookup.loadSourceIfNeeded());
Suggestion importance[1-10]: 2

__

Why: The improved code only adds an extra loadSourceIfNeeded() call but doesn't actually verify that fieldReader is initialized only once (since fieldReader is now private). The suggestion doesn't meaningfully improve the test coverage beyond what's already there.

Low
Suggestions up to commit 7c09502
CategorySuggestion                                                                                                                                    Impact
General
Add assertion for lazy fieldReader reset behavior

This test verifies that switching readers returns the correct source, but it does
not explicitly verify that fieldReader is reset to null after setSegmentAndDocument
is called with a new reader (the core behavior being tested). Consider adding a test
that accesses fieldReader via reflection or a package-private accessor to assert it
is null before loadSourceIfNeeded() is called, ensuring the lazy initialization
contract is enforced.

server/src/test/java/org/opensearch/search/lookup/SourceLookupTests.java [67-84]

 public void testSetSegmentAndDocumentWithNewReaderDefersFieldReader() throws IOException {
     try (Directory dir1 = newDirectory(); Directory dir2 = newDirectory()) {
         indexSourceDoc(dir1, "{\"a\":\"1\"}");
         indexSourceDoc(dir2, "{\"b\":\"2\"}");
         try (DirectoryReader reader1 = DirectoryReader.open(dir1); DirectoryReader reader2 = DirectoryReader.open(dir2)) {
             LeafReaderContext ctx1 = reader1.leaves().get(0);
             LeafReaderContext ctx2 = reader2.leaves().get(0);
 
             SourceLookup lookup = new SourceLookup();
             lookup.setSegmentAndDocument(ctx1, 0);
             assertEquals("1", lookup.loadSourceIfNeeded().get("a"));
 
-            // Switch to a different reader — should reset fieldReader and source
+            // Switch to a different reader — fieldReader should be null (deferred) and source reset
             lookup.setSegmentAndDocument(ctx2, 0);
+            assertNull(lookup.fieldReader); // verify lazy reset
             assertEquals("2", lookup.loadSourceIfNeeded().get("b"));
+            assertNotNull(lookup.fieldReader); // verify initialized after access
         }
     }
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes accessing fieldReader directly (e.g., lookup.fieldReader) in the test, but fieldReader is now private in the PR, making this approach invalid without reflection or a package-private accessor. The improved code would not compile as written.

Low
Ensure consistent IOException handling during lazy init

The getSequentialStoredFieldsReader() call can throw an IOException, but it is not
handled here. In the original code, this was wrapped in a try-catch. Since
loadSourceIfNeeded() already declares throws IOException, the exception will
propagate correctly, but reader.storedFields()::document also throws IOException and
is already handled by the outer try-catch — however, if
getSequentialStoredFieldsReader() throws, the fieldReader will remain null and the
next call will retry, which may be acceptable. Ensure the behavior is intentional
and consistent.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [107-113]

+if (fieldReader == null) {
+    if (reader instanceof SequentialStoredFieldsLeafReader lf) {
+        fieldReader = lf.getSequentialStoredFieldsReader()::document;
+    } else {
+        fieldReader = reader.storedFields()::document;
+    }
+}
 
-
Suggestion importance[1-10]: 2

__

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The suggestion asks the user to verify behavior rather than fix a concrete bug. Since loadSourceIfNeeded() already declares throws IOException, any exception from getSequentialStoredFieldsReader() will propagate correctly through the existing outer try-catch.

Low
Suggestions up to commit 641f458
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null reader during lazy init

When reader is null (e.g., if setSegmentAndDocument was never called), accessing
reader here will throw a NullPointerException. A guard check should be added to
handle this case gracefully, either by throwing a meaningful exception or returning
an empty map.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [107-113]

 if (fieldReader == null) {
+    if (reader == null) {
+        return emptyMap();
+    }
     if (reader instanceof SequentialStoredFieldsLeafReader lf) {
         fieldReader = lf.getSequentialStoredFieldsReader()::document;
     } else {
         fieldReader = reader.storedFields()::document;
     }
 }
Suggestion importance[1-10]: 3

__

Why: While technically valid, reader being null at this point would indicate a programming error (calling loadSourceIfNeeded() without first calling setSegmentAndDocument()). Silently returning an empty map could mask bugs. The existing code already has this implicit contract, and the suggestion adds marginal safety at the cost of hiding misuse.

Low
General
Document sequential access assumption for correctness

The fieldReader is only reset when the reader changes, but for
SequentialStoredFieldsLeafReader, getSequentialStoredFieldsReader() may return a
stateful reader tied to a specific traversal. If the same reader is reused but
documents are accessed out of order, the sequential reader may not behave correctly.
Consider documenting this assumption or validating that sequential access is
guaranteed within a single reader context.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [145-150]

 if (this.reader != context.reader()) {
     this.reader = context.reader();
     // Lazily initialize fieldReader in loadSourceIfNeeded() to avoid
     // unnecessary work when _source is never accessed.
+    // Note: fieldReader is reset per-reader (not per-doc) because
+    // SequentialStoredFieldsLeafReader provides a stateful sequential reader
+    // that is valid for the lifetime of the segment. Documents within a
+    // segment are always accessed in order during fetch.
     this.fieldReader = null;
 }
Suggestion importance[1-10]: 1

__

Why: This suggestion only adds comments to document an existing assumption without changing any logic. The improved_code is functionally identical to the existing_code, making this a documentation-only change with minimal impact.

Low
Suggestions up to commit 641f458
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null reader during lazy init

When reader is null (i.e., setSegmentAndDocument was never called), the lazy
initialization block will throw a NullPointerException. A null-check or guard should
be added to produce a clearer error message in that case.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [107-113]

 if (fieldReader == null) {
+    if (reader == null) {
+        throw new IllegalStateException("setSegmentAndDocument must be called before loadSourceIfNeeded");
+    }
     if (reader instanceof SequentialStoredFieldsLeafReader lf) {
         fieldReader = lf.getSequentialStoredFieldsReader()::document;
     } else {
         fieldReader = reader.storedFields()::document;
     }
 }
Suggestion importance[1-10]: 4

__

Why: While technically valid, reader being null when loadSourceIfNeeded() is called without prior setSegmentAndDocument() is an edge case that would already produce a clear NPE. Adding an explicit guard improves error clarity but has low practical impact since this is an internal class with controlled usage patterns.

Low
General
Clarify fieldReader reuse across document changes

The fieldReader is only reset when reader changes, but not when the reader stays the
same and only docId changes. For a SequentialStoredFieldsLeafReader, the sequential
reader obtained via getSequentialStoredFieldsReader() may maintain internal state
tied to the document position. Reusing the same fieldReader instance across
different docId values within the same segment should be verified as safe; if not,
the fieldReader should also be reset on docId change.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [145-150]

 if (this.reader != context.reader()) {
     this.reader = context.reader();
     // Lazily initialize fieldReader in loadSourceIfNeeded() to avoid
     // unnecessary work when _source is never accessed.
     this.fieldReader = null;
 }
+// Note: fieldReader is intentionally NOT reset on docId-only changes,
+// as the sequential reader is designed to be reused across docs in the same segment.
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify behavior rather than fix a concrete bug, and the improved_code is essentially the same as existing_code with only a comment added. The SequentialStoredFieldsLeafReader is designed to be reused across documents in the same segment, so this is not an actual issue.

Low
Suggestions up to commit 641f458
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null reader before lazy init

When reader is null (e.g., if setSegmentAndDocument was never called), accessing
reader here will throw a NullPointerException. A guard check should be added to
handle this case gracefully, either by throwing a meaningful exception or returning
an empty map.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [107-113]

 if (fieldReader == null) {
+    if (reader == null) {
+        return emptyMap();
+    }
     if (reader instanceof SequentialStoredFieldsLeafReader lf) {
         fieldReader = lf.getSequentialStoredFieldsReader()::document;
     } else {
         fieldReader = reader.storedFields()::document;
     }
 }
Suggestion importance[1-10]: 3

__

Why: While technically valid, reader being null when loadSourceIfNeeded() is called would indicate a programming error (caller never called setSegmentAndDocument). The existing code already has reader as a field initialized to null, but the original code also had this same potential issue. Returning emptyMap() silently would mask bugs rather than expose them. This is a low-impact edge case that likely never occurs in practice.

Low
General
Reset sequential field reader on doc change

The fieldReader is only reset when reader changes, but not when the reader stays the
same and only docId changes. For a SequentialStoredFieldsLeafReader, the sequential
reader obtained via getSequentialStoredFieldsReader() may be stateful and tied to a
specific document position. Reusing the same fieldReader across different docId
values within the same segment should be verified as safe; if not, the fieldReader
should also be reset on docId change.

server/src/main/java/org/opensearch/search/lookup/SourceLookup.java [145-150]

 if (this.reader != context.reader()) {
     this.reader = context.reader();
     // Lazily initialize fieldReader in loadSourceIfNeeded() to avoid
     // unnecessary work when _source is never accessed.
     this.fieldReader = null;
+} else if (this.docId != docId && reader instanceof SequentialStoredFieldsLeafReader) {
+    // Reset fieldReader for sequential reader when docId changes within the same segment
+    this.fieldReader = null;
 }
Suggestion importance[1-10]: 2

__

Why: The SequentialStoredFieldsLeafReader's sequential reader is designed to be reused across multiple document reads within the same segment — that's the whole point of the optimization. Resetting fieldReader on every docId change would defeat the purpose of the lazy initialization and sequential access optimization introduced by this PR.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 33e3aeb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn bowenlan-amzn force-pushed the lazy-get-merge-instance branch from 33e3aeb to 7536dd6 Compare March 10, 2026 21:21
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 7536dd6

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 7536dd6: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.22%. Comparing base (fb5d661) to head (c575d0a).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20827      +/-   ##
============================================
- Coverage     73.24%   73.22%   -0.02%     
+ Complexity    72510    72474      -36     
============================================
  Files          5819     5819              
  Lines        331373   331372       -1     
  Branches      47882    47883       +1     
============================================
- Hits         242707   242663      -44     
- Misses        69201    69202       +1     
- Partials      19465    19507      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added bug Something isn't working lucene Search:Performance labels Mar 19, 2026
@bowenlan-amzn bowenlan-amzn changed the title Lazy invoking get merge instance Lazy init stored field reader in SourceLookup Mar 19, 2026
@bowenlan-amzn bowenlan-amzn force-pushed the lazy-get-merge-instance branch from 7536dd6 to 9be78fa Compare March 19, 2026 23:38
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9be78fa

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 5dc31b9: SUCCESS

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a5bfa82

@bowenlan-amzn bowenlan-amzn force-pushed the lazy-get-merge-instance branch 2 times, most recently from 33066f4 to 641f458 Compare March 23, 2026 04:38
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 641f458

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 641f458: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 641f458

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 641f458: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 641f458

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 641f458: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 7c09502

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 7c09502

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

revert vcs change

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn force-pushed the lazy-get-merge-instance branch from 7c09502 to c575d0a Compare March 23, 2026 19:12
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c575d0a

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for c575d0a: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@rishabhmaurya rishabhmaurya merged commit 3d76eae into opensearch-project:main Mar 23, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lucene Search:Performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] madvise from stored field query path could cause mmap lock contention on kernel 5.10

3 participants