Skip to content

Updated MMapDirectory to use ReadAdviseByContext rather than default readadvise of Lucene#21031

Merged
andrross merged 1 commit intoopensearch-project:mainfrom
navneet1v:main
Mar 31, 2026
Merged

Updated MMapDirectory to use ReadAdviseByContext rather than default readadvise of Lucene#21031
andrross merged 1 commit intoopensearch-project:mainfrom
navneet1v:main

Conversation

@navneet1v
Copy link
Copy Markdown
Contributor

Description

Updated MMapDirectory to use ReadAdviseByContext rather than default readadvise of Lucene.

More details can be found here: #21012

Related Issues

Resolves #21012

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for c750f0c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@navneet1v
Copy link
Copy Markdown
Contributor Author

Tagging @andrross , @msfroh , @gbbafna , @mch2 to review the PR.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 7b79e38: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 68fcab2: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.16%. Comparing base (04ac6e3) to head (fb56484).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21031      +/-   ##
============================================
- Coverage     73.24%   73.16%   -0.09%     
+ Complexity    72811    72761      -50     
============================================
  Files          5871     5871              
  Lines        332666   332669       +3     
  Branches      48014    48014              
============================================
- Hits         243660   243381     -279     
- Misses        69451    69802     +351     
+ Partials      19555    19486      -69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 2b55691: SUCCESS

Copy link
Copy Markdown
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable, but are we concerned at all about potential regressions with using MADV_RANDOM for certain files if MGLRU is enabled? That was the whole reason Lucene reverted its change to the default behavior.

@navneet1v
Copy link
Copy Markdown
Contributor Author

This looks reasonable, but are we concerned at all about potential regressions with using MADV_RANDOM for certain files if MGLRU is enabled? That was the whole reason Lucene reverted its change to the default behavior.

By not having this change, we are seeing the regression during search. One thing I can think of here is, if MGLRU is enabled on the Kernel we fallback to NORMAL readadvise, otherwise we can use ADVISE by CONTEXT.

@andrross WDYT?

@andrross
Copy link
Copy Markdown
Member

One thing I can think of here is, if MGLRU is enabled on the Kernel we fallback to NORMAL readadvise, otherwise we can use ADVISE by CONTEXT.

@navneet1v I honestly don't know. Do we have benchmarks that can tell us the optimal configuration with and without MGLRU?

@navneet1v
Copy link
Copy Markdown
Contributor Author

One thing I can think of here is, if MGLRU is enabled on the Kernel we fallback to NORMAL readadvise, otherwise we can use ADVISE by CONTEXT.

@navneet1v I honestly don't know. Do we have benchmarks that can tell us the optimal configuration with and without MGLRU?

One thing I can think of here is, if MGLRU is enabled on the Kernel we fallback to NORMAL readadvise, otherwise we can use ADVISE by CONTEXT.

@navneet1v I honestly don't know. Do we have benchmarks that can tell us the optimal configuration with and without MGLRU?

@andrross in 2.19 version of OpenSearch we are using Advise by Context. This change is just making those things exactly same. Plus the regression which was observed in Elastic was during merge and during merge we are already making the read advise sequential

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit fb56484)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Reflection Fragility

The test uses reflection to access the private readAdvice field of MMapDirectory. This approach is fragile and may break if the Lucene internal implementation changes the field name or type. Consider verifying behavior through observable side effects (e.g., checking the actual ReadAdvice returned for known IOContexts) rather than accessing private internals, or at minimum add a clear comment about the Lucene version dependency.

private Object getReadAdviceField(MMapDirectory mMapDirectory) throws Exception {
    Field readAdviceField = MMapDirectory.class.getDeclaredField("readAdvice");
    readAdviceField.setAccessible(true);
    return readAdviceField.get(mMapDirectory);
}
Inconsistent Test Assertion

In the merge IOContext test case, ADVISE_BY_CONTEXT.apply is called with IOContext.merge(mergeInfo) but readAdvice.apply is called with IOContext.merge(mergeInfo).withHints(). These are different IOContext instances and may not produce equivalent results, making the assertion potentially misleading or incorrect.

MergeInfo mergeInfo = new MergeInfo(100, 100L, false, 1);
assertEquals(
    "Advise By context is not set",
    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", IOContext.merge(mergeInfo)),
    readAdvice.apply("test.vec", IOContext.merge(mergeInfo).withHints())
);
HYBRIDFS Comment Mismatch

The HYBRIDFS case still has the comment // Use Lucene defaults but now explicitly overrides the read advice to ADVISE_BY_CONTEXT. The comment is misleading and should be updated to reflect the new behavior.

// Use Lucene defaults
final FSDirectory primaryDirectory = FSDirectory.open(location, lockFactory);
final Set<String> nioExtensions = new HashSet<>(indexSettings.getValue(IndexModule.INDEX_STORE_HYBRID_NIO_EXTENSIONS));
if (primaryDirectory instanceof MMapDirectory mMapDirectory) {
    // Setting the read advise by context: REF: https://github.com/opensearch-project/OpenSearch/issues/21012
    mMapDirectory.setReadAdvice(MMapDirectory.ADVISE_BY_CONTEXT);
    return new HybridDirectory(lockFactory, setPreload(mMapDirectory, preLoadExtensions), nioExtensions);
Missing FS Case Test

The FS type case on 64-bit systems resolves to HYBRIDFS (which is now tested), but there is no explicit call to mMapDirectoryHasReadAdviceByContext in the FS branch of the test switch statement. If the FS type resolves to an MMapDirectory-backed directory, the read advice should also be verified there.

case FS:
    if (Constants.JRE_IS_64BIT) {
        assertTrue(FsDirectoryFactory.isHybridFs(directory));
    } else {
        assertTrue(directory.toString(), directory instanceof NIOFSDirectory);
    }
    break;

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Code Suggestions ✨

Latest suggestions up to fb56484

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix inconsistent IOContext in test assertion

The two sides of the assertEquals use different IOContext instances: the expected
side uses IOContext.merge(mergeInfo) while the actual side uses
IOContext.merge(mergeInfo).withHints(). This inconsistency means the test may not
accurately verify that ADVISE_BY_CONTEXT is set, since the contexts differ. Both
sides should use the same IOContext instance.

server/src/test/java/org/opensearch/index/store/FsDirectoryFactoryTests.java [228-233]

 MergeInfo mergeInfo = new MergeInfo(100, 100L, false, 1);
+IOContext mergeContext = IOContext.merge(mergeInfo);
 assertEquals(
     "Advise By context is not set",
-    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", IOContext.merge(mergeInfo)),
-    readAdvice.apply("test.vec", IOContext.merge(mergeInfo).withHints())
+    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", mergeContext),
+    readAdvice.apply("test.vec", mergeContext)
 );
Suggestion importance[1-10]: 7

__

Why: The expected and actual sides of the assertEquals use different IOContext instances (IOContext.merge(mergeInfo) vs IOContext.merge(mergeInfo).withHints()), which could cause the test to pass or fail for the wrong reasons. Using a consistent mergeContext variable on both sides makes the test more accurate and reliable.

Medium
General
Warn when read advice cannot be applied

If primaryDirectory is not an MMapDirectory (the else branch), the opened
FSDirectory is returned directly without being closed if an exception occurs later,
but more critically, the preLoadExtensions setting is silently ignored for the
non-MMap case. Consider logging a warning or applying equivalent configuration when
the primary directory is not an MMapDirectory.

server/src/main/java/org/opensearch/index/store/FsDirectoryFactory.java [100-108]

 final FSDirectory primaryDirectory = FSDirectory.open(location, lockFactory);
 final Set<String> nioExtensions = new HashSet<>(indexSettings.getValue(IndexModule.INDEX_STORE_HYBRID_NIO_EXTENSIONS));
 if (primaryDirectory instanceof MMapDirectory mMapDirectory) {
     // Setting the read advise by context: REF: https://github.com/opensearch-project/OpenSearch/issues/21012
     mMapDirectory.setReadAdvice(MMapDirectory.ADVISE_BY_CONTEXT);
     return new HybridDirectory(lockFactory, setPreload(mMapDirectory, preLoadExtensions), nioExtensions);
 } else {
+    logger.warn("Primary directory is not MMapDirectory ({}), falling back without preload or read advice settings", primaryDirectory.getClass().getSimpleName());
     return primaryDirectory;
 }
Suggestion importance[1-10]: 3

__

Why: The else branch behavior (ignoring preLoadExtensions) existed before this PR and is not introduced by it. This suggestion addresses a pre-existing concern unrelated to the PR's changes, and the improved_code only adds a warning log without fixing the actual issue of ignored settings.

Low
Document Lucene version dependency for reflection-based test

The test relies on reflection to access a private field named readAdvice in
MMapDirectory, which is a Lucene internal implementation detail. If the field is
renamed or its type changes in a future Lucene version, this test will fail with a
cryptic RuntimeException. Consider adding a more descriptive assertion message or a
comment noting the specific Lucene version this was verified against, to make future
maintenance easier.

server/src/test/java/org/opensearch/index/store/FsDirectoryFactoryTests.java [207-214]

-@SuppressForbidden(reason = "Need to check the readAdvise as there is no getter on read advise")
+@SuppressForbidden(reason = "Need to check the readAdvise as there is no getter on read advise; verified against Lucene 9.x MMapDirectory internals")
 private void mMapDirectoryHasReadAdviceByContext(MMapDirectory mapDirectory) {
     try {
         @SuppressWarnings("unchecked")
         BiFunction<String, IOContext, Optional<ReadAdvice>> readAdvice = (BiFunction<
             String,
             IOContext,
             Optional<ReadAdvice>>) getReadAdviceField(mapDirectory);
Suggestion importance[1-10]: 1

__

Why: This suggestion only modifies a comment/annotation string and does not improve code correctness or functionality. The existing_code and improved_code differ only in the @SuppressForbidden reason string, which is a documentation-only change with minimal impact.

Low

Previous suggestions

Suggestions up to commit 5236a7b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix inconsistent IOContext in assertion comparison

The two arguments to assertEquals use different IOContext instances —
IOContext.merge(mergeInfo) vs IOContext.merge(mergeInfo).withHints(). This means the
expected and actual values are computed from different contexts, making the
assertion potentially incorrect. Both sides should use the same IOContext instance
to ensure a valid comparison.

server/src/test/java/org/opensearch/index/store/FsDirectoryFactoryTests.java [229-234]

 MergeInfo mergeInfo = new MergeInfo(100, 100L, false, 1);
+IOContext mergeContext = IOContext.merge(mergeInfo);
 assertEquals(
     "Advise By context is not set",
-    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", IOContext.merge(mergeInfo)),
-    readAdvice.apply("test.vec", IOContext.merge(mergeInfo).withHints())
+    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", mergeContext),
+    readAdvice.apply("test.vec", mergeContext)
 );
Suggestion importance[1-10]: 7

__

Why: The assertion compares ADVISE_BY_CONTEXT applied to IOContext.merge(mergeInfo) against readAdvice applied to IOContext.merge(mergeInfo).withHints(), which are different contexts. This could make the test pass or fail incorrectly, and using a consistent IOContext instance would make the test more reliable and accurate.

Medium
Add type check before casting delegate directory

The getDelegate() method may not return an MMapDirectory — it could return any
FSDirectory. If it returns a non-MMapDirectory instance, passing it to
mMapDirectoryHasReadAdviceByContext which expects MMapDirectory would cause a
ClassCastException. Add a type check or assertion before casting.

server/src/test/java/org/opensearch/index/store/FsDirectoryFactoryTests.java [184]

-mMapDirectoryHasReadAdviceByContext(((FsDirectoryFactory.HybridDirectory) directory).getDelegate());
+Directory delegate = ((FsDirectoryFactory.HybridDirectory) directory).getDelegate();
+assertTrue("Expected delegate to be MMapDirectory", delegate instanceof MMapDirectory);
+mMapDirectoryHasReadAdviceByContext((MMapDirectory) delegate);
Suggestion importance[1-10]: 5

__

Why: The getDelegate() could theoretically return a non-MMapDirectory on non-64-bit systems, causing a ClassCastException. Adding an explicit instanceof check and assertion would make the test failure message clearer and prevent unexpected exceptions.

Low
Ensure directory is closed on construction failure

If primaryDirectory is not an MMapDirectory (the else branch), the opened
FSDirectory is returned directly without closing it, but the caller may not close it
if an exception occurs elsewhere. More critically, when primaryDirectory is an
MMapDirectory, the directory is returned wrapped but the original reference is also
held — however the real issue is that if setPreload or HybridDirectory construction
throws, mMapDirectory (which is primaryDirectory) won't be closed. Consider wrapping
the body in a try-with-resources or ensuring cleanup on failure.

server/src/main/java/org/opensearch/index/store/FsDirectoryFactory.java [100-108]

 final FSDirectory primaryDirectory = FSDirectory.open(location, lockFactory);
 final Set<String> nioExtensions = new HashSet<>(indexSettings.getValue(IndexModule.INDEX_STORE_HYBRID_NIO_EXTENSIONS));
 if (primaryDirectory instanceof MMapDirectory mMapDirectory) {
-    // Setting the read advise by context: REF: https://github.com/opensearch-project/OpenSearch/issues/21012
-    mMapDirectory.setReadAdvice(MMapDirectory.ADVISE_BY_CONTEXT);
-    return new HybridDirectory(lockFactory, setPreload(mMapDirectory, preLoadExtensions), nioExtensions);
+    try {
+        // Setting the read advise by context: REF: https://github.com/opensearch-project/OpenSearch/issues/21012
+        mMapDirectory.setReadAdvice(MMapDirectory.ADVISE_BY_CONTEXT);
+        return new HybridDirectory(lockFactory, setPreload(mMapDirectory, preLoadExtensions), nioExtensions);
+    } catch (Exception e) {
+        mMapDirectory.close();
+        throw e;
+    }
 } else {
     return primaryDirectory;
 }
Suggestion importance[1-10]: 4

__

Why: This is a valid defensive coding concern about resource cleanup on failure, but this pattern existed before the PR and the PR only adds setReadAdvice which doesn't throw checked exceptions. The suggestion is somewhat valid but addresses a pre-existing issue rather than something introduced by this PR.

Low
Suggestions up to commit d8edef0
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix mismatched IOContext in equality assertion

The two arguments passed to assertEquals use different IOContext instances — one
includes FileDataHint.KNN_VECTORS and the other calls withHints() with no arguments.
This means the test is comparing results from different inputs, making it an invalid
equality check that may produce misleading results. Both calls should use the same
IOContext to correctly verify that readAdvice behaves identically to
ADVISE_BY_CONTEXT.

server/src/test/java/org/opensearch/index/store/FsDirectoryFactoryTests.java [229-234]

 MergeInfo mergeInfo = new MergeInfo(100, 100L, false, 1);
+IOContext mergeContext = IOContext.merge(mergeInfo).withHints(FileDataHint.KNN_VECTORS);
 assertEquals(
     "Advise By context is not be set",
-    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", IOContext.merge(mergeInfo).withHints(FileDataHint.KNN_VECTORS)),
-    readAdvice.apply("test.vec", IOContext.merge(mergeInfo).withHints())
+    MMapDirectory.ADVISE_BY_CONTEXT.apply("test.vec", mergeContext),
+    readAdvice.apply("test.vec", mergeContext)
 );
Suggestion importance[1-10]: 8

__

Why: The test compares ADVISE_BY_CONTEXT.apply with FileDataHint.KNN_VECTORS hint against readAdvice.apply with no hints (withHints()), making the assertion compare results from different inputs. This is a genuine bug in the test that could produce false positives or misleading results.

Medium
General
Prevent resource leak on exception

If setReadAdvice throws an exception after primaryDirectory has been opened, the
directory resource will be leaked since it is not closed in the error path. Consider
wrapping the logic in a try-catch or try-with-resources to ensure primaryDirectory
is closed on failure.

server/src/main/java/org/opensearch/index/store/FsDirectoryFactory.java [102-108]

 if (primaryDirectory instanceof MMapDirectory mMapDirectory) {
-    // Setting the read advise by context: REF: https://github.com/opensearch-project/OpenSearch/issues/21012
-    mMapDirectory.setReadAdvice(MMapDirectory.ADVISE_BY_CONTEXT);
-    return new HybridDirectory(lockFactory, setPreload(mMapDirectory, preLoadExtensions), nioExtensions);
+    try {
+        // Setting the read advise by context: REF: https://github.com/opensearch-project/OpenSearch/issues/21012
+        mMapDirectory.setReadAdvice(MMapDirectory.ADVISE_BY_CONTEXT);
+        return new HybridDirectory(lockFactory, setPreload(mMapDirectory, preLoadExtensions), nioExtensions);
+    } catch (Exception e) {
+        primaryDirectory.close();
+        throw e;
+    }
 } else {
     return primaryDirectory;
 }
Suggestion importance[1-10]: 5

__

Why: While setReadAdvice is unlikely to throw (it's a simple setter), the suggestion is technically valid for defensive programming. However, the risk is low since setReadAdvice doesn't perform I/O operations that would typically throw exceptions.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for d8edef0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5236a7b

…readadvise of Lucene

Signed-off-by: Navneet Verma <navneev@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit fb56484

@navneet1v
Copy link
Copy Markdown
Contributor Author

@andrross can we trigger a benchmark run on this PR. In the meanwhile I doing more deep-dive on the issue provided by Lucene to see where and how ES faced issues. Because based on my deep-dive we change the IOContext during merge from RANDOM to SEQUENTIAL so we should face this issue in vectors.

Ref: https://github.com/apache/lucene/blob/726731f602a1b37430cd26995540c8a5d3735708/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L196-L201

@navneet1v
Copy link
Copy Markdown
Contributor Author

@andrross here is what I am able to trace out.

The issue that happened in Elastic Search was with Lucene version 9.12, with Kernel enabled MGLRU. Elastic was always using Random Read advise for their vector files. Ref: elastic/elasticsearch#124499

But with OpenSearch we should not face this issue because:

  1. When Opensearch is doing merges in lucene 10.x for vectors we switch the read advise to sequential, same capability was not present in Lucene 9.12.

I can try reproducing the same setup with OpenSearch 2.19(using Lucene 9.12) and OpenSearch 3.x where we override the read advice to showcase that we will not see regression when MGLRU is enabled, is the above pointers not enough. Please let me know your thoughts.

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for fb56484: SUCCESS

Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Curious where's the search regression coming from?
Is it somewhere you define IOContext with DataAccessHint.RANDOM, but without this change it got ignored?

Also I notice this comment in Lucene saying random hint works badly for vector search. Are you observing contrasting behavior?

@navneet1v
Copy link
Copy Markdown
Contributor Author

navneet1v commented Mar 31, 2026

Looks good.

Curious where's the search regression coming from? Is it somewhere you define IOContext with DataAccessHint.RANDOM, but without this change it got ignored?

Yes, so vectors are stored in .vec file which gets opened with IOContext as Random. But without this change that IOContext was getting ignored, which was causing higher page faults, during search.

@andrross andrross merged commit 311d711 into opensearch-project:main Mar 31, 2026
16 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 31, 2026
…1031)

Signed-off-by: Navneet Verma <navneev@amazon.com>
(cherry picked from commit 311d711)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 3.6 discuss Issues intended to help drive brainstorming and decision making lucene Search:Performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use Advise by Context to ensure IOContext is honored during file read in MMapDirectory

3 participants