
Fix KNN build due to Lucene 10.3 upgrade#2878

Merged
navneet1v merged 1 commit into opensearch-project:main from navneet1v:main
Sep 22, 2025

Conversation


@navneet1v navneet1v commented Sep 17, 2025

Description

This change fixes the failing builds caused by the Lucene 10.3 upgrade and other changes in OpenSearch core.

Changes done:

  1. Created a new codec class, KNN1030Codec, for Lucene 10.3.
  2. Fixed the kNN query search interface to use AcceptDocs rather than BitSets.
  3. Fixed and cleaned up the Guava dependencies for gRPC.
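To make change (2) concrete, here is a minimal, dependency-free sketch of the accept-docs pattern. The `Bits` and `fromLiveDocs` names mirror Lucene's, but the simplified types below are illustrative stand-ins, not the actual Lucene 10.3 API:

```java
// Simplified stand-ins for Lucene's Bits and AcceptDocs (illustration only).
class AcceptDocsSketch {
    /** Mirrors org.apache.lucene.util.Bits: random access to live/deleted docs. */
    public interface Bits {
        boolean get(int index);
    }

    private final Bits liveDocs; // null means "all docs are live"
    private final int maxDoc;

    private AcceptDocsSketch(Bits liveDocs, int maxDoc) {
        this.liveDocs = liveDocs;
        this.maxDoc = maxDoc;
    }

    /** Mirrors AcceptDocs.fromLiveDocs: a null Bits accepts every doc. */
    public static AcceptDocsSketch fromLiveDocs(Bits liveDocs, int maxDoc) {
        return new AcceptDocsSketch(liveDocs, maxDoc);
    }

    /** Approximate-search paths (e.g. HNSW) check docs one at a time. */
    public boolean accept(int docId) {
        return liveDocs == null || liveDocs.get(docId);
    }

    /** Count of accepted docs; exact-search paths iterate these IDs instead. */
    public int cardinality() {
        int count = 0;
        for (int i = 0; i < maxDoc; i++) {
            if (accept(i)) count++;
        }
        return count;
    }
}
```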

Known Issue

The macOS CI failure is unrelated to this PR; a separate PR will address it. GH issue: #2876

Related Issues

#2888

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v navneet1v force-pushed the main branch 2 times, most recently from d0cca7e to f5242bf on September 18, 2025 01:09
@navneet1v navneet1v changed the title from "Fixing the macOS CI" to "Fixing knn build due to guava" Sep 18, 2025
@navneet1v (Collaborator, Author)

Need to fix the Lucene codec too, since OpenSearch core has moved to Lucene 10.3.

@finnegancarroll

I see the dependency conflict fixed on this branch, but the ITs still fail at runtime with the failureaccess class missing when I run locally.

./gradlew ':integTest' --tests 'org.opensearch.knn.recall.RecallTestsIT'
...

=== Standard error of node `node{::integTest-0}` ===
»   ↓ last 40 non error or warning messages from /Users/carrofin/fdev/repos/k-NN/build/testclusters/integTest-0/logs/opensearch.stderr.log ↓
» WARNING: Using incubator modules: jdk.incubator.vector
»  WARNING: Unknown module: org.apache.arrow.memory.core specified to --add-opens
»  fatal error in thread [opensearch[integTest-0][search][T#1]], exiting
»  java.lang.NoClassDefFoundError: com/google/common/util/concurrent/internal/InternalFutureFailureAccess
»       at java.base/java.lang.ClassLoader.defineClass1(Native Method)
»       at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1027)
»       at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
»       at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
»       at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
»       at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)

It's confusing to me that this class is not loaded, as I see it in the bundle:
#2885 (comment)

I think @karenyrx has an upstream change to resolve this:
opensearch-project/OpenSearch#19339

My only thought is that it's awkward that the gRPC transport SPI supplies failureaccess, when k-NN uses it directly for com.google.common.cache.LocalCache in org.opensearch.knn.index.memory.NativeMemoryCacheManager.
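For context: since Guava 27+ moved InternalFutureFailureAccess out of the main jar into the separate `com.google.guava:failureaccess` artifact, one common workaround for this particular NoClassDefFoundError is to declare that artifact explicitly in the plugin's Gradle build. The coordinates are real, but the version and configuration shown are illustrative, and this is not necessarily the fix this PR took (the resolution referenced above landed upstream in OpenSearch core):

```groovy
dependencies {
    // Guava 27+ ships InternalFutureFailureAccess in a separate artifact;
    // declaring it explicitly avoids the runtime NoClassDefFoundError.
    // Version is illustrative; match it to the Guava version in use.
    runtimeOnly "com.google.guava:failureaccess:1.0.2"
}
```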

@navneet1v navneet1v changed the title from "Fixing knn build due to guava" to "Fix KNN build due to Lucene 10.3 upgrade" Sep 20, 2025
@navneet1v navneet1v force-pushed the main branch 6 times, most recently from 8e55b69 to 7013e9e on September 20, 2025 02:03
@navneet1v (Collaborator, Author) commented Sep 20, 2025

Waiting for opensearch-project/OpenSearch#19339 to become available via nightly snapshots.

Tested locally that the build works with this change:

cd OpenSearch
bash ./scripts/build.sh -v 3.3.0 -s true
cd ../k-NN
./gradlew integTest -PcustomDistributionUrl="/Users/navneev/workplace/OpenSearch/artifacts/dist/opensearch-min-3.3.0-SNAPSHOT-darwin-arm64.tar.gz"

@navneet1v navneet1v force-pushed the main branch 2 times, most recently from f1982a7 to 3de0da0 on September 21, 2025 01:46
@navneet1v navneet1v force-pushed the main branch 2 times, most recently from 795cfe2 to a5d591d on September 21, 2025 05:21
@navneet1v navneet1v marked this pull request as ready for review September 21, 2025 06:41
@navneet1v navneet1v added the Maintenance label (Add support for new versions of OpenSearch/Dashboards from upstream) and removed the skip-changelog label Sep 21, 2025
final Bits liveDocs = reader.getLiveDocs();
acceptDocs = AcceptDocs.fromLiveDocs(liveDocs, reader.maxDoc());
} else {
acceptDocs = new AcceptDocs() {
Collaborator:

This is too verbose to put in an else statement; please find a better way to write this.

Collaborator (Author):

Can you please elaborate on your comment? What do you mean by "this is too verbose to put in else"?

Collaborator:

I think we can factor this out into a small function in this class with a brief comment, e.g. when cardinality is 0, it indicates there is no filtering that combines the user-provided filter with liveDocs.

Collaborator (Author):

That was not clear from the comment. I can move this to a private function.

Collaborator:

NIT: If cardinality == 0, then can we just pass null?

acceptDocs = AcceptDocs.fromLiveDocs(null, reader.maxDoc());

@shatejas (Collaborator) commented Sep 22, 2025:

@navneet1v just want to confirm we can't use fromIteratorSupplier, right?

Another thought: should we create the AcceptDocs when we create the filter bitset and pass it in?

Collaborator:

Synced with Navneet; we can just pass liveDocs. If there is no filtering and no deleted docs, liveDocs will always be null.

Collaborator (Author):

@navneet1v just want to confirm we can't use fromIteratorSupplier, right?

Another thought: should we create the AcceptDocs when we create the filter bitset and pass it in?

@shatejas to use fromIteratorSupplier, we would have to pass the iterator from the base KNNWeight class. I considered that first, but realized it would break a lot of code and lead to changes across many files; we are already at 25 files in this PR. With the new interface we should eventually move our code to use AcceptDocs throughout. Will take that change in upcoming PRs.

So my strategy was to first make this change to get the build working, and as a next step see if we can use fromIteratorSupplier. Based on my deep-dive, AcceptDocs is just a wrapper over Bits and a DocIdSetIterator. For HNSW the usage of bits remains the same; it is only when Lucene falls back to exact search that it uses iterator() to get the doc IDs. https://github.com/apache/lucene/blob/586dc103361c99fefa2f67946855691a073e805c/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L234
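The refactor discussed in this thread (pulling the verbose else branch into a named helper) could look roughly like the sketch below. The `AcceptDocs` interface here is a simplified stand-in for Lucene's class, and `acceptAllDocs` is a hypothetical helper name, not code from this PR:

```java
// Illustration of the review suggestion: pull the "accept everything"
// AcceptDocs out of the else branch into a small named helper.
class AcceptAllSketch {
    /** Stand-in for Lucene's AcceptDocs: which doc IDs a kNN search may return. */
    public interface AcceptDocs {
        boolean accept(int docId);
        int cardinality();
    }

    /**
     * Hypothetical helper: when the filter cardinality is 0 there is no
     * filtering to apply, so every doc ID below maxDoc is accepted.
     */
    static AcceptDocs acceptAllDocs(int maxDoc) {
        return new AcceptDocs() {
            @Override
            public boolean accept(int docId) {
                return docId < maxDoc;
            }

            @Override
            public int cardinality() {
                return maxDoc;
            }
        };
    }
}
```

Naming the branch this way also gives a natural place for the comment about cardinality == 0 meaning "no filtering" that was requested above.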

Collaborator:

@navneet1v I think I'm aligned, as long as we do it as a fast follow-up.

Collaborator (Author):

Sure, I will create a GH issue for this and take it up next.

@0ctopus13prime (Collaborator) left a comment:

LGTM! Added a few nit-picks, thank you for the hard work!


Signed-off-by: Navneet Verma <navneev@amazon.com>
@0ctopus13prime (Collaborator) left a comment:

LGTM, thank you

@navneet1v navneet1v merged commit b285075 into opensearch-project:main Sep 22, 2025
36 of 38 checks passed
@github-project-automation github-project-automation bot moved this from 3.3.0 to ✅ Done in Vector Search RoadMap Sep 22, 2025
jingqimao77-spec pushed a commit to jingqimao77-spec/k-NN that referenced this pull request Mar 15, 2026

Labels

Maintenance — Add support for new versions of OpenSearch/Dashboards from upstream

Projects

Status: ✅ Done


5 participants