[BugFix] Fix IOContext, Scorer, and max dimension for Lucene SQ 1 bit by naveentatikonda · Pull Request #3203 · opensearch-project/k-NN

naveentatikonda · 2026-03-24T02:15:03Z

Description

This PR includes the following bug fixes for Lucene SQ 1 bit (both HNSW and Flat):

Fix IOContext for .veq file to use SEQUENTIAL instead of RANDOM
Set max dimension to 16000 instead of 1024
Fix the scorer to use Bulk SIMD Scorer instead of Lucene scorer
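To show what a bulk scorer changes compared with scoring one vector at a time, here is a minimal sketch. It assumes 1-bit codes packed into long words and uses a plain bitwise dot product (popcount of AND). The class and method names (BulkBitScorer, scoreOne, scoreBulk) are hypothetical. This is not the PR's KNN1040ScalarQuantizedVectorScorer, and it omits the quantization correction terms a real scorer needs.

```java
/** Hypothetical sketch: scoring packed 1-bit vectors one at a time vs. in a batch. */
final class BulkBitScorer {
    private final long[][] docCodes; // docCodes[ord] = packed 1-bit code of document ord

    BulkBitScorer(long[][] docCodes) {
        this.docCodes = docCodes;
    }

    /** Bitwise dot product: number of dimensions where both the query bit and the doc bit are 1. */
    int scoreOne(long[] queryCode, int ord) {
        long[] doc = docCodes[ord];
        int sum = 0;
        for (int w = 0; w < queryCode.length; w++) {
            sum += Long.bitCount(queryCode[w] & doc[w]);
        }
        return sum;
    }

    /**
     * Bulk variant: scores count ordinals in one call and writes the results into scores.
     * A real bulk scorer can use this batch boundary to prefetch data or to run a native
     * SIMD kernel over several documents at once; this sketch only loops.
     */
    void scoreBulk(long[] queryCode, int[] ords, int count, int[] scores) {
        for (int i = 0; i < count; i++) {
            scores[i] = scoreOne(queryCode, ords[i]);
        }
    }
}
```

The batch signature is the design point: the caller hands over many candidate ordinals at once, so the scorer controls memory access order instead of being invoked per candidate.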

Changes

Add KNN1040ScalarQuantizedVectorsFormat and KNN1040HnswScalarQuantizedVectorsFormat to use a custom bulk SIMD scorer for the FLAT and HNSW index types
Wire KNN1040ScalarQuantizedVectorsFormat and KNN1040HnswScalarQuantizedVectorsFormat into KNN1040PerFieldKnnVectorsFormat for the FLAT resolver
Add unit tests for the new KNN1040ScalarQuantizedVectorsFormat, KNN1040HnswScalarQuantizedVectorsFormat, and the codec resolver
Rename Faiss104ScalarQuantizedVectorScorer.java => KNN1040ScalarQuantizedVectorScorer.java and Faiss1040ScalarQuantizedUtils.java => KNN1040ScalarQuantizedUtils.java
Add KNN1040ReadAdviceOverridingDirectory that wraps the segment directory and overrides IOContext to use DataAccessHint.SEQUENTIAL for .veq (quantized vector data) files. Lucene's default reader opens these with DataAccessHint.RANDOM, which disables OS read-ahead prefetching. Since quantized vector data is read sequentially during search, switching to sequential access improves I/O throughput. Raw vector data (.vec) retains random access for rescoring.
Override fieldsReader in KNN1040ScalarQuantizedVectorsFormat to wrap state.directory with KNN1040ReadAdviceOverridingDirectory before passing it to Lucene104ScalarQuantizedVectorsReader. The HNSW path (KNN1040HnswScalarQuantizedVectorsFormat) inherits this behavior via delegation.
Add unit tests for KNN1040ReadAdviceOverridingDirectory, plus tests in KNN1040ScalarQuantizedVectorsFormatTests and KNN1040HnswScalarQuantizedVectorsFormatTests, to verify that .veq files are opened with the SEQUENTIAL hint
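The per-file hint override described in the Changes list can be modeled without Lucene on the classpath. This is a minimal, hypothetical sketch: AccessHint stands in for Lucene's DataAccessHint, and hintFor stands in for the decision KNN1040ReadAdviceOverridingDirectory makes when a file is opened. In the real wrapper this logic would sit inside a FilterDirectory openInput override; the exact Lucene IOContext calls are deliberately left out.

```java
/**
 * Hypothetical model of the per-file hint choice made when a file is opened.
 * AccessHint stands in for Lucene's DataAccessHint; this is not Lucene's API.
 */
final class ReadHintSketch {
    /** Stand-in for Lucene's DataAccessHint (only the two values discussed here). */
    enum AccessHint { RANDOM, SEQUENTIAL }

    /** Extension of quantized vector data files, per the PR description. */
    static final String QUANTIZED_DATA_EXTENSION = ".veq";

    private ReadHintSketch() {}

    /**
     * SEQUENTIAL for quantized vector data (.veq) so the OS can read ahead;
     * every other file keeps the hint the caller requested (for example RANDOM
     * for raw .vec data, which rescoring reads at scattered offsets).
     */
    static AccessHint hintFor(String fileName, AccessHint requested) {
        return fileName.endsWith(QUANTIZED_DATA_EXTENSION) ? AccessHint.SEQUENTIAL : requested;
    }
}
```

Keying on the file extension leaves every other file's access pattern untouched, which matches the description's intent that only .veq changes while .vec keeps random access.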

Related Issues

#3178

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


codecov · 2026-03-24T03:48:02Z

Codecov Report

❌ Patch coverage is 98.43750% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.16%. Comparing base (6822da0) to head (41c2dc0).
⚠️ Report is 1 commit behind head on 3.6.

Files with missing lines	Patch %	Lines
...N1040Codec/KNN1040ScalarQuantizedVectorScorer.java	83.33%	1 Missing ⚠️
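Because exactly one patch line is reported as uncovered, the percentages can be sanity-checked. This assumes coverage is covered lines divided by tracked lines (n for the whole patch, m for the single file listed). The project totals likewise split exactly into hits, misses, and partials. The values of n and m are inferred from the percentages and are not stated in the report.

```latex
\begin{aligned}
\text{patch: } 1 - \tfrac{1}{n} &= 0.984375 \;\Rightarrow\; n = 64, \quad \tfrac{63}{64} = 98.4375\% \\
\text{file: } 1 - \tfrac{1}{m} &\approx 0.8333 \;\Rightarrow\; m = 6, \quad \tfrac{5}{6} \approx 83.33\% \\
\text{totals: } 12857 + 1805 + 793 &= 15455, \quad 12897 + 1814 + 796 = 15507
\end{aligned}
```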

Additional details and impacted files

@@             Coverage Diff              @@
##                3.6    #3203      +/-   ##
============================================
- Coverage     83.18%   83.16%   -0.03%     
- Complexity     4215     4234      +19     
============================================
  Files           453      456       +3     
  Lines         15455    15507      +52     
  Branches       1972     1974       +2     
============================================
+ Hits          12857    12897      +40     
- Misses         1805     1814       +9     
- Partials        793      796       +3


naveentatikonda · 2026-03-27T00:51:41Z

This is the test that is failing

[o.o.k.r.RecallTestsIT    ] [testRecall_when1bitScalarQuantizer_thenRecallAbove60percent] before test
[o.o.k.r.RecallTestsIT    ] [testRecall_when1bitScalarQuantizer_thenRecallAbove60percent] Recall value for SpaceType L2 = 0.8819001770019531
[o.o.k.r.RecallTestsIT    ] [testRecall_when1bitScalarQuantizer_thenRecallAbove60percent] Recall value for SpaceType COSINESIMIL = 0.5164001846313476
[o.o.k.r.RecallTestsIT    ] [testRecall_when1bitScalarQuantizer_thenRecallAbove60percent] after test
  2> REPRODUCE WITH: ./gradlew ':integTest' --tests 'org.opensearch.knn.recall.RecallTestsIT.testRecall_when1bitScalarQuantizer_thenRecallAbove60percent' -Dtests.seed=E6972E43A86BA1B9 -Dtests.security.manager=false -Dtests.locale=csw-CA -Dtests.timezone=Africa/Brazzaville -Druntime.java=25
  2> java.lang.AssertionError: expected:<1.0> but was:<0.5164001846313476>
        at __randomizedtesting.SeedInfo.seed([E6972E43A86BA1B9:2583729F1C9C374]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:685)
        at org.opensearch.knn.recall.RecallTestsIT.assertRecall(RecallTestsIT.java:738)
        at org.opensearch.knn.recall.RecallTestsIT.testRecall_when1bitScalarQuantizer_thenRecallAbove60percent(RecallTestsIT.java:729)
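The recall values in the log can be read using a common definition of recall@k: the fraction of the true k nearest neighbors (from exact search) that appear in the approximate top-k results, averaged over queries. The sketch below implements that definition. The class and method names are hypothetical, and RecallTestsIT may compute or aggregate recall differently.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical recall@k helper; not the code used by RecallTestsIT. */
final class RecallSketch {
    private RecallSketch() {}

    /** Fraction of ground-truth ids that also appear in the approximate results (recall@k). */
    static double recallAtK(int[] groundTruth, int[] approximate) {
        Set<Integer> truth = new HashSet<>();
        for (int id : groundTruth) {
            truth.add(id);
        }
        int hits = 0;
        for (int id : approximate) {
            // remove() so a duplicate id in the approximate results is counted at most once
            if (truth.remove(id)) {
                hits++;
            }
        }
        return (double) hits / groundTruth.length;
    }

    /** Mean recall@k over several queries (groundTruth[q] and approximate[q] belong to query q). */
    static double meanRecall(int[][] groundTruth, int[][] approximate) {
        double sum = 0;
        for (int q = 0; q < groundTruth.length; q++) {
            sum += recallAtK(groundTruth[q], approximate[q]);
        }
        return sum / groundTruth.length;
    }
}
```

Under this definition, a value of about 0.52 means roughly half of the true neighbors were retrieved on average.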

Thanks @jack-hung-lgtm for adding this IT, which helped catch a bug in the cosine similarity space type with the Bulk SIMD scorer. I will keep this PR on hold until the issue is resolved.

This logic was missed in our Bulk SIMD implementation of cosine similarity. We are working on resolving the issue. Thanks!


0ctopus13prime · 2026-03-31T17:38:47Z

...ain/java/org/opensearch/knn/index/codec/KNN1040Codec/KNN1040ScalarQuantizedVectorScorer.java

+        // For cosine similarity, Lucene's OptimizedScalarQuantizer asserts the query is a unit vector.
+        // FAISS already normalizes the query, so we only normalize if it isn't already. But, we need to
+        // normalize the query vector when using Lucene engine.
+        KNN1040ScalarQuantizedUtils.normalizeIfNeeded(targetCopy, similarityFunction);


If we end up normalizing the query vector within the flat vector scorer, can we skip checking whether it is already a unit vector?
We can just normalize targetCopy unconditionally, since normalization is idempotent:
Normalize(Normalize(Normalize(vector))) == Normalize(vector)

Also curious: why aren't we normalizing the query vector on the QueryBuilder side, as we do for other cosine index types?
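The reviewer's idempotence point can be sketched as follows, assuming L2 normalization of a float[] query. The helper names are hypothetical and are not the API of KNN1040ScalarQuantizedUtils. In floating point, re-normalizing an already-unit vector may change components by a rounding error, so "always normalize" is idempotent only up to that tolerance.

```java
/** Hypothetical helpers contrasting always-normalize with a unit-length pre-check. */
final class NormalizeSketch {
    private NormalizeSketch() {}

    /** Euclidean (L2) norm, accumulated in double for accuracy. */
    static double norm(float[] v) {
        double sumSq = 0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        return Math.sqrt(sumSq);
    }

    /** Always scales v in place to unit length; a zero vector is left unchanged. */
    static void normalize(float[] v) {
        double n = norm(v);
        if (n == 0) {
            return;
        }
        for (int i = 0; i < v.length; i++) {
            v[i] = (float) (v[i] / n);
        }
    }

    /** Pre-check variant: skips the division when v is already unit length within eps. */
    static void normalizeUnlessUnit(float[] v, double eps) {
        if (Math.abs(norm(v) - 1.0) > eps) {
            normalize(v);
        }
    }

    /** Dot product; equals cosine similarity when both inputs are unit vectors. */
    static double dot(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            s += (double) a[i] * b[i];
        }
        return s;
    }
}
```

Both variants compute the norm. The pre-check only saves the division pass, which is why calling normalize unconditionally is a reasonable simplification.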


navneet1v · 2026-03-31T18:26:08Z

...n/java/org/opensearch/knn/index/codec/KNN1040Codec/KNN1040ReadAdviceOverridingDirectory.java

+ * <p>This is used by both the Lucene SQ flat path ({@link KNN1040ScalarQuantizedVectorsFormat})
+ * and the Lucene SQ HNSW path ({@link KNN1040HnswScalarQuantizedVectorsFormat}).
+ */
+final class KNN1040ReadAdviceOverridingDirectory extends FilterDirectory {


This is not needed, because readAdvice is not honored in OpenSearch versions > 3.1 and <= 3.6.

Ref: opensearch-project/OpenSearch#21012

0ctopus13prime · 2026-03-31T18:38:54Z

...n/java/org/opensearch/knn/index/codec/KNN1040Codec/KNN1040ReadAdviceOverridingDirectory.java

+import java.io.IOException;
+
+/**
+ * Directory wrapper that enforces {@link DataAccessHint#SEQUENTIAL} for {@code .veb} files


Nit: .veb -> .veq

naveentatikonda requested a review from heemin32 as a code owner March 24, 2026 02:15

naveentatikonda added the skip-changelog label Mar 24, 2026

naveentatikonda requested review from VijayanB and navneet1v as code owners March 24, 2026 02:15

naveentatikonda added the v3.6.0 label Mar 24, 2026

naveentatikonda requested review from 0ctopus13prime, Vikasht34, jmazanec15, junqiu-lei, luyuncheng, martin-gaievski, ryanbogan, shatejas and vamshin as code owners March 24, 2026 02:15

naveentatikonda force-pushed the prefetch_bbq_flat branch from 5a0e53f to a6aa0cb March 24, 2026 02:29

Vikasht34 reviewed Mar 24, 2026

...pensearch/knn/index/codec/KNN1040Codec/PrefetchingLucene104ScalarQuantizedVectorsFormat.java (outdated, resolved)

Vikasht34 reviewed Mar 24, 2026

.../org/opensearch/knn/index/codec/scorer/PrefetchableLucene104ScalarQuantizedVectorScorer.java (outdated, resolved)

navneet1v reviewed Mar 24, 2026

.../org/opensearch/knn/index/codec/scorer/PrefetchableLucene104ScalarQuantizedVectorScorer.java (outdated, resolved)

navneet1v reviewed Mar 24, 2026

...opensearch/knn/index/codec/scorer/PrefetchableLucene104ScalarQuantizedVectorScorerTests.java (outdated, resolved)

VijayanB reviewed Mar 24, 2026

...pensearch/knn/index/codec/KNN1040Codec/PrefetchingLucene104ScalarQuantizedVectorsFormat.java (outdated, resolved)

naveentatikonda force-pushed the prefetch_bbq_flat branch from a6aa0cb to 7c38f8a March 24, 2026 06:13

naveentatikonda changed the base branch from main to feature/faiss-bbq March 24, 2026 06:14

naveentatikonda force-pushed the prefetch_bbq_flat branch from 7c38f8a to 0bad74a March 24, 2026 06:45

naveentatikonda marked this pull request as draft March 24, 2026 06:46

naveentatikonda force-pushed the feature/faiss-bbq branch from 11d6dea to 1c4ab75 March 24, 2026 17:49

naveentatikonda force-pushed the prefetch_bbq_flat branch from 0bad74a to f210f85 March 24, 2026 23:28

naveentatikonda changed the base branch from feature/faiss-bbq to main March 24, 2026 23:29

naveentatikonda force-pushed the prefetch_bbq_flat branch from 856a970 to 72d68c4 March 25, 2026 00:02

naveentatikonda changed the title from "Add Prefetch Optimization to Lucene SQ Flat 1 bit" to "Integrate Bulk SIMD Scorer to Lucene SQ Flat 1 bit" Mar 25, 2026

navneet1v reviewed Mar 25, 2026

...ava/org/opensearch/knn/index/codec/KNN1040Codec/Faiss1040PrefetchSupportKnnVectorReader.java (outdated, resolved)

navneet1v reviewed Mar 25, 2026

...ava/org/opensearch/knn/index/codec/KNN1040Codec/Faiss1040PrefetchSupportKnnVectorReader.java (outdated, resolved)

0ctopus13prime reviewed Mar 25, 2026

naveentatikonda force-pushed the prefetch_bbq_flat branch from 52448ab to 3daeb46 March 26, 2026 05:15

naveentatikonda changed the title from "Integrate Bulk SIMD Scorer to Lucene SQ Flat 1 bit" to "Integrate Bulk SIMD Scorer to Lucene SQ 1 bit" Mar 26, 2026

naveentatikonda force-pushed the prefetch_bbq_flat branch from 3daeb46 to 87eb1f3 March 26, 2026 20:48

naveentatikonda added the backport 3.6 label Mar 30, 2026

naveentatikonda force-pushed the prefetch_bbq_flat branch 5 times, most recently from 3a8748b to fef69bc March 31, 2026 02:58

naveentatikonda changed the title from "Integrate Bulk SIMD Scorer to Lucene SQ 1 bit" to "[BugFix] Fix IOContext, Scorer, and max dimension for Lucene SQ 1 bit" Mar 31, 2026

naveentatikonda force-pushed the prefetch_bbq_flat branch from fef69bc to ef7fc5e March 31, 2026 17:28

naveentatikonda changed the base branch from main to 3.6 March 31, 2026 17:28

naveentatikonda added 4 commits March 31, 2026 10:29

Add SIMD scorer to Lucene SQ Flat

87540f5

Signed-off-by: Naveen Tatikonda <navtat@amazon.com> Address Review Comments Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

Refactoring changes

d5c7064

Signed-off-by: Naveen Tatikonda <navtat@amazon.com> Address Review Comments Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

Integrate Bulk SIMD scorer to Lucene SQ HNSW 1 bit

3b2b2f6

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

Change iocontext to sequential for .veq

324030b

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

naveentatikonda force-pushed the prefetch_bbq_flat branch from ef7fc5e to 324030b March 31, 2026 17:29

naveentatikonda added backport main and removed backport 3.6 labels Mar 31, 2026

naveentatikonda requested review from 0ctopus13prime and navneet1v March 31, 2026 17:34

0ctopus13prime reviewed Mar 31, 2026

src/main/java/org/opensearch/knn/index/codec/KNN1040Codec/KNN1040ScalarQuantizedUtils.java (outdated, resolved)

Address Review Comments

41c2dc0

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

navneet1v reviewed Mar 31, 2026

0ctopus13prime reviewed Mar 31, 2026

naveentatikonda wants to merge 5 commits into opensearch-project:3.6 from naveentatikonda:prefetch_bbq_flat

5 participants

Conversation

naveentatikonda commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Related Issues

Check List

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

naveentatikonda commented Mar 27, 2026

Uh oh!

0ctopus13prime Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

0ctopus13prime Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

navneet1v Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

navneet1v Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

0ctopus13prime Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

naveentatikonda commented Mar 24, 2026 •

edited

Loading

codecov bot commented Mar 24, 2026 •

edited

Loading

0ctopus13prime Mar 31, 2026 •

edited

Loading