
[Bug] MOS optimistic search bug fix on nested Cagra index.#3155

Merged
Vikasht34 merged 5 commits into opensearch-project:main from 0ctopus13prime:nested-cagra-fix
Mar 12, 2026

Conversation

@0ctopus13prime
Collaborator

Description

Problems

We have three problems.

  1. RandomEntryPointsKnnSearchStrategy generates duplicated elements.
  2. Second deep-dive search for nested index
    2.1. The seed doc IDs from the second deep dive are not honored. Instead, random entry points are still used.
    2.2. Nested index search uses TopKnnCollectorManager instead of DiversifyingNearestChildrenKnnCollectorManager.

Unique Random Entry Points Generation

The fix should be straightforward.

A naive implementation would use a Set to remove duplicates, but we can instead use a random sampling algorithm with O(1) memory.
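As a hedged sketch of what such an O(1)-memory sampler can look like (the class and names below are illustrative, not the PR's actual implementation): a full-period LCG over the next power of two ≥ the range visits every value exactly once per cycle, so out-of-range values only need to be skipped, never deduplicated.

```java
import java.util.Random;

// Illustrative sketch only. With a power-of-two modulus m, an odd increment c,
// and a multiplier a with a % 4 == 1, the LCG recurrence visits every value in
// [0, m) exactly once per cycle (Hull–Dobell theorem), so uniqueness requires
// no Set and no extra memory.
public final class UniqueRandomSampler {
    private final long range; // emits each value in [0, range) exactly once
    private final long m;     // power-of-two modulus >= range
    private final long a;     // multiplier with a % 4 == 1
    private final long c;     // odd increment
    private long state;

    public UniqueRandomSampler(long range, Random random) {
        if (range <= 0) {
            throw new IllegalArgumentException("range must be positive");
        }
        this.range = range;
        long mod = Long.highestOneBit(range);
        this.m = (mod < range) ? mod << 1 : mod;
        this.a = (random.nextLong() & ~3L) | 1L; // force low bits 01 => a % 4 == 1
        this.c = random.nextLong() | 1L;         // force c odd
        this.state = random.nextLong() & (m - 1);
    }

    /** Returns the next unique value; valid for at most {@code range} calls per cycle. */
    public long next() {
        do {
            state = (a * state + c) & (m - 1);   // LCG step mod m
        } while (state >= range);                // skip values outside [0, range)
        return state;
    }
}
```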

Second Deep-Dive Search for Nested Index

The memory-optimized search uses an optimistic search strategy:

  1. Run ANN search as usual.
  2. Identify "promising segments" that are expected to contain vectors closer to the query vector.
  3. Re-run ANN search on those segments using the doc IDs collected from the first phase as entry points.

This second phase is called the second deep-dive search.

The doc IDs collected during the first phase must be used as entry points. Otherwise, the purpose of the second deep dive is defeated.

However, for CAGRA, we currently use random entry points regardless.

if (hnsw instanceof FaissCagraHNSW cagraHNSW) {
    return new KnnCollector.Decorator(ordinalTranslatedKnnCollector) {
        @Override
        public KnnSearchStrategy getSearchStrategy() {
            return new RandomEntryPointsKnnSearchStrategy(
                cagraHNSW.getNumBaseLevelSearchEntryPoints(),
                cagraHNSW.getTotalNumberOfVectors(),
                knnCollector.getSearchStrategy()
            );
        }
    };
} else {
    // ... (non-CAGRA path omitted)

Fix

If the KnnCollector's search strategy is KnnSearchStrategy.Seeded, we must honor it and ensure that the search starts from the provided entry points.
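A minimal, self-contained sketch of the shape of this fix (the types below are simplified stand-ins, not the actual Lucene or k-NN plugin classes): before falling back to random entry points, check whether the current strategy is already seeded and, if so, return it unchanged.

```java
// Simplified stand-ins for the real strategy types; illustration only.
interface SearchStrategy {}

final class SeededStrategy implements SearchStrategy {
    final int[] seedDocIds;                       // entry points from the first phase
    SeededStrategy(int[] seedDocIds) { this.seedDocIds = seedDocIds; }
}

final class RandomEntryPointsStrategy implements SearchStrategy {
    final int numEntryPoints;
    final SearchStrategy delegate;
    RandomEntryPointsStrategy(int numEntryPoints, SearchStrategy delegate) {
        this.numEntryPoints = numEntryPoints;
        this.delegate = delegate;
    }
}

final class StrategySelector {
    // Honor an already-seeded strategy; otherwise fall back to random entry points.
    static SearchStrategy select(SearchStrategy current, int numEntryPoints) {
        if (current instanceof SeededStrategy) {
            return current;                        // start from the phase-one doc IDs
        }
        return new RandomEntryPointsStrategy(numEntryPoints, current);
    }
}
```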

Use DiversifyingNearestChildrenKnnCollectorManager

For a nested index, each parent document should be represented by its single best-matching child vector; multiple children of the same parent must not occupy separate result slots.

For example:

  • Child docs 1, 2, 3 belong to parent doc 4.
  • The best child among 1, 2, 3 should represent the parent.
  • Collecting both 1 and 2 is not allowed.

This behavior is implemented by DiversifyingNearestChildrenKnnCollectorManager.
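As a toy illustration of that rule (the doc IDs and scores are made up, and this is not the real collector): for each parent, keep only its highest-scoring child.

```java
import java.util.HashMap;
import java.util.Map;

public final class BestChildPerParent {
    // For each parent, keep only the highest-scoring child; mirrors the
    // diversifying rule described above (toy data, illustration only).
    static Map<Integer, Integer> collect(Map<Integer, Integer> parentOf,
                                         Map<Integer, Float> score) {
        Map<Integer, Integer> best = new HashMap<>();
        for (int child : score.keySet()) {
            int parent = parentOf.get(child);
            best.merge(parent, child,
                (cur, cand) -> score.get(cand) > score.get(cur) ? cand : cur);
        }
        return best;
    }
}
```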

However, we currently use TopKnnCollectorManager for nested indices.

final ReentrantKnnCollectorManager reentrantCollectorManager =
    new ReentrantKnnCollectorManager(
        new TopKnnCollectorManager(k, indexSearcher),
        segmentOrdToResults,
        knnQuery.getVectorDataType() == VectorDataType.FLOAT
            ? knnQuery.getQueryVector()
            : knnQuery.getByteQueryVector(),
        knnQuery.getField()
    );

Fix: Select Collector Manager Based on Index Type

We should create the collector manager depending on whether the index is nested.

final KnnCollectorManager knnCollectorManager =
    knnQuery.getParentsFilter() == null
        ? new TopKnnCollectorManager(k, indexSearcher)
        : new DiversifyingNearestChildrenKnnCollectorManager(
              k,
              knnQuery.getParentsFilter(),
              indexSearcher
          );

final ReentrantKnnCollectorManager reentrantCollectorManager =
    new ReentrantKnnCollectorManager(
        knnCollectorManager,
        segmentOrdToResults,
        knnQuery.getVectorDataType() == VectorDataType.FLOAT
            ? knnQuery.getQueryVector()
            : knnQuery.getByteQueryVector(),
        knnQuery.getField()
    );

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [O] New functionality includes testing.
  • [O] New functionality has been documented.
  • [O] API changes companion pull request created.
  • [O] Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
* iterations is less than 2, preserving O(1) amortized time.
* </p>
*/
public final class RobustUniqueRandomIterator {
Collaborator Author

@0ctopus13prime 0ctopus13prime Mar 11, 2026


LCG-based random number generation is almost four times faster than HashSet-based deduplication, and it requires no additional memory.

HashSet-based micro-benchmark (latency per call)

        Min: 541 nano
        50%: 1959 nano
        90%: 3458 nano
        95%: 3750 nano
        99%: 5291 nano
        Max: 43125 nano

LCG-based micro-benchmark (latency per call)

        Min: 125 nano
        50%: 333 nano
        90%: 458 nano
        95%: 1417 nano
        99%: 1625 nano
        Max: 12458 nano

Comment on lines +418 to +420
final KnnCollectorManager collectorManager = knnQuery.getParentsFilter() == null
? new TopKnnCollectorManager(k, indexSearcher)
: new DiversifyingNearestChildrenKnnCollectorManager(k, knnQuery.getParentsFilter(), indexSearcher);
Collaborator


[Clarification needed]
I am a little surprised that this is not just related to the remote index build; it can affect other search paths too, right?

Collaborator Author


Unfortunately, yes. This was affecting not only the Cagra case but all nested cases.

Collaborator


Thank you. Now it makes more sense to me. :)

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.48%. Comparing base (159e40d) to head (e1efe48).
⚠️ Report is 20 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3155      +/-   ##
============================================
+ Coverage     82.45%   82.48%   +0.02%     
- Complexity     3856     3867      +11     
============================================
  Files           417      418       +1     
  Lines         14423    14449      +26     
  Branches       1841     1847       +6     
============================================
+ Hits          11893    11918      +25     
- Misses         1771     1773       +2     
+ Partials        759      758       -1     


@Vikasht34 Vikasht34 merged commit 612b549 into opensearch-project:main Mar 12, 2026
47 of 48 checks passed
@0ctopus13prime 0ctopus13prime added the bug Something isn't working label Mar 23, 2026
@0ctopus13prime 0ctopus13prime changed the title MOS optimistic search bug fix on nested Cagra index. [Bug] MOS optimistic search bug fix on nested Cagra index. Mar 23, 2026

Labels

bug Something isn't working v3.6.0
