
[Bug] MOS optimistic search bug fix on nested Cagra index.#3155

Merged
Vikasht34 merged 5 commits into opensearch-project:main from 0ctopus13prime:nested-cagra-fix
Mar 12, 2026

Conversation

@0ctopus13prime
Collaborator

Description

Problems

We have three problems.

  1. RandomEntryPointsKnnSearchStrategy generates duplicated elements.
  2. Second deep-dive search for nested index
    2.1. The seed doc IDs from the second deep dive are not honored. Instead, random entry points are still used.
    2.2. Nested index search uses TopKnnCollectorManager instead of DiversifyingNearestChildrenKnnCollectorManager.

Unique Random Entry Points Generation

The fix should be straightforward.

A naive implementation would use a Set to remove duplicates, but we can instead use a random sampling algorithm with O(1) memory.
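As a hedged sketch of what such an O(1)-memory sampler can look like (the class and names below are illustrative, not the PR's actual implementation): a full-period LCG over the next power of two ≥ the range visits every value exactly once per cycle, so out-of-range values only need to be skipped, never deduplicated.

```java
import java.util.Random;

// Illustrative sketch only. With a power-of-two modulus m, an odd increment c,
// and a multiplier a with a % 4 == 1, the LCG recurrence visits every value in
// [0, m) exactly once per cycle (Hull–Dobell theorem), so uniqueness requires
// no Set and no extra memory.
public final class UniqueRandomSampler {
    private final long range; // emits each value in [0, range) exactly once
    private final long m;     // power-of-two modulus >= range
    private final long a;     // multiplier with a % 4 == 1
    private final long c;     // odd increment
    private long state;

    public UniqueRandomSampler(long range, Random random) {
        if (range <= 0) {
            throw new IllegalArgumentException("range must be positive");
        }
        this.range = range;
        long mod = Long.highestOneBit(range);
        this.m = (mod < range) ? mod << 1 : mod;
        this.a = (random.nextLong() & ~3L) | 1L; // force low bits 01 => a % 4 == 1
        this.c = random.nextLong() | 1L;         // force c odd
        this.state = random.nextLong() & (m - 1);
    }

    /** Returns the next unique value; valid for at most {@code range} calls per cycle. */
    public long next() {
        do {
            state = (a * state + c) & (m - 1);   // LCG step mod m
        } while (state >= range);                // skip values outside [0, range)
        return state;
    }
}
```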

Second Deep-Dive Search for Nested Index

The memory-optimized search uses an optimistic search strategy:

  1. Run ANN search as usual.
  2. Identify "promising segments" that are expected to contain vectors closer to the query vector.
  3. Re-run ANN search on those segments using the doc IDs collected from the first phase as entry points.

This second phase is called the second deep-dive search.

The doc IDs collected during the first phase must be used as entry points. Otherwise, the purpose of the second deep dive is defeated.

However, for CAGRA, we currently use random entry points regardless.

if (hnsw instanceof FaissCagraHNSW cagraHNSW) {
    return new KnnCollector.Decorator(ordinalTranslatedKnnCollector) {
        @Override
        public KnnSearchStrategy getSearchStrategy() {
            return new RandomEntryPointsKnnSearchStrategy(
                cagraHNSW.getNumBaseLevelSearchEntryPoints(),
                cagraHNSW.getTotalNumberOfVectors(),
                knnCollector.getSearchStrategy()
            );
        }
    };
} else {
    // ... (non-CAGRA path omitted)

Fix

If the KnnCollector's search strategy is KnnSearchStrategy.Seeded, we must honor it and ensure that the search starts from the provided entry points.
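A minimal, self-contained sketch of the shape of this fix (the types below are simplified stand-ins, not the actual Lucene or k-NN plugin classes): before falling back to random entry points, check whether the current strategy is already seeded and, if so, return it unchanged.

```java
// Simplified stand-ins for the real strategy types; illustration only.
interface SearchStrategy {}

final class SeededStrategy implements SearchStrategy {
    final int[] seedDocIds;                       // entry points from the first phase
    SeededStrategy(int[] seedDocIds) { this.seedDocIds = seedDocIds; }
}

final class RandomEntryPointsStrategy implements SearchStrategy {
    final int numEntryPoints;
    final SearchStrategy delegate;
    RandomEntryPointsStrategy(int numEntryPoints, SearchStrategy delegate) {
        this.numEntryPoints = numEntryPoints;
        this.delegate = delegate;
    }
}

final class StrategySelector {
    // Honor an already-seeded strategy; otherwise fall back to random entry points.
    static SearchStrategy select(SearchStrategy current, int numEntryPoints) {
        if (current instanceof SeededStrategy) {
            return current;                        // start from the phase-one doc IDs
        }
        return new RandomEntryPointsStrategy(numEntryPoints, current);
    }
}
```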

Use DiversifyingNearestChildrenKnnCollectorManager

For a nested index, each parent document should be represented by its single best-matching child vector; multiple children of the same parent must not occupy separate result slots.

For example:

  • Child docs 1, 2, 3 belong to parent doc 4.
  • The best child among 1, 2, 3 should represent the parent.
  • Collecting both 1 and 2 is not allowed.

This behavior is implemented by DiversifyingNearestChildrenKnnCollectorManager.
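As a toy illustration of that rule (the doc IDs and scores are made up, and this is not the real collector): for each parent, keep only its highest-scoring child.

```java
import java.util.HashMap;
import java.util.Map;

public final class BestChildPerParent {
    // For each parent, keep only the highest-scoring child; mirrors the
    // diversifying rule described above (toy data, illustration only).
    static Map<Integer, Integer> collect(Map<Integer, Integer> parentOf,
                                         Map<Integer, Float> score) {
        Map<Integer, Integer> best = new HashMap<>();
        for (int child : score.keySet()) {
            int parent = parentOf.get(child);
            best.merge(parent, child,
                (cur, cand) -> score.get(cand) > score.get(cur) ? cand : cur);
        }
        return best;
    }
}
```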

However, we currently use TopKnnCollectorManager for nested indices.

final ReentrantKnnCollectorManager reentrantCollectorManager =
    new ReentrantKnnCollectorManager(
        new TopKnnCollectorManager(k, indexSearcher),
        segmentOrdToResults,
        knnQuery.getVectorDataType() == VectorDataType.FLOAT
            ? knnQuery.getQueryVector()
            : knnQuery.getByteQueryVector(),
        knnQuery.getField()
    );

Fix: Select Collector Manager Based on Index Type

We should create the collector manager depending on whether the index is nested.

final KnnCollectorManager knnCollectorManager =
    knnQuery.getParentsFilter() == null
        ? new TopKnnCollectorManager(k, indexSearcher)
        : new DiversifyingNearestChildrenKnnCollectorManager(
              k,
              knnQuery.getParentsFilter(),
              indexSearcher
          );

final ReentrantKnnCollectorManager reentrantCollectorManager =
    new ReentrantKnnCollectorManager(
        knnCollectorManager,
        segmentOrdToResults,
        knnQuery.getVectorDataType() == VectorDataType.FLOAT
            ? knnQuery.getQueryVector()
            : knnQuery.getByteQueryVector(),
        knnQuery.getField()
    );

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [O] New functionality includes testing.
  • [O] New functionality has been documented.
  • [O] API changes companion pull request created.
  • [O] Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
* iterations is less than 2, preserving O(1) amortized time.
* </p>
*/
public final class RobustUniqueRandomIterator {
Collaborator Author

@0ctopus13prime 0ctopus13prime Mar 11, 2026


LCG-based random number generation is almost four times faster than HashSet-based deduplication, and it requires no additional memory.

HashSet-based micro-benchmark (latency per call)

        Min: 541 nano
        50%: 1959 nano
        90%: 3458 nano
        95%: 3750 nano
        99%: 5291 nano
        Max: 43125 nano

LCG-based micro-benchmark (latency per call)

        Min: 125 nano
        50%: 333 nano
        90%: 458 nano
        95%: 1417 nano
        99%: 1625 nano
        Max: 12458 nano

Comment on lines +418 to +420
final KnnCollectorManager collectorManager = knnQuery.getParentsFilter() == null
? new TopKnnCollectorManager(k, indexSearcher)
: new DiversifyingNearestChildrenKnnCollectorManager(k, knnQuery.getParentsFilter(), indexSearcher);
Collaborator


[Clarification needed]
I am a little surprised that this is not just related to the remote index build; it can affect other search paths too, right?

Collaborator Author


Unfortunately, yes. This was affecting not only the Cagra case but all nested cases.

Collaborator


Thank you. Now it makes more sense to me. :)

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.48%. Comparing base (159e40d) to head (e1efe48).
⚠️ Report is 20 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3155      +/-   ##
============================================
+ Coverage     82.45%   82.48%   +0.02%     
- Complexity     3856     3867      +11     
============================================
  Files           417      418       +1     
  Lines         14423    14449      +26     
  Branches       1841     1847       +6     
============================================
+ Hits          11893    11918      +25     
- Misses         1771     1773       +2     
+ Partials        759      758       -1     


@Vikasht34 Vikasht34 merged commit 612b549 into opensearch-project:main Mar 12, 2026
47 of 48 checks passed
@0ctopus13prime 0ctopus13prime added the bug Something isn't working label Mar 23, 2026
@0ctopus13prime 0ctopus13prime changed the title MOS optimistic search bug fix on nested Cagra index. [Bug] MOS optimistic search bug fix on nested Cagra index. Mar 23, 2026

Labels

bug Something isn't working v3.6.0
