Implement CUDA multipass for knn > GPU_MAX_SELECTION_K by nicolaloi · Pull Request #7381 · isl-org/Open3D

nicolaloi · 2025-12-08T12:03:44Z

Type

Bug fix (non-breaking change which fixes an issue): Fixes knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array #7301
New feature (non-breaking change which adds functionality). Resolves #
Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

The KNN search on GPU fails silently when k is larger than the macro GPU_MAX_SELECTION_K, producing garbage output (all 0s, indices larger than the total number of points, or even negative indices). The macro GPU_MAX_SELECTION_K is 2048 if CUDA_VERSION > 9000, otherwise it is 1024. The CPU KNN search has no such limit. To support larger k on GPU without altering GPU_MAX_SELECTION_K, a multipass algorithm should be implemented, splitting the KNN search into batches where each batch size is at most GPU_MAX_SELECTION_K.

Checklist:

- I have run `python util/check_style.py --apply` to apply Open3D code style to my code.
- This PR changes Open3D behavior or adds new functionality.
  - Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is updated accordingly.
  - I have added or updated C++ and / or Python unit tests OR included test results (e.g. screenshots or numbers) here.
- I will follow up and update the code if CI fails.
- For fork PRs, I have selected Allow edits from maintainers.

Description

I have implemented a multipass algorithm to find large KNN on CUDA, splitting the search into multiple batches, each no larger than GPU_MAX_SELECTION_K. The main challenge is masking indices that were already found in earlier passes, while taking care of tiling and memory contiguity.
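The multipass idea can be sketched on the CPU in a few lines. This is only an illustration, not the CUDA implementation in this PR: `MAX_SELECTION_K`, `select_k_smallest`, and `knn_multipass` are hypothetical names, the limit is set to 4 so that several passes are needed, and tiling is ignored entirely.

```python
import heapq

# Hypothetical stand-in for GPU_MAX_SELECTION_K, kept tiny so that
# the multipass path is exercised on small inputs.
MAX_SELECTION_K = 4


def select_k_smallest(dists, mask, k):
    """One pass: up to k smallest unmasked (distance, index) pairs, ascending."""
    candidates = ((d, i) for i, d in enumerate(dists) if not mask[i])
    return heapq.nsmallest(k, candidates)


def knn_multipass(dists, knn, max_k=MAX_SELECTION_K):
    """Collect the knn smallest distances in passes of at most max_k each."""
    knn = min(knn, len(dists))
    mask = [False] * len(dists)  # True = already selected in an earlier pass
    result = []
    while len(result) < knn:
        batch = select_k_smallest(dists, mask, min(max_k, knn - len(result)))
        for _, idx in batch:
            mask[idx] = True  # hide this neighbor from later passes
        result.extend(batch)
    indices = [i for _, i in result]
    distances = [d for d, _ in result]
    return indices, distances


dists = [9.0, 1.0, 7.0, 3.0, 5.0, 2.0, 8.0, 4.0, 6.0, 0.0]
indices, distances = knn_multipass(dists, knn=7)
```

Because each pass returns the smallest remaining distances, concatenating the batches yields a globally sorted result without a final merge step.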

To improve readability, I have separated the function into two distinct functions, depending on whether or not the multipass algorithm should be used:

Open3D/cpp/open3d/core/nns/KnnSearchOps.cu

Lines 535 to 543 in c0a4fcb

    if (knn <= GPU_MAX_SELECTION_K) {
        KnnSearchCUDASinglePass<T, TIndex>(points, queries, knn, tile_rows,
                                           tile_cols, output_allocator,
                                           point_norms, query_norms);
    } else {
        KnnSearchCUDAMultiPass<T, TIndex>(points, queries, knn, tile_rows,
                                          tile_cols, output_allocator,
                                          point_norms, query_norms);
    }

I have created a script with 120 test cases to test the change with different cases (small/large clouds up to 2 million points, multiple queries, small/very large knn up to 50000). This PR passes all the tests, while the original master branch code does not: cuda_knn_test.py
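The failure modes described above (all zeros, out-of-range or negative indices) can be caught with simple structural checks on the output. The sketch below is not taken from cuda_knn_test.py; `check_knn_result` is a hypothetical helper, and it assumes that one query returns `min(knn, num_points)` neighbors sorted by ascending distance.

```python
def check_knn_result(indices, distances, num_points, knn):
    """Return a list of problems found in one query's KNN output (empty = OK)."""
    problems = []
    expected = min(knn, num_points)  # assumption: k is clamped to the cloud size
    if len(indices) != expected or len(distances) != expected:
        problems.append("wrong neighbor count")
    if any(i < 0 or i >= num_points for i in indices):
        problems.append("index out of range")
    if len(set(indices)) != len(indices):
        problems.append("duplicate index")
    # Assumption: neighbors are reported in ascending order of distance.
    if any(a > b for a, b in zip(distances, distances[1:])):
        problems.append("distances not sorted")
    return problems


ok = check_knn_result([2, 0, 1], [0.1, 0.5, 0.9], num_points=5, knn=3)
all_zero = check_knn_result([0, 0, 0], [0.0, 0.0, 0.0], num_points=5, knn=3)
```

An all-zero index array is flagged as duplicates, which is one of the symptoms reported in #7301.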

update-docs · 2025-12-08T12:03:48Z

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

Copilot

Pull request overview

This PR implements a multi-pass algorithm for CUDA KNN search to handle k values larger than GPU_MAX_SELECTION_K (1024 or 2048 depending on CUDA version). Previously, KNN searches with k > GPU_MAX_SELECTION_K would silently fail and produce incorrect results. The implementation splits large KNN searches into batches, using a bitmask to track already-selected neighbors across passes.

Key changes:

- Added multi-pass algorithm with masking to handle k > GPU_MAX_SELECTION_K
- Split the optimized KNN search into separate single-pass and multi-pass functions
- Fixed memory stride handling for non-contiguous tensor views in L2Select kernel
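Why strides matter for a non-contiguous view can be shown with a generic example. It is not the L2Select kernel code, whose parameters are not shown on this page: `element_offset` and the small matrix are hypothetical. A column tile of a row-major matrix keeps the row stride of the full matrix, so indexing it as if it were contiguous reads the wrong elements.

```python
def element_offset(row, col, row_stride, col_stride=1, base=0):
    """Offset into the flat buffer for element (row, col) of a strided 2D view."""
    return base + row * row_stride + col * col_stride


# 3 x 4 row-major matrix stored as a flat buffer.
num_rows, num_cols = 3, 4
buffer = [10, 11, 12, 13,
          20, 21, 22, 23,
          30, 31, 32, 33]

# View of columns 1..2 (a "tile"): shape 3 x 2, but consecutive rows
# are still num_cols elements apart in memory.
tile_cols, first_col = 2, 1

correct = [buffer[element_offset(r, c, row_stride=num_cols, base=first_col)]
           for r in range(num_rows) for c in range(tile_cols)]

# Treating the view as contiguous (row stride == tile width) is the bug.
wrong = [buffer[element_offset(r, c, row_stride=tile_cols, base=first_col)]
         for r in range(num_rows) for c in range(tile_cols)]
```

Passing the stride explicitly, rather than deriving it from the view's width, keeps the indexing correct for both contiguous tensors and views.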

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cpp/open3d/core/nns/KnnSearchOps.cu | Implements multi-pass KNN algorithm with masking kernels, separates single-pass and multi-pass logic, and fixes early return initialization |
| cpp/open3d/core/nns/kernel/L2Select.cuh | Adds stride parameters to handle non-contiguous tensor views correctly in distance calculations |
| CHANGELOG.md | Documents the new multi-pass KNN feature |

nicolaloi · 2026-01-17T17:01:59Z

@ssheorey It seems that the test failures are unrelated to this PR.

ssheorey · 2026-02-23T22:01:17Z

@nicolaloi thanks for finding and fixing this issue. The PR looks good to me. Can you add one representative test case (that exercises the multipass code) from cuda_knn_test.py to the nn test suite here:

python/test/core/test_nn.py

nicolaloi · 2026-02-24T22:25:59Z

Ok, I'll add it in the next few days.

nicolaloi · 2026-03-05T23:11:18Z

@ssheorey the test I have added fails with the main branch but passes with this PR branch.

cuda multipass for knn >= GPU_MAX_SELECTION_K

c0a4fcb

nicolaloi changed the title ~~CUDA multipass for knn >= GPU_MAX_SELECTION_K~~ CUDA multipass for knn > GPU_MAX_SELECTION_K Dec 8, 2025

nicolaloi changed the title ~~CUDA multipass for knn > GPU_MAX_SELECTION_K~~ Implement CUDA multipass for knn > GPU_MAX_SELECTION_K Dec 8, 2025

update CHANGELOG.md

0836ead

nicolaloi mentioned this pull request Dec 8, 2025

knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array #7301

Closed

3 tasks

OuYaozhong mentioned this pull request Dec 24, 2025

Non-deterministic for-loop during building from source code #7390

Open

3 tasks

ssheorey requested review from Copilot and ssheorey January 7, 2026 19:23

Copilot AI reviewed Jan 7, 2026

nicolaloi added 2 commits January 10, 2026 17:13

Merge branch 'main' into nicolaloi/cuda-multipass-knn

1079b1f

review + fix for window test

c309bb8

ssheorey added the status / to merge Looks good, merge after minor updates. label Feb 24, 2026

ssheorey added the status / needs info Waiting for information from reporter / author label Feb 25, 2026

ssheorey added this to the v0.20 milestone Mar 4, 2026

nicolaloi force-pushed the nicolaloi/cuda-multipass-knn branch from c91dd75 to 87afdf8 March 5, 2026 23:12

update pytest

3ac92b1

nicolaloi force-pushed the nicolaloi/cuda-multipass-knn branch from 87afdf8 to 3ac92b1 March 5, 2026 23:15

ssheorey approved these changes Mar 6, 2026

ssheorey removed the status / needs info Waiting for information from reporter / author label Mar 6, 2026

ssheorey merged commit 7dd7bc1 into isl-org:main Mar 10, 2026
27 of 28 checks passed

ssheorey merged 5 commits into isl-org:main from nicolaloi:nicolaloi/cuda-multipass-knn

3 participants
