Implement CUDA multipass for knn > GPU_MAX_SELECTION_K #7381
ssheorey merged 5 commits into isl-org:main from
Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.
Pull request overview
This PR implements a multi-pass algorithm for CUDA KNN search to handle k values larger than GPU_MAX_SELECTION_K (1024 or 2048 depending on CUDA version). Previously, KNN searches with k > GPU_MAX_SELECTION_K would silently fail and produce incorrect results. The implementation splits large KNN searches into batches, using a bitmask to track already-selected neighbors across passes.
Key changes:
- Added multi-pass algorithm with masking to handle k > GPU_MAX_SELECTION_K
- Split the optimized KNN search into separate single-pass and multi-pass functions
- Fixed memory stride handling for non-contiguous tensor views in L2Select kernel
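The masking idea in the first bullet can be illustrated with a minimal CPU sketch in plain Python. This is not the CUDA code from this PR: `MAX_SELECTION_K`, `sq_dist`, and `knn_multipass` are hypothetical names, and a Python list of booleans stands in for the bitmask. Each pass selects at most `max_k` of the nearest points that are not yet masked, then masks them so that later passes skip them.

```python
import heapq

# Hypothetical stand-in for GPU_MAX_SELECTION_K (1024 or 2048 in Open3D).
MAX_SELECTION_K = 4


def sq_dist(a, b):
    """Squared L2 distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_multipass(points, query, k, max_k=MAX_SELECTION_K):
    """Return indices of the k nearest points, selecting at most max_k per pass."""
    k = min(k, len(points))
    dists = [sq_dist(p, query) for p in points]
    selected = [False] * len(points)  # mask of neighbors found in earlier passes
    result = []
    while len(result) < k:
        batch = min(max_k, k - len(result))
        # Only points that are not masked yet take part in this pass.
        candidates = ((d, i) for i, d in enumerate(dists) if not selected[i])
        for _, i in heapq.nsmallest(batch, candidates):
            selected[i] = True
            result.append(i)
    return result


points = [(float(i),) for i in range(10)]
print(knn_multipass(points, (0.0,), 7))
```

Because every pass takes the nearest remaining points, the concatenated result is already ordered by distance, and the answer does not depend on the per-pass limit.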
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| cpp/open3d/core/nns/KnnSearchOps.cu | Implements multi-pass KNN algorithm with masking kernels, separates single-pass and multi-pass logic, and fixes early return initialization |
| cpp/open3d/core/nns/kernel/L2Select.cuh | Adds stride parameters to handle non-contiguous tensor views correctly in distance calculations |
| CHANGELOG.md | Documents the new multi-pass KNN feature |
@ssheorey It seems that the test failures are unrelated to this PR.
@nicolaloi thanks for finding and fixing this issue. The PR looks good to me. Can you add one representative test case (that exercises the multipass code) from cuda_knn_test.py to the nn test suite here:
Ok, I'll add it in the next few days.
@ssheorey the test I have added fails with the main branch but passes with this PR branch. |
Type
knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array #7301
Motivation and Context
The KNN search on GPU breaks silently when the k value is larger than the macro GPU_MAX_SELECTION_K, producing garbage output (all 0s, indices larger than the total number of points, or even negative indices). The macro GPU_MAX_SELECTION_K is 2048 if CUDA_VERSION > 9000, otherwise it is 1024. On CPU, the KNN search has no such limit. To support larger k on GPU without altering the macro GPU_MAX_SELECTION_K, a multipass algorithm should be implemented, splitting the KNN search into batches where each batch size is at most GPU_MAX_SELECTION_K.
Checklist:
python util/check_style.py --apply to apply Open3D code style to my code.
updated accordingly.
results (e.g. screenshots or numbers) here.
Description
I have implemented a multipass algorithm to find large KNN on CUDA, splitting the search into multiple batches not larger than GPU_MAX_SELECTION_K. The main challenge is to mask indices that have already been found in earlier passes/iterations, taking care of tiling and contiguity.
To improve readability, I have separated the function into two distinct functions, depending on whether or not the multipass algorithm should be used:
Open3D/cpp/open3d/core/nns/KnnSearchOps.cu
Lines 535 to 543 in c0a4fcb
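As a rough illustration of the split described above (not the code at the referenced lines), the choice between the two paths and the batch sizes can be sketched in Python. `choose_path` and `plan_knn_passes` are hypothetical helpers. The sketch assumes that every pass except possibly the last uses a full batch of `max_k`, which the PR text does not state.

```python
def choose_path(k, max_k):
    """Pick the single-pass path when k fits within the selection limit."""
    return "single_pass" if k <= max_k else "multi_pass"


def plan_knn_passes(k, max_k):
    """Return the batch size of each pass; every batch is at most max_k."""
    if k <= 0:
        return []
    full, rest = divmod(k, max_k)
    return [max_k] * full + ([rest] if rest else [])


print(choose_path(2048, 2048))       # k within the limit
print(plan_knn_passes(5000, 2048))   # k above the limit
```

With a limit of 2048, a request for k = 5000 is planned as three passes that together cover exactly 5000 neighbors.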
I have created a script with 120 test cases to test the change with different cases (small/large clouds up to 2 million points, multiple queries, small/very large knn up to 50000). This PR passes all the tests, while the original master branch code does not: cuda_knn_test.py
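The failure modes listed under Motivation and Context (all 0s, indices beyond the number of points, negative indices) can be caught with a simple sanity check on the returned indices. The sketch below is not taken from cuda_knn_test.py. `check_knn_indices` is a hypothetical helper that works on a plain Python list of indices for one query.

```python
def check_knn_indices(indices, num_points, k):
    """Return a list of problems found in one query's KNN index list."""
    problems = []
    expected = min(k, num_points)
    if len(indices) != expected:
        problems.append(f"expected {expected} indices, got {len(indices)}")
    if any(i < 0 or i >= num_points for i in indices):
        problems.append("index out of range")
    if len(set(indices)) != len(indices):
        problems.append("duplicate indices")
    return problems


print(check_knn_indices([0, 0, 0], num_points=10, k=3))
print(check_knn_indices([2, 1, 0], num_points=10, k=3))
```

A check like this only confirms that the output is well-formed. Verifying correctness still requires comparing against a reference such as the CPU search.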