Skip to content

Rotated bounding box NMS implementation for CPU#9450

Open
zy1git wants to merge 12 commits intopytorch:mainfrom
zy1git:rotated-NMS
Open

Rotated bounding box NMS implementation for CPU#9450
zy1git wants to merge 12 commits intopytorch:mainfrom
zy1git:rotated-NMS

Conversation

@zy1git
Copy link
Contributor

@zy1git zy1git commented Mar 23, 2026

Summary:
Implemented rotated box NMS (Non-Maximum Suppression) for CPU, adapted from Detectron2's nms_rotated implementation. The NMS algorithm is identical to standard NMS — sort by scores, suppress overlapping boxes — but uses single_box_iou_rotated for IoU computation instead of axis-aligned intersection. The public API follows the existing nms op pattern in TorchVision.

Test Plan:

Added TestNMSRotated test class adapted from Detectron2's test suite:

  • 0° rotation test: rotated NMS with angle=0 should match reference horizontal NMS (IoU thresholds 0.2, 0.5, 0.8)

  • 90° rotation test: rotated NMS with angle=90 and swapped width/height should match reference horizontal NMS

  • 180° rotation test: rotated NMS with angle=180 should match reference horizontal NMS

  • TorchScript compatibility test

Results are compared using edit distance (≤ 1 allowed) to account for floating-point precision differences at IoU threshold boundaries.

Run pytest test/test_ops.py::TestNMSRotated -v
All tests pass locally.

@pytorch-bot
Copy link

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9450

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 54aee4c with merge base d7400a3 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zy1git zy1git marked this pull request as draft March 23, 2026 04:28
@meta-cla meta-cla bot added the cla signed label Mar 23, 2026

auto ovr = single_box_iou_rotated<scalar_t>(
dets[i].data_ptr<scalar_t>(), dets[j].data_ptr<scalar_t>());
if (ovr >= iou_threshold) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging that this is different from the iou threshold comparison we have in the non-rotated case:

if (ovr > iou_threshold) {
.

See my other comment about unifying the implementation, which should resolve this as a consequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is resolved in the new commit after unifying the implementation.

namespace {

template <typename scalar_t>
at::Tensor nms_rotated_cpu_kernel(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the same implementation we already have for the non-rotated case, the only difference being the iou computation:

at::Tensor nms_kernel_impl(

Could we consider fusing the two implementations, perhaps templating over the iou computation function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had the same thought when I did the implementation. I wanted to stick to Detectron2's implementation in the first version to make sure it works correctly, then refactor. I have fused the two implementations in the new commit.

return torch.ops.torchvision.nms(boxes, scores, iou_threshold)


def nms_rotated(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to expose nms_rotated instead of just handling all this within a single nms function?

For iou, we chose not to expose iou_rotated at the Python layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I fixed this in the new commit.



def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
def nms(boxes: Tensor, scores: Tensor, iou_threshold: float, fmt: str = "xyxy") -> Tensor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just infer the format base on the .shape attribute? If shape [-2] is 4 then it should be aligned, if 5 then it should be non-aligned?
I'm not sure, we'd have to verify that this actually doesn't break any other convension. Like, was shape[-2] == 5 even allowed before? If not, we're clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I think it should be shape[-1] rather than shape[-2]. I checked shape[-1] == 5 was not allowed before (the kernel enforces size(1) == 4). And the format is unique for dimension 4 and 5, so this works. I have implemented it in the new commit.

@zy1git zy1git marked this pull request as ready for review March 26, 2026 04:15


class TestNMSRotated:
def _reference_horizontal_nms(self, boxes, scores, iou_threshold):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TestNMS._reference_nms instead, you might have to make it a class method instead of an instance method

Comment on lines +2014 to +2015
box_scores (N, 5): boxes in corner-form and probabilities.
(Note here 5 == 4 + 1, i.e., 4-dim horizontal box + 1-dim prob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably be resolved automatically when we use TestNMS._reference_nms, but flagging that this box_scores parameter doesn't exist.

else:
raise ValueError(
f"boxes should have 4 (axis-aligned) or 5 (rotated) elements in the last dimension, got {boxes.size(-1)}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also make sure to address and test batched_nms just below

m, n = len(keep1), len(keep2)

# edit distance with DP
f = [np.arange(n + 1), np.arange(n + 1)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use numpy

@pytest.mark.parametrize("iou", (0.2, 0.5, 0.8))
def test_nms_rotated_0_degree(self, iou):
N = 1000
boxes, scores = self._create_tensors(N)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other tests, call torch.manual_seed(0) so that the test is not dependent on RNG.


keep_ref = self._reference_horizontal_nms(boxes, scores, iou)
keep = ops.nms(rotated_boxes, scores, iou)
assert self._nms_edit_distance(keep, keep_ref) <= 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally, I am able to use torch.assert_close(..., atol=0) on many random seeds, we should consider using this instead of edit distance.

return boxes, scores

@pytest.mark.parametrize("iou", (0.2, 0.5, 0.8))
def test_nms_rotated_0_degree(self, iou):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is comparing our rotated implementation against _reference_horizontal_nms. We should also have a test that uses our non-rotated nms implementation as the reference.

return torch.as_tensor(picked)

@staticmethod
def _nms_edit_distance(keep1, keep2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto inter = w * h;
auto ovr = inter / (iarea + areas[j] - inter);

auto ovr = iou_func.compare(j);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: call this compute rather than compare


scalar_t ix1, iy1, ix2, iy2, iarea;

AABBIoU(const at::Tensor& dets) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this NonRotatedIoU or AlignedIoU


RotatedIoU(const at::Tensor& dets) : dets_ptr(&dets) {}

int64_t cached_i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call this i

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants