Fix bus error or segfault from roi_align with large batchsize by zy1git · Pull Request #9441 · pytorch/vision

zy1git · 2026-03-13T09:51:18Z

Summary
Bug: roi_align in torchvision crashes with a bus error or segfault on CPU, or silently returns wrong (all-zero) results on CUDA, when the total number of output elements exceeds INT_MAX (~2.1 billion). The cause is 32-bit int overflow in the index arithmetic of the C++ and CUDA kernels.

Root Cause: The kernels use int for composite index calculations like n × channels × pooled_width × pooled_height and pointer offsets like (roi_batch_ind × channels + c) × height × width. When these products exceed INT_MAX (2,147,483,647), the int overflows. Signed overflow is undefined behavior in C++; in practice the value wraps to a negative number, causing out-of-bounds memory access.
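To make the wraparound concrete, here is a minimal Python sketch that simulates what a wrapping 32-bit signed int would hold for the flat output index. The helper names `wrap_int32` and `output_index` are hypothetical and exist only for this illustration; they are not taken from the kernels, and the two's-complement wrap is an assumption about typical hardware behavior rather than something C++ guarantees.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value a two's-complement 32-bit signed int would hold.

    Assumption: overflow wraps modulo 2**32. C++ leaves signed overflow
    undefined, so this only models the commonly observed behavior.
    """
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def output_index(n, c, ph, pw, channels, pooled_height, pooled_width):
    """Flat index of output element (n, c, ph, pw), computed exactly.

    Python ints are arbitrary precision, so this plays the role of the
    int64_t computation.
    """
    return ((n * channels + c) * pooled_height + ph) * pooled_width + pw


# First element of ROI 171,999 with 256 channels and a 7 x 7 pooled output.
exact = output_index(171_999, 0, 0, 0,
                     channels=256, pooled_height=7, pooled_width=7)
wrapped = wrap_int32(exact)

print(exact)    # 2157555456, larger than INT32_MAX
print(wrapped)  # -2137411840, a negative offset into the output buffer
```

A negative flat index used as a pointer offset lands outside the output tensor, which is consistent with the out-of-bounds access described above.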

Example: FasterRCNN with batch_size=172 generates ~172,000 ROIs. The output index reaches 171,999 × 256 × 7 × 7 = 2,157,555,456 > INT_MAX, which matches the reporter's observed threshold exactly.
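The threshold arithmetic can be checked with a short sketch. It assumes 1,000 ROIs per image (inferred from "batch_size=172 generates ~172,000 ROIs"), and the names `ROIS_PER_IMAGE` and `total_output_elements` are hypothetical, introduced only for this illustration.

```python
INT_MAX = 2**31 - 1  # 2,147,483,647

# Values from the example: 256 channels, 7 x 7 pooled output.
CHANNELS, POOLED_H, POOLED_W = 256, 7, 7
# Assumption: 1,000 ROIs per image, i.e. ~172,000 ROIs for batch_size=172.
ROIS_PER_IMAGE = 1000

elements_per_roi = CHANNELS * POOLED_H * POOLED_W


def total_output_elements(batch_size: int) -> int:
    """Number of elements in the roi_align output for a given batch size."""
    return batch_size * ROIS_PER_IMAGE * elements_per_roi


# Smallest batch size whose output element count no longer fits in an int.
first_bad_batch = next(
    b for b in range(1, 1000) if total_output_elements(b) > INT_MAX
)

print(elements_per_roi)            # 12544
print(total_output_elements(171))  # 2145024000, still fits
print(total_output_elements(172))  # 2157568000, exceeds INT_MAX
print(first_bad_batch)             # 172
```

Under these assumptions, batch_size=171 stays just below INT_MAX and batch_size=172 is the first to cross it.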

Fix: Promoted int to int64_t for all index, offset, and stride variables in the relevant files.

Test Plan
New overflow regression test
pytest test/test_ops.py::TestRoIAlign::test_roi_align_large_index -v

Existing tests — verify no regressions
pytest test/test_ops.py::TestRoIAlign -v

Fixes #8206

pytorch-bot · 2026-03-13T09:51:23Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9441

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d9ab5ce with merge base 6f131f1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Zhitao Yu added 2 commits March 13, 2026 02:38

fix the issue 8206 and add the test

8c71ea8

fix the issue 8206 and add the test

40b2276

meta-cla bot added the cla signed label Mar 13, 2026

remove unnecessary comments

d9ab5ce

zy1git marked this pull request as draft March 13, 2026 10:03

zy1git wants to merge 3 commits into pytorch:main from zy1git:issue-8206
