Fix renderer stack overflow on Windows (#1139) #1139

Open
yutingye wants to merge 1 commit into main from export-D96515830
@yutingye
Contributor

@yutingye yutingye commented Mar 13, 2026

Summary:

Fix the SIMD rasterizer stack overflow on Windows by marking rasterizeOneTriangle as __declspec(noinline) on MSVC.

The root cause: MSVC aggressively inlines rasterizeOneTriangle (and its callees shade, interpolateRGBTextureMap, etc.) into rasterizeMeshImp, creating a single merged stack frame. The combined SIMD locals (drjit packet types like Matrix3fP, Matrix3dP, Vector3fP) occupy roughly 4 KB or more, which overflows the default 1 MB Windows thread stack when combined with the deep call chain from Python through pybind11 into the rasterizer.
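To make the frame-size arithmetic concrete, here is a rough sketch of how a handful of packet-typed locals can add up to several kilobytes once they share one frame. The structs below are hypothetical stand-ins, not drjit's actual layouts, and the 8-lane width and the counts of live locals are assumptions chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-ins for SIMD packet types. These are NOT drjit's real
// definitions; an 8-lane packet width is assumed purely for illustration.
constexpr std::size_t kLanes = 8;

struct FloatP  { alignas(32) float  v[kLanes]; };  // 8 * 4 bytes = 32 bytes
struct DoubleP { alignas(32) double v[kLanes]; };  // 8 * 8 bytes = 64 bytes

struct Vector3fP { FloatP x, y, z; };  // 3 packets
struct Matrix3fP { FloatP m[9]; };     // 9 packets
struct Matrix3dP { DoubleP m[9]; };    // 9 packets

static_assert(sizeof(Vector3fP) == 96,  "3 float packets");
static_assert(sizeof(Matrix3fP) == 288, "9 float packets");
static_assert(sizeof(Matrix3dP) == 576, "9 double packets");

// Bytes needed if this many locals of each type are live in one merged frame.
// The counts passed in are illustrative, not measured from the rasterizer.
constexpr std::size_t mergedFrameBytes(std::size_t nMat3f,
                                       std::size_t nMat3d,
                                       std::size_t nVec3f) {
  return nMat3f * sizeof(Matrix3fP) + nMat3d * sizeof(Matrix3dP) +
         nVec3f * sizeof(Vector3fP);
}

// Four matrices of each kind plus eight vectors already exceed 4 KB.
static_assert(mergedFrameBytes(4, 4, 8) > 4096, "merged frame exceeds 4 KB");
```

Under these assumptions, `mergedFrameBytes(4, 4, 8)` comes to 4224 bytes. When each callee keeps its own frame, those locals are only on the stack while that callee runs; after inlining they all belong to the caller's frame.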

The fix uses #ifdef _MSC_VER / __declspec(noinline) on rasterizeOneTriangle to prevent this frame merging on Windows. On other platforms, the inline keyword is removed (it was only a hint; the compiler's cost model already decides whether to inline this large function), so there is no performance regression.
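For readers unfamiliar with the pattern, a minimal sketch of the guard might look like the following. The macro name `SKETCH_NOINLINE` and both function names are hypothetical and the bodies are placeholders; the PR itself only states that `#ifdef _MSC_VER` and `__declspec(noinline)` are applied to rasterizeOneTriangle.

```cpp
#include <cstddef>

// Hypothetical helper macro: expands to MSVC's noinline attribute on MSVC and
// to nothing elsewhere, so other compilers keep using their own cost model.
#ifdef _MSC_VER
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

// Placeholder for the per-triangle work (not the real rasterizeOneTriangle).
// Because MSVC may not inline it, its locals stay in its own stack frame and
// are released when it returns.
SKETCH_NOINLINE float shadeOneTriangleSketch(const float w[3], const float z[3]) {
  // Barycentric interpolation of three per-vertex values.
  return w[0] * z[0] + w[1] * z[1] + w[2] * z[2];
}

// Placeholder for the outer loop (not the real rasterizeMeshImp).
float rasterizeMeshSketch(const float (*weights)[3],
                          const float (*depths)[3],
                          std::size_t triangleCount) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < triangleCount; ++i) {
    sum += shadeOneTriangleSketch(weights[i], depths[i]);
  }
  return sum;
}
```

Applying the attribute to the per-triangle function keeps its locals, and those of anything inlined into it, out of the caller's frame on MSVC.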

Changes:

  • rasterizer.cpp: Added __declspec(noinline) for MSVC on rasterizeOneTriangle to prevent stack frame merging that causes the overflow.
  • test_renderer.py: Removed the if sys.platform == "win32": return workarounds from all three texture coordinate test methods, and removed the now-unused import sys.
  • pixi.toml: Removed the test_lighting skip for the Windows CI entry, since the root cause is now fixed.

Differential Revision: D96515830

@meta-cla meta-cla Bot added the CLA Signed label Mar 13, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Mar 13, 2026

@yutingye has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96515830.

@meta-codesync meta-codesync Bot changed the title from "Fix renderer stack overflow on Windows" to "Fix renderer stack overflow on Windows (#1139)" Mar 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 14, 2026
@meta-codesync meta-codesync Bot force-pushed the export-D96515830 branch 2 times, most recently from d6fe682 to ffd8e2d Compare March 14, 2026 08:40
meta-codesync Bot pushed a commit that referenced this pull request Mar 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 14, 2026
yutingye added a commit that referenced this pull request Mar 14, 2026
Summary:
Pull Request resolved: #1139

meta-codesync Bot pushed a commit that referenced this pull request Mar 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 15, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 15, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 15, 2026
meta-codesync Bot pushed a commit that referenced this pull request Mar 15, 2026
Labels

CLA Signed, fb-exported, meta-exported
