Fix renderer stack overflow on Windows (#1139)
Open
Contributor
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 14, 2026
Summary: Fix the SIMD rasterizer stack overflow on Windows by marking `rasterizeOneTriangle` as `__declspec(noinline)` on MSVC.

The root cause: MSVC aggressively inlines `rasterizeOneTriangle` (and its callees `shade`, `interpolateRGBTextureMap`, etc.) into `rasterizeMeshImp`, creating a single merged stack frame. The combined SIMD locals (drjit packet types like `Matrix3fP`, `Matrix3dP`, `Vector3fP`) occupy ~4+ KB, which overflows the default 1 MB Windows thread stack when combined with the deep call chain from Python through pybind11 into the rasterizer.

The fix uses `#ifdef _MSC_VER` / `__declspec(noinline)` on `rasterizeOneTriangle` to prevent this frame merging on Windows. On other platforms, the `inline` keyword is removed (it was only a hint; the compiler's cost model already decides whether to inline this large function), so there is no performance regression.

Changes:
- `rasterizer.cpp`: Added `__declspec(noinline)` for MSVC on `rasterizeOneTriangle` to prevent the stack frame merging that causes the overflow.
- `test_renderer.py`: Removed the `if sys.platform == "win32": return` workarounds from all three texture coordinate test methods, and removed the now-unused `import sys`.
- `pixi.toml`: Removed the `test_lighting` skip for the Windows CI entry, since the root cause is now fixed.

Differential Revision: D96515830
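The pattern described in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `rasterizer.cpp`: the macro `SKETCH_NOINLINE`, the `Triangle` alias, and the functions `sketchRasterizeOneTriangle` and `sketchRasterizeMesh` are hypothetical names, and the 4 KB scratch array merely stands in for the drjit packet locals.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical macro for this sketch only; the PR's actual spelling may differ.
// On MSVC it forbids inlining; elsewhere it expands to nothing, leaving the
// decision to the compiler's cost model.
#ifdef _MSC_VER
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

using Triangle = std::array<float, 9>;  // three xyz vertices (assumed layout)

// Stand-in for the per-triangle routine. Keeping it out of line means its
// large locals live in its own frame, which is released on return, instead
// of being merged into the caller's frame.
SKETCH_NOINLINE float sketchRasterizeOneTriangle(const Triangle& tri) {
  std::array<float, 1024> scratch{};  // 4 KB, stands in for SIMD packet locals
  for (std::size_t i = 0; i < scratch.size(); ++i) {
    scratch[i] = tri[i % tri.size()] * 0.5f;
  }
  float sum = 0.0f;
  for (float v : scratch) {
    sum += v;
  }
  return sum;
}

// Stand-in for the outer loop that calls the per-triangle routine.
float sketchRasterizeMesh(const std::vector<Triangle>& tris) {
  float total = 0.0f;
  for (const Triangle& tri : tris) {
    total += sketchRasterizeOneTriangle(tri);
  }
  return total;
}
```

Guarding the attribute with `_MSC_VER` keeps the change scoped to the compiler that showed the problem, which matches the summary's point that other platforms only lose the `inline` hint.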
b4362a1 to
2173986
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 14, 2026
d6fe682 to
ffd8e2d
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 14, 2026
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 14, 2026
ffd8e2d to
55dc286
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 14, 2026
55dc286 to
20e0a89
Compare
yutingye
added a commit
that referenced
this pull request
Mar 14, 2026
Summary: Pull Request resolved: #1139
20e0a89 to
ff6be05
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 14, 2026
ff6be05 to
8eccbf6
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 15, 2026
8eccbf6 to
330a426
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 15, 2026
330a426 to
5b1e92c
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 15, 2026
5b1e92c to
c111931
Compare
meta-codesync Bot
pushed a commit
that referenced
this pull request
Mar 15, 2026
c111931 to
dfe0ec3
Compare
dfe0ec3 to
3601f76
Compare