Validate default_ptx_arch against NVRTC supported architectures#1276
Validate default_ptx_arch against NVRTC supported architectures#1276cluster2600 wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a module-level helper to select an NVRTC-supported PTX architecture and applies it to validate/compute PTX arch in both CUDA-present and no-device NVRTC code paths; adds unit tests covering match, clamp-up, fallback, single-element, and empty-set error cases. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@warp/_src/context.py`:
- Around line 5333-5341: The current clamp only updates runtime.default_ptx_arch
and can let an unsupported per-device PTX target slip through; update the final
PTX-target selection in Device.get_cuda_compile_arch() (which computes
min(self.arch, warp.config.ptx_target_arch or runtime.default_ptx_arch)) to
revalidate the chosen target against self.nvrtc_supported_archs using
_resolve_supported_ptx_arch (or equivalent) and ensure the returned target is
NVRTC-supported for that device; if no supported PTX exists for the device,
raise a clear, descriptive error indicating the device arch and the available
nvrtc_supported_archs so compilation will fail fast.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 09d275e4-b3d2-4717-a9d8-03d4da24e2db
📒 Files selected for processing (2)
warp/_src/context.pywarp/tests/test_context.py
Greptile SummaryThis PR introduces
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Runtime.__init__] --> B{CUDA devices present?}
B -- Yes --> C[default_ptx_arch = 75]
C --> D[min arch of eligible devices\nin nvrtc_supported_archs]
D -- success --> E[default_ptx_arch = min result]
D -- ValueError --> F[retain default_ptx_arch = 75]
E --> G{default_ptx_arch\nin nvrtc_supported_archs?}
F --> G
G -- No --> H[_resolve_supported_ptx_arch\ndefault_ptx_arch, nvrtc_supported_archs]
G -- Yes --> I[default_ptx_arch finalised]
H --> I
B -- No --> J{nvrtc_supported_archs\nnon-empty?}
J -- Yes --> K[target = ptx_target_arch or 75]
K --> L[_resolve_supported_ptx_arch\ntarget, nvrtc_supported_archs]
L --> I
J -- No --> M[default_ptx_arch = None]
I --> N[Device.get_cuda_output_arch]
N --> O[output_arch = min device.arch, default_ptx_arch]
O --> P{output_arch\nin nvrtc_supported_archs?}
P -- No --> Q[_resolve_supported_ptx_arch\noutput_arch, nvrtc_supported_archs]
P -- Yes --> R[use output_arch for compilation]
Q --> R
subgraph _resolve_supported_ptx_arch
S{supported_archs empty?} -- Yes --> T[raise ValueError]
S -- No --> U{target_arch\nin supported_archs?}
U -- Yes --> V[return target_arch]
U -- No --> W[above = sorted archs >= target]
W --> X{above non-empty?}
X -- Yes --> Y[return above_0 lowest >= target]
X -- No --> Z[return max supported_archs]
end
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
warp/_src/context.py (1)
3865-3867:⚠️ Potential issue | 🟠 MajorReject PTX targets that exceed the device architecture.
This still allows
output_archto clamp upward pastself.arch. For example, ansm_75device withnvrtc_supported_archs={80, 86}now resolves to80, which is not runnable on that device. This should fail fast instead of returning an invalid PTX target.Suggested fix
- if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs: - output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs) + if runtime.nvrtc_supported_archs: + runnable_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch} + if not runnable_archs: + raise RuntimeError( + f"No NVRTC-supported PTX target can run on device {self.alias} " + f"(sm_{self.arch}); supported PTX targets are {sorted(runtime.nvrtc_supported_archs)}." + ) + if output_arch not in runnable_archs: + output_arch = _resolve_supported_ptx_arch(output_arch, runnable_archs)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/context.py` around lines 3865 - 3867, The current logic may clamp output_arch upward to a PTX target not supported by the physical device (e.g., device with self.arch == 75 gets resolved to 80 via _resolve_supported_ptx_arch); instead, after resolving a supported NVRTC arch via _resolve_supported_ptx_arch (when runtime.nvrtc_supported_archs is set), verify that the resolved arch does not exceed the device architecture (compare numeric SM values from output_arch/self.arch) and if it does, raise an explicit error (e.g., ValueError or RuntimeError) rather than returning an invalid PTX target; update the branch that references runtime.nvrtc_supported_archs, output_arch, and _resolve_supported_ptx_arch to perform this check and raise on mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@warp/_src/context.py`:
- Around line 3946-3953: Update the docstring to use the repo's RST-style
double-backtick formatting for code and parameter references: replace italicized
occurrences of target_arch and supported_archs with ``target_arch`` and
``supported_archs`` (also change any other italicized terms like *NVRTC* or
inline code to ``NVRTC`` or appropriate ``...``), keeping the existing text and
semantics (the description about returning the closest supported arch and
raising ValueError remains unchanged).
---
Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3865-3867: The current logic may clamp output_arch upward to a PTX
target not supported by the physical device (e.g., device with self.arch == 75
gets resolved to 80 via _resolve_supported_ptx_arch); instead, after resolving a
supported NVRTC arch via _resolve_supported_ptx_arch (when
runtime.nvrtc_supported_archs is set), verify that the resolved arch does not
exceed the device architecture (compare numeric SM values from
output_arch/self.arch) and if it does, raise an explicit error (e.g., ValueError
or RuntimeError) rather than returning an invalid PTX target; update the branch
that references runtime.nvrtc_supported_archs, output_arch, and
_resolve_supported_ptx_arch to perform this check and raise on mismatch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: ac0ce7b2-a16c-41bd-9052-5ef39e32319c
📒 Files selected for processing (2)
warp/_src/context.pywarp/tests/test_context.py
| """Return *target_arch* if NVRTC supports it, otherwise the closest supported arch. | ||
|
|
||
| Preference is given to the lowest supported architecture that is >= | ||
| *target_arch* so that the resulting PTX is as broadly forward-compatible | ||
| as possible. If no such architecture exists the highest supported value | ||
| is returned instead. | ||
|
|
||
| *supported_archs* must be non-empty; a ``ValueError`` is raised otherwise. |
There was a problem hiding this comment.
Use repo docstring formatting for parameter references.
The new helper docstring uses italics for target_arch and supported_archs; this repo’s docstring convention requires double backticks for code elements and parameter cross-references.
Suggested fix
- """Return *target_arch* if NVRTC supports it, otherwise the closest supported arch.
+ """Return ``target_arch`` if NVRTC supports it, otherwise the closest supported arch.
Preference is given to the lowest supported architecture that is >=
- *target_arch* so that the resulting PTX is as broadly forward-compatible
+ ``target_arch`` so that the resulting PTX is as broadly forward-compatible
as possible. If no such architecture exists the highest supported value
is returned instead.
- *supported_archs* must be non-empty; a ``ValueError`` is raised otherwise.
+ ``supported_archs`` must be non-empty; a ``ValueError`` is raised otherwise.
"""As per coding guidelines, "Use double backticks for code elements in docstrings (RST syntax)" and "Use double backticks for parameter cross-references in docstrings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@warp/_src/context.py` around lines 3946 - 3953, Update the docstring to use
the repo's RST-style double-backtick formatting for code and parameter
references: replace italicized occurrences of target_arch and supported_archs
with ``target_arch`` and ``supported_archs`` (also change any other italicized
terms like *NVRTC* or inline code to ``NVRTC`` or appropriate ``...``), keeping
the existing text and semantics (the description about returning the closest
supported arch and raising ValueError remains unchanged).
| if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs: | ||
| output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs) |
There was a problem hiding this comment.
In the device-specific PTX output path, the resolved architecture can exceed the device's capability, producing incompatible PTX.
At lines 3861/3863, output_arch is already capped at self.arch via min(). However, _resolve_supported_ptx_arch uses a "clamp up" strategy — it returns the lowest supported arch >= target. In scenarios where NVRTC doesn't support self.arch but does support higher architectures (e.g., device is sm_75, NVRTC supports {80, 86}), the function returns 80, which exceeds the device's capability. PTX compiled for sm_80 cannot run on an sm_75 device, causing a runtime failure when the kernel is loaded.
For the device-specific path, prefer the highest supported arch ≤ self.arch as the first choice:
if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
# Prefer highest supported arch <= self.arch (device-compatible fallback)
below = [a for a in runtime.nvrtc_supported_archs if a <= self.arch]
if below:
output_arch = max(below)
else:
# No supported arch <= device arch; fall back to clamp-up for best-effort
output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)This ensures the resolved arch never exceeds the device's supported architecture.
3a19703 to
fafa510
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
warp/_src/context.py (2)
3865-3867:⚠️ Potential issue | 🟠 MajorDon't clamp PTX targets past the device architecture.
With an
sm_75device andruntime.nvrtc_supported_archs={80, 86}, Line 3867 resolvesoutput_archto80. That PTX target is newer than the GPU, so the later JIT/load can still fail. Revalidate against the subset of NVRTC-supported arches that are<= self.arch, and raise a clear error if that subset is empty instead of clamping upward.Proposed fix
- # Ensure the chosen PTX arch is actually supported by NVRTC - if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs: - output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs) + # Ensure the chosen PTX arch is actually supported by NVRTC and runnable on this device. + if runtime.nvrtc_supported_archs: + device_supported_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch} + if not device_supported_archs: + raise RuntimeError( + f"No NVRTC-supported PTX target can run on device {self.alias} (sm_{self.arch}); " + f"available NVRTC targets: {sorted(runtime.nvrtc_supported_archs)}" + ) + if output_arch not in device_supported_archs: + output_arch = _resolve_supported_ptx_arch(output_arch, device_supported_archs)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/context.py` around lines 3865 - 3867, The current clamping logic in the block referencing runtime.nvrtc_supported_archs and output_arch calls _resolve_supported_ptx_arch which can select a newer PTX (e.g., choosing 80 for an sm_75 device); instead restrict the candidate NVRTC arches to those <= self.arch (i.e., filter runtime.nvrtc_supported_archs by numeric arch <= self.arch) and if that filtered set is empty raise a clear ValueError (or RuntimeError) describing no compatible NVRTC-supported PTX for this device, otherwise call _resolve_supported_ptx_arch with the filtered set to pick a supported target instead of clamping upward.
3946-3953:⚠️ Potential issue | 🟡 MinorUse repo docstring formatting for parameter references.
This helper docstring still uses italics for
target_archandsupported_archs; the repo convention here is double-backtick literals.Suggested fix
- """Return *target_arch* if NVRTC supports it, otherwise the closest supported arch. + """Return ``target_arch`` if NVRTC supports it, otherwise the closest supported arch. Preference is given to the lowest supported architecture that is >= - *target_arch* so that the resulting PTX is as broadly forward-compatible + ``target_arch`` so that the resulting PTX is as broadly forward-compatible as possible. If no such architecture exists the highest supported value is returned instead. - *supported_archs* must be non-empty; a ``ValueError`` is raised otherwise. + ``supported_archs`` must be non-empty; a ``ValueError`` is raised otherwise. """As per coding guidelines, "Use double backticks for code elements in docstrings (RST syntax)" and "Use double backticks for parameter cross-references in docstrings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/context.py` around lines 3946 - 3953, Update the helper function docstring that describes returning a supported NVRTC architecture so it uses RST double-backtick literals instead of italics: replace instances of target_arch and supported_archs formatted with single asterisks/italics with ``target_arch`` and ``supported_archs`` (and any other code-like names such as NVRTC) to follow the repo docstring convention and parameter cross-reference style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3865-3867: The current clamping logic in the block referencing
runtime.nvrtc_supported_archs and output_arch calls _resolve_supported_ptx_arch
which can select a newer PTX (e.g., choosing 80 for an sm_75 device); instead
restrict the candidate NVRTC arches to those <= self.arch (i.e., filter
runtime.nvrtc_supported_archs by numeric arch <= self.arch) and if that filtered
set is empty raise a clear ValueError (or RuntimeError) describing no compatible
NVRTC-supported PTX for this device, otherwise call _resolve_supported_ptx_arch
with the filtered set to pick a supported target instead of clamping upward.
- Around line 3946-3953: Update the helper function docstring that describes
returning a supported NVRTC architecture so it uses RST double-backtick literals
instead of italics: replace instances of target_arch and supported_archs
formatted with single asterisks/italics with ``target_arch`` and
``supported_archs`` (and any other code-like names such as NVRTC) to follow the
repo docstring convention and parameter cross-reference style.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0f5abec7-7765-44c8-9e5a-59af9233085a
📒 Files selected for processing (2)
warp/_src/context.pywarp/tests/test_context.py
🚧 Files skipped from review as they are similar to previous changes (1)
- warp/tests/test_context.py
There was a problem hiding this comment.
Thanks for the contribution, @cluster2600!
Suggestion: Consider adding a warning inside _resolve_supported_ptx_arch when the arch gets adjusted, since the alternative (NVRTC compilation failure) is always worse:
resolved = above[0] if above else max(supported_archs)
print(
f"Warning: PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead"
)
return resolvedWithout this, users who hit the fallback path have no indication that their target arch was silently changed, which could be confusing if they're debugging arch-specific behavior.
Edit: Maybe the Warp warn() function is not appropriate to call here, edited with print
There was a problem hiding this comment.
♻️ Duplicate comments (1)
warp/_src/context.py (1)
3858-3867:⚠️ Potential issue | 🟠 MajorKeep the resolved PTX target at or below the device SM.
This still has the same remaining edge case as the earlier review thread:
_resolve_supported_ptx_arch()can raiseoutput_archaboveself.arch. For ansm_75device withruntime.nvrtc_supported_archs == {80, 86}, this resolves tosm_80, which is not a valid PTX target for that device. Filter the candidate set to architectures<= self.archhere, and fail fast if that filtered set is empty.🔧 Proposed fix
if self.get_cuda_output_format() == "ptx": # use the default PTX arch if the device supports it if warp.config.ptx_target_arch is not None: output_arch = min(self.arch, warp.config.ptx_target_arch) else: output_arch = min(self.arch, runtime.default_ptx_arch) - # Ensure the chosen PTX arch is actually supported by NVRTC - if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs: - output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs) + # Ensure the chosen PTX arch is supported by NVRTC and can run on this device. + if runtime.nvrtc_supported_archs: + compatible_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch} + if not compatible_archs: + raise RuntimeError( + f"Device {self.alias} (sm_{self.arch}) has no compatible PTX target in " + f"{sorted(runtime.nvrtc_supported_archs)}." + ) + if output_arch not in compatible_archs: + output_arch = _resolve_supported_ptx_arch(output_arch, compatible_archs)According to NVIDIA CUDA/PTX compatibility documentation, can PTX generated for `compute_80` run on an `sm_75` GPU, or must the PTX target be less than or equal to the device's compute capability?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/context.py` around lines 3858 - 3867, The current PTX selection can return an arch higher than the device SM because _resolve_supported_ptx_arch() is called on the full runtime.nvrtc_supported_archs; update the branch inside get_cuda_output_format() == "ptx" (the block that sets output_arch using warp.config.ptx_target_arch/runtime.default_ptx_arch) to first filter runtime.nvrtc_supported_archs (and any candidate set) to only architectures <= self.arch, then call _resolve_supported_ptx_arch() on that filtered set; if the filtered candidate set is empty, raise/emit a clear error (fail fast) indicating no supported PTX arch <= self.arch so we don't pick an incompatible higher PTX target.
🧹 Nitpick comments (1)
warp/_src/context.py (1)
3961-3963: Use Warp's warning helper instead ofprint().
print()bypasses the repo's warning path and will fire on every repeated resolution.warp._src.utils.warn(..., once=True)keeps this consistent with the rest ofwarp/_src/context.py.♻️ Proposed change
- print( - f"Warning: PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead" - ) + warp._src.utils.warn( + f"PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead", + once=True, + )As per coding guidelines, "Use
warp._src.utils.warn()instead ofwarnings.warn()—it routes warnings to stdout as some applications don't want Warp writing to stderr".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/context.py` around lines 3961 - 3963, Replace the direct print(...) call in the PTX target resolution path with Warp's warning helper to avoid bypassing the repo's warning system: locate the print that emits "Warning: PTX target arch sm_{target_arch} is not supported..." in the resolution logic (use symbols target_arch and resolved to find the code) and call warp._src.utils.warn with the same message and once=True (so the message is routed via Warp's warning path and shown only once).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3858-3867: The current PTX selection can return an arch higher
than the device SM because _resolve_supported_ptx_arch() is called on the full
runtime.nvrtc_supported_archs; update the branch inside get_cuda_output_format()
== "ptx" (the block that sets output_arch using
warp.config.ptx_target_arch/runtime.default_ptx_arch) to first filter
runtime.nvrtc_supported_archs (and any candidate set) to only architectures <=
self.arch, then call _resolve_supported_ptx_arch() on that filtered set; if the
filtered candidate set is empty, raise/emit a clear error (fail fast) indicating
no supported PTX arch <= self.arch so we don't pick an incompatible higher PTX
target.
---
Nitpick comments:
In `@warp/_src/context.py`:
- Around line 3961-3963: Replace the direct print(...) call in the PTX target
resolution path with Warp's warning helper to avoid bypassing the repo's warning
system: locate the print that emits "Warning: PTX target arch sm_{target_arch}
is not supported..." in the resolution logic (use symbols target_arch and
resolved to find the code) and call warp._src.utils.warn with the same message
and once=True (so the message is routed via Warp's warning path and shown only
once).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: c64e6c11-52ac-4d2b-8718-804abb3c654f
📒 Files selected for processing (1)
warp/_src/context.py
|
@cluster2600 Okay, this is almost ready to merge. Please fix the pre-commit issue, squash your changes, and rebase with main. Once those are done I can merge. Thanks! |
52d189b to
e55f12e
Compare
|
@shi-eric Done — I've squashed everything into a single commit, rebased onto main, and fixed the ruff formatting issue (the Cheers for the thorough reveiw, happy to address anything else if needed. |
default_ptx_arch is hardcoded to 75 in both the device-present and no-device code paths without checking whether the value is actually supported by the available NVRTC. If a future NVRTC release were to drop support for arch 75 this would cause a cryptic compilation failure at runtime. Add a small helper _resolve_supported_ptx_arch() that clamps the target architecture to the closest NVRTC-supported value and use it in both initialisation paths so that default_ptx_arch (and config.ptx_target_arch when set) are always validated. A warning is printed whenever the target is adjusted so users are aware of the change. Also validate per-device PTX targets in get_cuda_compile_arch() so a device arch below the NVRTC-supported range is resolved correctly. Adds unit tests exercising the exact-match, clamp-up, fallback-to-max, single-element, and empty-set edge cases. Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
e55f12e to
30c7c05
Compare
Description
default_ptx_archis hardcoded to75in both the device-present andno-device initialisation paths without verifying that the value is actually
supported by the available NVRTC. The same applies to
config.ptx_target_archwhen the user has set it explicitly.This PR adds a small helper (
_resolve_supported_ptx_arch) that clamps thetarget architecture to the closest NVRTC-supported value — preferring the
lowest supported arch >= the target so that the resulting PTX remains as
broadly forward-compatible as possible. Both initialisation paths now call
this helper so that
default_ptx_archis always validated before it getsused for compilation.
The practical impact today is low (NVRTC has supported arch 75 since
CUDA 10 and Warp requires CUDA 12+), but this hardens the code against
future NVRTC releases that may drop older architectures.
Closes #1218
Checklist
Test plan
python3 -m pytest warp/tests/test_context.py -v— all 5 tests pass(1 pre-existing + 4 new).
_resolve_supported_ptx_arch:nvrtc_supported_archs, returned as-is.supported arch returned.
have NVRTC-supported architectures (the common case). The no-device
path now validates
config.ptx_target_arch/ the75fallback againstthe supported set.
New feature / enhancement
Summary by CodeRabbit
Bug Fixes
Tests