Skip to content

Validate default_ptx_arch against NVRTC supported architectures#1276

Open
cluster2600 wants to merge 1 commit intoNVIDIA:mainfrom
cluster2600:fix/validate-default-ptx-arch
Open

Validate default_ptx_arch against NVRTC supported architectures#1276
cluster2600 wants to merge 1 commit intoNVIDIA:mainfrom
cluster2600:fix/validate-default-ptx-arch

Conversation

@cluster2600
Copy link

@cluster2600 cluster2600 commented Mar 8, 2026

Description

default_ptx_arch is hardcoded to 75 in both the device-present and
no-device initialisation paths without verifying that the value is actually
supported by the available NVRTC. The same applies to
config.ptx_target_arch when the user has set it explicitly.

This PR adds a small helper (_resolve_supported_ptx_arch) that clamps the
target architecture to the closest NVRTC-supported value — preferring the
lowest supported arch >= the target so that the resulting PTX remains as
broadly forward-compatible as possible. Both initialisation paths now call
this helper so that default_ptx_arch is always validated before it gets
used for compilation.

The practical impact today is low (NVRTC has supported arch 75 since
CUDA 10 and Warp requires CUDA 12+), but this hardens the code against
future NVRTC releases that may drop older architectures.

Closes #1218

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Test plan

  1. Ran python3 -m pytest warp/tests/test_context.py -v — all 5 tests pass
    (1 pre-existing + 4 new).
  2. New unit tests excercise the following scenarios for
    _resolve_supported_ptx_arch:
    • Exact match — target is in nvrtc_supported_archs, returned as-is.
    • Clamp up — target missing, lowest supported arch above target chosen.
    • Fallback to max — all supported archs are below target, highest
      supported arch returned.
    • Single element — only one supported arch availble.
  3. The behaviour of the device-present path is unchanged when all devices
    have NVRTC-supported architectures (the common case). The no-device
    path now validates config.ptx_target_arch / the 75 fallback against
    the supported set.

New feature / enhancement

# _resolve_supported_ptx_arch(target_arch, supported_archs)
# returns target_arch if supported, otherwise the closest alternative
from warp._src.context import _resolve_supported_ptx_arch

_resolve_supported_ptx_arch(75, {70, 80, 86})  # => 80
_resolve_supported_ptx_arch(75, {70, 75, 86})  # => 75
_resolve_supported_ptx_arch(90, {70, 75, 80})  # => 80

Summary by CodeRabbit

  • Bug Fixes

    • Improved PTX architecture selection so the app consistently chooses an NVRTC-supported PTX arch across environments (including offline or no-device scenarios). When a requested arch isn't supported the system now selects the nearest supported architecture and logs a fallback.
  • Tests

    • Added unit tests covering exact-match, upward-clamping, fallback-to-max, single-entry behavior, and empty-set error handling for the PTX arch selection logic.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a module-level helper to select an NVRTC-supported PTX architecture and applies it to validate/compute PTX arch in both CUDA-present and no-device NVRTC code paths; adds unit tests covering match, clamp-up, fallback, single-element, and empty-set error cases.

Changes

Cohort / File(s) Summary
PTX Architecture Resolution
warp/_src/context.py
Adds _resolve_supported_ptx_arch(target_arch: int, supported_archs: set[int]) -> int and integrates it to validate/override selected PTX architectures in CUDA and no-device NVRTC code paths; logs a warning when falling back.
Unit Tests
warp/tests/test_context.py
Adds tests for _resolve_supported_ptx_arch: exact match, clamp-up (smallest >= target), fallback to max (target above all), single-element behavior, and empty-set raising ValueError.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding validation of default_ptx_arch against NVRTC supported architectures.
Linked Issues check ✅ Passed The PR implements the core requirement from #1218: adding _resolve_supported_ptx_arch helper that validates/clamps PTX target arch against NVRTC-supported values, with validation in both device-present and no-device initialization paths.
Out of Scope Changes check ✅ Passed All changes are directly related to validating PTX arch against NVRTC support: new helper function, validation in compilation path, tests covering the new functionality, and no extraneous modifications.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@warp/_src/context.py`:
- Around line 5333-5341: The current clamp only updates runtime.default_ptx_arch
and can let an unsupported per-device PTX target slip through; update the final
PTX-target selection in Device.get_cuda_compile_arch() (which computes
min(self.arch, warp.config.ptx_target_arch or runtime.default_ptx_arch)) to
revalidate the chosen target against self.nvrtc_supported_archs using
_resolve_supported_ptx_arch (or equivalent) and ensure the returned target is
NVRTC-supported for that device; if no supported PTX exists for the device,
raise a clear, descriptive error indicating the device arch and the available
nvrtc_supported_archs so compilation will fail fast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 09d275e4-b3d2-4717-a9d8-03d4da24e2db

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfcaf8 and 6883172.

📒 Files selected for processing (2)
  • warp/_src/context.py
  • warp/tests/test_context.py

@greptile-apps
Copy link

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR introduces _resolve_supported_ptx_arch, a helper that validates a desired PTX architecture against the set of NVRTC-supported architectures, preferring the lowest supported arch ≥ the target for forward compatibility and falling back to the highest supported arch when none qualify. Both the device-present and no-device initialisation paths in Runtime.__init__ now call this helper, and Device.get_cuda_output_arch() applies an additional per-device validation pass. Five new unit tests cover all documented branches including the empty-set guard.

  • New helper _resolve_supported_ptx_arch correctly handles exact-match, clamp-up, fallback-to-max, single-element, and empty-set cases with an explicit ValueError guard.
  • Warning output uses print() instead of warp.utils.warn(), which bypasses warp.config.quiet and the deduplication mechanism; this warning fires once per compilation in the device path and could spam output.
  • Test coverage is thorough and matches every case described in the PR description.

Confidence Score: 4/5

  • Safe to merge with a minor fix to the warning mechanism.
  • The logic is correct for the common case (exact match or clamp-up) and the empty-set guard prevents silent failures. The only actionable issue is the use of print() instead of warp.utils.warn(), which could cause repeated output during kernel compilation but does not affect correctness.
  • warp/_src/context.py (line 3961) — warning output mechanism.

Important Files Changed

Filename Overview
warp/_src/context.py Adds _resolve_supported_ptx_arch helper and integrates it into both the device-present and no-device init paths; warning output uses print() instead of warp.utils.warn(), which bypasses quiet mode and deduplication.
warp/tests/test_context.py Adds five well-structured unit tests covering exact-match, clamp-up, fallback-to-max, single-element, and empty-set error cases for _resolve_supported_ptx_arch.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Runtime.__init__] --> B{CUDA devices present?}
    B -- Yes --> C[default_ptx_arch = 75]
    C --> D[min arch of eligible devices\nin nvrtc_supported_archs]
    D -- success --> E[default_ptx_arch = min result]
    D -- ValueError --> F[retain default_ptx_arch = 75]
    E --> G{default_ptx_arch\nin nvrtc_supported_archs?}
    F --> G
    G -- No --> H[_resolve_supported_ptx_arch\ndefault_ptx_arch, nvrtc_supported_archs]
    G -- Yes --> I[default_ptx_arch finalised]
    H --> I

    B -- No --> J{nvrtc_supported_archs\nnon-empty?}
    J -- Yes --> K[target = ptx_target_arch or 75]
    K --> L[_resolve_supported_ptx_arch\ntarget, nvrtc_supported_archs]
    L --> I
    J -- No --> M[default_ptx_arch = None]

    I --> N[Device.get_cuda_output_arch]
    N --> O[output_arch = min device.arch, default_ptx_arch]
    O --> P{output_arch\nin nvrtc_supported_archs?}
    P -- No --> Q[_resolve_supported_ptx_arch\noutput_arch, nvrtc_supported_archs]
    P -- Yes --> R[use output_arch for compilation]
    Q --> R

    subgraph _resolve_supported_ptx_arch
        S{supported_archs empty?} -- Yes --> T[raise ValueError]
        S -- No --> U{target_arch\nin supported_archs?}
        U -- Yes --> V[return target_arch]
        U -- No --> W[above = sorted archs >= target]
        W --> X{above non-empty?}
        X -- Yes --> Y[return above_0 lowest >= target]
        X -- No --> Z[return max supported_archs]
    end
Loading

Comments Outside Diff (1)

  1. warp/_src/context.py, line 3961 (link)

    Use warp.utils.warn() instead of print() for warning output

    The warning is emitted via print(), which bypasses Warp's existing warning infrastructure. This has two practical consequences:

    1. Doesn't respect warp.config.quiet — the message will appear even when the user has suppressed all Warp output.
    2. No deduplicationget_cuda_output_arch() is called on every kernel compilation. If a device's arch is persistently unsupported, this message will be printed repeatedly for every kernel that gets compiled, spamming the output.

    warp.utils.warn() already handles both concerns (warnings_seen deduplication, and it routes through Python's warnings module rather than stdout). It's also used in nearby code (e.g., lines 3723, 3748) for a consistent user experience.

    Note: since _resolve_supported_ptx_arch is a module-level function called before the runtime is fully initialised, verify that warp.utils is importable at that call site — it is for the existing usages in context.py.

Last reviewed commit: 30c7c05

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
warp/_src/context.py (1)

3865-3867: ⚠️ Potential issue | 🟠 Major

Reject PTX targets that exceed the device architecture.

This still allows output_arch to clamp upward past self.arch. For example, an sm_75 device with nvrtc_supported_archs={80, 86} now resolves to 80, which is not runnable on that device. This should fail fast instead of returning an invalid PTX target.

Suggested fix
-            if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
-                output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
+            if runtime.nvrtc_supported_archs:
+                runnable_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch}
+                if not runnable_archs:
+                    raise RuntimeError(
+                        f"No NVRTC-supported PTX target can run on device {self.alias} "
+                        f"(sm_{self.arch}); supported PTX targets are {sorted(runtime.nvrtc_supported_archs)}."
+                    )
+                if output_arch not in runnable_archs:
+                    output_arch = _resolve_supported_ptx_arch(output_arch, runnable_archs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3865 - 3867, The current logic may clamp
output_arch upward to a PTX target not supported by the physical device (e.g.,
device with self.arch == 75 gets resolved to 80 via
_resolve_supported_ptx_arch); instead, after resolving a supported NVRTC arch
via _resolve_supported_ptx_arch (when runtime.nvrtc_supported_archs is set),
verify that the resolved arch does not exceed the device architecture (compare
numeric SM values from output_arch/self.arch) and if it does, raise an explicit
error (e.g., ValueError or RuntimeError) rather than returning an invalid PTX
target; update the branch that references runtime.nvrtc_supported_archs,
output_arch, and _resolve_supported_ptx_arch to perform this check and raise on
mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@warp/_src/context.py`:
- Around line 3946-3953: Update the docstring to use the repo's RST-style
double-backtick formatting for code and parameter references: replace italicized
occurrences of target_arch and supported_archs with ``target_arch`` and
``supported_archs`` (also change any other italicized terms like *NVRTC* or
inline code to ``NVRTC`` or appropriate ``...``), keeping the existing text and
semantics (the description about returning the closest supported arch and
raising ValueError remains unchanged).

---

Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3865-3867: The current logic may clamp output_arch upward to a PTX
target not supported by the physical device (e.g., device with self.arch == 75
gets resolved to 80 via _resolve_supported_ptx_arch); instead, after resolving a
supported NVRTC arch via _resolve_supported_ptx_arch (when
runtime.nvrtc_supported_archs is set), verify that the resolved arch does not
exceed the device architecture (compare numeric SM values from
output_arch/self.arch) and if it does, raise an explicit error (e.g., ValueError
or RuntimeError) rather than returning an invalid PTX target; update the branch
that references runtime.nvrtc_supported_archs, output_arch, and
_resolve_supported_ptx_arch to perform this check and raise on mismatch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ac0ce7b2-a16c-41bd-9052-5ef39e32319c

📥 Commits

Reviewing files that changed from the base of the PR and between 6883172 and 4e7972c.

📒 Files selected for processing (2)
  • warp/_src/context.py
  • warp/tests/test_context.py

Comment on lines +3946 to +3953
"""Return *target_arch* if NVRTC supports it, otherwise the closest supported arch.

Preference is given to the lowest supported architecture that is >=
*target_arch* so that the resulting PTX is as broadly forward-compatible
as possible. If no such architecture exists the highest supported value
is returned instead.

*supported_archs* must be non-empty; a ``ValueError`` is raised otherwise.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use repo docstring formatting for parameter references.

The new helper docstring uses italics for target_arch and supported_archs; this repo’s docstring convention requires double backticks for code elements and parameter cross-references.

Suggested fix
-    """Return *target_arch* if NVRTC supports it, otherwise the closest supported arch.
+    """Return ``target_arch`` if NVRTC supports it, otherwise the closest supported arch.
 
     Preference is given to the lowest supported architecture that is >=
-    *target_arch* so that the resulting PTX is as broadly forward-compatible
+    ``target_arch`` so that the resulting PTX is as broadly forward-compatible
     as possible.  If no such architecture exists the highest supported value
     is returned instead.
 
-    *supported_archs* must be non-empty; a ``ValueError`` is raised otherwise.
+    ``supported_archs`` must be non-empty; a ``ValueError`` is raised otherwise.
     """

As per coding guidelines, "Use double backticks for code elements in docstrings (RST syntax)" and "Use double backticks for parameter cross-references in docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3946 - 3953, Update the docstring to use
the repo's RST-style double-backtick formatting for code and parameter
references: replace italicized occurrences of target_arch and supported_archs
with ``target_arch`` and ``supported_archs`` (also change any other italicized
terms like *NVRTC* or inline code to ``NVRTC`` or appropriate ``...``), keeping
the existing text and semantics (the description about returning the closest
supported arch and raising ValueError remains unchanged).

Comment on lines +3866 to +3867
if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the device-specific PTX output path, the resolved architecture can exceed the device's capability, producing incompatible PTX.

At lines 3861/3863, output_arch is already capped at self.arch via min(). However, _resolve_supported_ptx_arch uses a "clamp up" strategy — it returns the lowest supported arch >= target. In scenarios where NVRTC doesn't support self.arch but does support higher architectures (e.g., device is sm_75, NVRTC supports {80, 86}), the function returns 80, which exceeds the device's capability. PTX compiled for sm_80 cannot run on an sm_75 device, causing a runtime failure when the kernel is loaded.

For the device-specific path, prefer the highest supported arch ≤ self.arch as the first choice:

if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
    # Prefer highest supported arch <= self.arch (device-compatible fallback)
    below = [a for a in runtime.nvrtc_supported_archs if a <= self.arch]
    if below:
        output_arch = max(below)
    else:
        # No supported arch <= device arch; fall back to clamp-up for best-effort
        output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)

This ensures the resolved arch never exceeds the device's supported architecture.

@cluster2600 cluster2600 force-pushed the fix/validate-default-ptx-arch branch from 3a19703 to fafa510 Compare March 9, 2026 15:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
warp/_src/context.py (2)

3865-3867: ⚠️ Potential issue | 🟠 Major

Don't clamp PTX targets past the device architecture.

With an sm_75 device and runtime.nvrtc_supported_archs={80, 86}, Line 3867 resolves output_arch to 80. That PTX target is newer than the GPU, so the later JIT/load can still fail. Revalidate against the subset of NVRTC-supported arches that are <= self.arch, and raise a clear error if that subset is empty instead of clamping upward.

Proposed fix
-            # Ensure the chosen PTX arch is actually supported by NVRTC
-            if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
-                output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
+            # Ensure the chosen PTX arch is actually supported by NVRTC and runnable on this device.
+            if runtime.nvrtc_supported_archs:
+                device_supported_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch}
+                if not device_supported_archs:
+                    raise RuntimeError(
+                        f"No NVRTC-supported PTX target can run on device {self.alias} (sm_{self.arch}); "
+                        f"available NVRTC targets: {sorted(runtime.nvrtc_supported_archs)}"
+                    )
+                if output_arch not in device_supported_archs:
+                    output_arch = _resolve_supported_ptx_arch(output_arch, device_supported_archs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3865 - 3867, The current clamping logic in
the block referencing runtime.nvrtc_supported_archs and output_arch calls
_resolve_supported_ptx_arch which can select a newer PTX (e.g., choosing 80 for
an sm_75 device); instead restrict the candidate NVRTC arches to those <=
self.arch (i.e., filter runtime.nvrtc_supported_archs by numeric arch <=
self.arch) and if that filtered set is empty raise a clear ValueError (or
RuntimeError) describing no compatible NVRTC-supported PTX for this device,
otherwise call _resolve_supported_ptx_arch with the filtered set to pick a
supported target instead of clamping upward.

3946-3953: ⚠️ Potential issue | 🟡 Minor

Use repo docstring formatting for parameter references.

This helper docstring still uses italics for target_arch and supported_archs; the repo convention here is double-backtick literals.

Suggested fix
-    """Return *target_arch* if NVRTC supports it, otherwise the closest supported arch.
+    """Return ``target_arch`` if NVRTC supports it, otherwise the closest supported arch.
 
     Preference is given to the lowest supported architecture that is >=
-    *target_arch* so that the resulting PTX is as broadly forward-compatible
+    ``target_arch`` so that the resulting PTX is as broadly forward-compatible
     as possible.  If no such architecture exists the highest supported value
     is returned instead.
 
-    *supported_archs* must be non-empty; a ``ValueError`` is raised otherwise.
+    ``supported_archs`` must be non-empty; a ``ValueError`` is raised otherwise.
     """

As per coding guidelines, "Use double backticks for code elements in docstrings (RST syntax)" and "Use double backticks for parameter cross-references in docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3946 - 3953, Update the helper function
docstring that describes returning a supported NVRTC architecture so it uses RST
double-backtick literals instead of italics: replace instances of target_arch
and supported_archs formatted with single asterisks/italics with ``target_arch``
and ``supported_archs`` (and any other code-like names such as NVRTC) to follow
the repo docstring convention and parameter cross-reference style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3865-3867: The current clamping logic in the block referencing
runtime.nvrtc_supported_archs and output_arch calls _resolve_supported_ptx_arch
which can select a newer PTX (e.g., choosing 80 for an sm_75 device); instead
restrict the candidate NVRTC arches to those <= self.arch (i.e., filter
runtime.nvrtc_supported_archs by numeric arch <= self.arch) and if that filtered
set is empty raise a clear ValueError (or RuntimeError) describing no compatible
NVRTC-supported PTX for this device, otherwise call _resolve_supported_ptx_arch
with the filtered set to pick a supported target instead of clamping upward.
- Around line 3946-3953: Update the helper function docstring that describes
returning a supported NVRTC architecture so it uses RST double-backtick literals
instead of italics: replace instances of target_arch and supported_archs
formatted with single asterisks/italics with ``target_arch`` and
``supported_archs`` (and any other code-like names such as NVRTC) to follow the
repo docstring convention and parameter cross-reference style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0f5abec7-7765-44c8-9e5a-59af9233085a

📥 Commits

Reviewing files that changed from the base of the PR and between 3a19703 and fafa510.

📒 Files selected for processing (2)
  • warp/_src/context.py
  • warp/tests/test_context.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • warp/tests/test_context.py

Copy link
Contributor

@shi-eric shi-eric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, @cluster2600!

Suggestion: Consider adding a warning inside _resolve_supported_ptx_arch when the arch gets adjusted, since the alternative (NVRTC compilation failure) is always worse:

resolved = above[0] if above else max(supported_archs)
print(
    f"Warning: PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead"
)
return resolved

Without this, users who hit the fallback path have no indication that their target arch was silently changed, which could be confusing if they're debugging arch-specific behavior.

Edit: Maybe the Warp warn() function is not appropriate to call here, edited with print

@cluster2600
Copy link
Author

Thanks for the review @shi-eric! Great suggestion — added the warning in 52d189b. Users will now see a clear message when the PTX target arch is adjusted, e.g.:

Warning: PTX target arch sm_75 is not supported by NVRTC; using sm_80 instead

Much better than a silent fallback.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
warp/_src/context.py (1)

3858-3867: ⚠️ Potential issue | 🟠 Major

Keep the resolved PTX target at or below the device SM.

This still has the same remaining edge case as the earlier review thread: _resolve_supported_ptx_arch() can raise output_arch above self.arch. For an sm_75 device with runtime.nvrtc_supported_archs == {80, 86}, this resolves to sm_80, which is not a valid PTX target for that device. Filter the candidate set to architectures <= self.arch here, and fail fast if that filtered set is empty.

🔧 Proposed fix
         if self.get_cuda_output_format() == "ptx":
             # use the default PTX arch if the device supports it
             if warp.config.ptx_target_arch is not None:
                 output_arch = min(self.arch, warp.config.ptx_target_arch)
             else:
                 output_arch = min(self.arch, runtime.default_ptx_arch)
 
-            # Ensure the chosen PTX arch is actually supported by NVRTC
-            if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
-                output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
+            # Ensure the chosen PTX arch is supported by NVRTC and can run on this device.
+            if runtime.nvrtc_supported_archs:
+                compatible_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch}
+                if not compatible_archs:
+                    raise RuntimeError(
+                        f"Device {self.alias} (sm_{self.arch}) has no compatible PTX target in "
+                        f"{sorted(runtime.nvrtc_supported_archs)}."
+                    )
+                if output_arch not in compatible_archs:
+                    output_arch = _resolve_supported_ptx_arch(output_arch, compatible_archs)
According to NVIDIA CUDA/PTX compatibility documentation, can PTX generated for `compute_80` run on an `sm_75` GPU, or must the PTX target be less than or equal to the device's compute capability?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3858 - 3867, The current PTX selection can
return an arch higher than the device SM because _resolve_supported_ptx_arch()
is called on the full runtime.nvrtc_supported_archs; update the branch inside
get_cuda_output_format() == "ptx" (the block that sets output_arch using
warp.config.ptx_target_arch/runtime.default_ptx_arch) to first filter
runtime.nvrtc_supported_archs (and any candidate set) to only architectures <=
self.arch, then call _resolve_supported_ptx_arch() on that filtered set; if the
filtered candidate set is empty, raise/emit a clear error (fail fast) indicating
no supported PTX arch <= self.arch so we don't pick an incompatible higher PTX
target.
🧹 Nitpick comments (1)
warp/_src/context.py (1)

3961-3963: Use Warp's warning helper instead of print().

print() bypasses the repo's warning path and will fire on every repeated resolution. warp._src.utils.warn(..., once=True) keeps this consistent with the rest of warp/_src/context.py.

♻️ Proposed change
-    print(
-        f"Warning: PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead"
-    )
+    warp._src.utils.warn(
+        f"PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead",
+        once=True,
+    )

As per coding guidelines, "Use warp._src.utils.warn() instead of warnings.warn()—it routes warnings to stdout as some applications don't want Warp writing to stderr".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3961 - 3963, Replace the direct print(...)
call in the PTX target resolution path with Warp's warning helper to avoid
bypassing the repo's warning system: locate the print that emits "Warning: PTX
target arch sm_{target_arch} is not supported..." in the resolution logic (use
symbols target_arch and resolved to find the code) and call warp._src.utils.warn
with the same message and once=True (so the message is routed via Warp's warning
path and shown only once).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3858-3867: The current PTX selection can return an arch higher
than the device SM because _resolve_supported_ptx_arch() is called on the full
runtime.nvrtc_supported_archs; update the branch inside get_cuda_output_format()
== "ptx" (the block that sets output_arch using
warp.config.ptx_target_arch/runtime.default_ptx_arch) to first filter
runtime.nvrtc_supported_archs (and any candidate set) to only architectures <=
self.arch, then call _resolve_supported_ptx_arch() on that filtered set; if the
filtered candidate set is empty, raise/emit a clear error (fail fast) indicating
no supported PTX arch <= self.arch so we don't pick an incompatible higher PTX
target.

---

Nitpick comments:
In `@warp/_src/context.py`:
- Around line 3961-3963: Replace the direct print(...) call in the PTX target
resolution path with Warp's warning helper to avoid bypassing the repo's warning
system: locate the print that emits "Warning: PTX target arch sm_{target_arch}
is not supported..." in the resolution logic (use symbols target_arch and
resolved to find the code) and call warp._src.utils.warn with the same message
and once=True (so the message is routed via Warp's warning path and shown only
once).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c64e6c11-52ac-4d2b-8718-804abb3c654f

📥 Commits

Reviewing files that changed from the base of the PR and between fafa510 and 52d189b.

📒 Files selected for processing (1)
  • warp/_src/context.py

@shi-eric
Copy link
Contributor

@cluster2600 Okay, this is almost ready to merge. Please fix the pre-commit issue, squash your changes, and rebase with main. Once those are done I can merge. Thanks!

@cluster2600 cluster2600 force-pushed the fix/validate-default-ptx-arch branch from 52d189b to e55f12e Compare March 11, 2026 09:38
@cluster2600
Copy link
Author

@shi-eric Done — I've squashed everything into a single commit, rebased onto main, and fixed the ruff formatting issue (the print() call in _resolve_supported_ptx_arch was being split across multiple lines unnecessarily). Pre-commit should be happy now.

Cheers for the thorough reveiw, happy to address anything else if needed.

default_ptx_arch is hardcoded to 75 in both the device-present and
no-device code paths without checking whether the value is actually
supported by the available NVRTC.  If a future NVRTC release were to
drop support for arch 75 this would cause a cryptic compilation
failure at runtime.

Add a small helper _resolve_supported_ptx_arch() that clamps the
target architecture to the closest NVRTC-supported value and use it
in both initialisation paths so that default_ptx_arch (and
config.ptx_target_arch when set) are always validated.  A warning is
printed whenever the target is adjusted so users are aware of the
change.

Also validate per-device PTX targets in get_cuda_compile_arch() so a
device arch below the NVRTC-supported range is resolved correctly.

Adds unit tests exercising the exact-match, clamp-up, fallback-to-max,
single-element, and empty-set edge cases.

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/validate-default-ptx-arch branch from e55f12e to 30c7c05 Compare March 11, 2026 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate default_ptx_arch against NVRTC supported architectures

2 participants