Description
Describe the bug
When running a Burn model using the CUDA backend on a new NVIDIA RTX 50-series GPU (Blackwell architecture), the application panics during kernel compilation. The failure presents differently depending on the burn-cubecl version, but in both cases it appears to stem from the new hardware architecture not being fully recognized by the compiler targets.
In burn-cubecl 0.20.1, it fails with a PTX JIT compilation error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed")
In burn-cubecl 0.21.0-pre.2, it results in an nvrtc error: nvrtc: error: invalid value for --gpu-architecture (-arch)
To Reproduce
Steps to reproduce the behaviour:
- Create a basic Rust project using the burn crate with the CUDA backend enabled.
- Run the code on a machine equipped with an NVIDIA RTX 50-series GPU (e.g., RTX 5060), a recent NVIDIA driver, and a recent CUDA toolkit (e.g., CUDA 13.1).
- Observe the panic during the initial kernel compilation phase.
Expected behavior
The CUDA kernels should compile and run successfully on the new hardware, or burn-cubecl should automatically fall back to a supported, older compute capability (like sm_90) without panicking.
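The fallback described above could look roughly like the following. This is a minimal sketch, not burn-cubecl's actual code: `pick_arch`, `arch_flag`, and the list of supported architectures are hypothetical names and values chosen for illustration, and the choice of a `compute_XX` (virtual architecture) value is an assumption. Whether PTX built for an older virtual architecture is acceptable for every kernel is not established here.

```rust
/// Compute capability as (major, minor), e.g. (12, 0) for sm_120.
type ComputeCapability = (u32, u32);

/// Hypothetical helper: pick the newest architecture the compiler accepts
/// that does not exceed the device's own compute capability.
/// Tuples compare lexicographically, so (9, 0) < (12, 0).
fn pick_arch(
    device: ComputeCapability,
    supported: &[ComputeCapability],
) -> Option<ComputeCapability> {
    supported.iter().copied().filter(|&cc| cc <= device).max()
}

/// Format a capability as a virtual-architecture value, e.g. "compute_90".
fn arch_flag(cc: ComputeCapability) -> String {
    format!("compute_{}{}", cc.0, cc.1)
}

fn main() {
    // Assumed list of architectures known to the compiler in use.
    let supported = [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0)];
    // Capability reported by the device (12.0 assumed for an RTX 50-series card).
    let device = (12, 0);

    match pick_arch(device, &supported) {
        Some(cc) => println!("--gpu-architecture={}", arch_flag(cc)),
        None => eprintln!("no supported architecture for this device"),
    }
}
```

With the assumed values, the device's 12.0 is not in the supported list, so the sketch selects 9.0 instead of passing an unknown value to the compiler. Returning `None` lets the caller report a clear error rather than panic.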
Screenshots
Desktop (please complete the following information):
- OS: Linux Mint 22.3 - Cinnamon 64-bit
- Browser: N/A
- Version:
  - NVIDIA Driver: 590.48.01
  - CUDA: 13.1
  - burn-cubecl: 0.20.1 and 0.21.0-pre.2
  - Rust: 1.94.0
Smartphone (please complete the following information):
Additional context
Because the RTX 50-series hardware is new, it appears that burn-cubecl 0.21.0-pre.2 passes an architecture value that nvrtc does not recognise (likely sm_120), while 0.20.1 generates PTX that the driver's JIT compiler rejects.