Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
I used a DGX Spark to deploy sglang and run Qwen3-32B, but I hit an error.
My environment:
```
python3 -m sglang.check_env
/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(8.0) - (12.0)
warnings.warn(
Python: 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:05:09) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GB10
GPU 0 Compute Capability: 12.1
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 13.0, V13.0.88
CUDA Driver Version: 580.95.05
PyTorch: 2.9.1+cu130
sglang: 0.5.5.post1
sgl_kernel: 0.3.17
flashinfer_python: 0.5.0
flashinfer_cubin: 0.5.0
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.3.4
aiohttp: 3.13.2
fastapi: 0.121.1
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.31.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.4
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.25
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.72.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology:
        GPU0  NIC0  NIC1  NIC2  NIC3  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0     X    NODE  NODE  NODE  NODE  0-19          0              N/A
NIC0    NODE   X    PIX   NODE  NODE
NIC1    NODE  PIX    X    NODE  NODE
NIC2    NODE  NODE  NODE   X    PIX
NIC3    NODE  NODE  NODE  PIX    X
Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
NIC Legend:
  NIC0: rocep1s0f0
  NIC1: rocep1s0f1
  NIC2: roceP2p1s0f0
  NIC3: roceP2p1s0f1
ulimit soft: 1024
```
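The UserWarning at the top is just PyTorch's capability-range check: GB10 reports compute capability 12.1, one minor step above the (8.0)–(12.0) window the cu130 wheel advertises. A minimal sketch of that check (the tuple comparison here is an illustration, not PyTorch's exact code):

```python
def capability_supported(cap, lo=(8, 0), hi=(12, 0)):
    """Lexicographic (major, minor) tuple comparison, mirroring how a
    compute capability is checked against a supported range."""
    return lo <= cap <= hi

print(capability_supported((12, 1)))  # GB10 -> False, hence the warning
print(capability_supported((9, 0)))   # e.g. H100 -> True
```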
run command:

```
python -m sglang.launch_server --model-path models/Qwen3-32B --host 0.0.0.0 --port 30000 --tp-size 1
```
error:

```
/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(8.0) - (12.0)
warnings.warn(
[2025-11-13 17:45:45] WARNING common.py:1538: Failed to get GPU memory capacity from nvidia-smi, falling back to torch.cuda.mem_get_info().
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/launch_server.py", line 24, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 4008, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 3616, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 275, in __init__
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 588, in __post_init__
    self._handle_gpu_memory_settings(gpu_mem)
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 834, in _handle_gpu_memory_settings
    model_config = self.get_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 3637, in get_model_config
    from sglang.srt.configs.model_config import ModelConfig
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/configs/model_config.py", line 26, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
    from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
    from sglang.srt.layers.quantization.utils import get_scalar_types
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/utils.py", line 13, in <module>
    from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
    from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/__init__.py", line 5, in <module>
    common_ops = _load_architecture_specific_ops()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/load_utils.py", line 188, in _load_architecture_specific_ops
    raise ImportError(error_msg)
ImportError:
[sgl_kernel] CRITICAL: Could not load any common_ops library!
Attempted locations:
  - Architecture-specific pattern: /home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.* - found files: ['/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.abi3.so']
  - Fallback pattern: /home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/common_ops.* - found files: []
  - Standard Python import: common_ops - failed
GPU Info:
  - Compute capability: 121
  - Expected variant: SM121 (precise math for compatibility)
Please ensure sgl_kernel is properly installed with:
  pip install --upgrade sgl_kernel
Error details from previous import attempts:
  - ImportError: libnvrtc.so.12: cannot open shared object file: No such file or directory
  - ModuleNotFoundError: No module named 'common_ops'
```
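Note that the `sm100/common_ops.abi3.so` file is found; the load itself fails with `libnvrtc.so.12: cannot open shared object file`. That suggests the prebuilt sm100 kernel binary was linked against the CUDA 12 NVRTC, while this environment (PyTorch 2.9.1+cu130, NVCC 13.0) only provides the CUDA 13 runtime. A small illustrative helper (not sglang code) for classifying which NVRTC majors a set of shared objects provides:

```python
import re

def nvrtc_majors(filenames):
    """Pick out the CUDA major version from libnvrtc soname strings,
    e.g. 'libnvrtc.so.12' -> 12. Illustrative helper, not sglang code."""
    return {int(m.group(1))
            for name in filenames
            if (m := re.search(r"libnvrtc\.so\.(\d+)", name))}

# A cu130 PyTorch stack bundles the CUDA 13 NVRTC, but the failing
# sm100 binary asks the linker for libnvrtc.so.12:
print(nvrtc_majors(["libnvrtc.so.13", "libnvrtc-builtins.so.13.0"]))  # {13}
```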
Related resources
I found that https://pytorch.org/get-started/locally offers builds that support CUDA 13.0.
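To confirm which NVRTC sonames the dynamic linker can actually resolve on the affected machine, a quick ctypes probe works (the library names below are taken from the error message; results will differ per system):

```python
import ctypes

def can_load(soname):
    """True if the dynamic linker can resolve and open the shared object."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

# On the failing machine, libnvrtc.so.12 presumably comes back False,
# while the CUDA 13 stack's libnvrtc.so.13 may resolve:
for lib in ("libnvrtc.so.12", "libnvrtc.so.13"):
    print(lib, can_load(lib))
```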