Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
I used a DGX Spark to deploy sglang and run Qwen3-32B, but I hit an error.
My environment:
```
python3 -m sglang.check_env
/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(8.0) - (12.0)
warnings.warn(
Python: 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:05:09) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GB10
GPU 0 Compute Capability: 12.1
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 13.0, V13.0.88
CUDA Driver Version: 580.95.05
PyTorch: 2.9.1+cu130
sglang: 0.5.5.post1
sgl_kernel: 0.3.17
flashinfer_python: 0.5.0
flashinfer_cubin: 0.5.0
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.3.4
aiohttp: 3.13.2
fastapi: 0.121.1
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.31.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.4
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.25
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.72.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology:
        GPU0  NIC0  NIC1  NIC2  NIC3  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0     X    NODE  NODE  NODE  NODE  0-19          0              N/A
NIC0    NODE   X    PIX   NODE  NODE
NIC1    NODE  PIX    X    NODE  NODE
NIC2    NODE  NODE  NODE   X    PIX
NIC3    NODE  NODE  NODE  PIX    X
Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
NIC Legend:
  NIC0: rocep1s0f0
  NIC1: rocep1s0f1
  NIC2: roceP2p1s0f0
  NIC3: roceP2p1s0f1
ulimit soft: 1024
```
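The UserWarning at the top is just PyTorch's capability-range check: GB10 reports compute capability 12.1, one minor step above the (8.0)–(12.0) window the cu130 wheel advertises. A minimal sketch of that check (the tuple comparison here is an illustration, not PyTorch's exact code):

```python
def capability_supported(cap, lo=(8, 0), hi=(12, 0)):
    """Lexicographic (major, minor) tuple comparison, mirroring how a
    compute capability is checked against a supported range."""
    return lo <= cap <= hi

print(capability_supported((12, 1)))  # GB10 -> False, hence the warning
print(capability_supported((9, 0)))   # e.g. H100 -> True
```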
run command:

```
python -m sglang.launch_server --model-path models/Qwen3-32B --host 0.0.0.0 --port 30000 --tp-size 1
```
error:

```
/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(8.0) - (12.0)
warnings.warn(
[2025-11-13 17:45:45] WARNING common.py:1538: Failed to get GPU memory capacity from nvidia-smi, falling back to torch.cuda.mem_get_info().
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/launch_server.py", line 24, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 4008, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 3616, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 275, in __init__
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 588, in __post_init__
    self._handle_gpu_memory_settings(gpu_mem)
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 834, in _handle_gpu_memory_settings
    model_config = self.get_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/server_args.py", line 3637, in get_model_config
    from sglang.srt.configs.model_config import ModelConfig
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/configs/model_config.py", line 26, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
    from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
    from sglang.srt.layers.quantization.utils import get_scalar_types
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/utils.py", line 13, in <module>
    from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
    from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/__init__.py", line 5, in <module>
    common_ops = _load_architecture_specific_ops()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/load_utils.py", line 188, in _load_architecture_specific_ops
    raise ImportError(error_msg)
ImportError:
[sgl_kernel] CRITICAL: Could not load any common_ops library!
Attempted locations:
  - Architecture-specific pattern: /home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.* - found files: ['/home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.abi3.so']
  - Fallback pattern: /home/srchen/.conda/envs/sglang_env/lib/python3.12/site-packages/sgl_kernel/common_ops.* - found files: []
  - Standard Python import: common_ops - failed
GPU Info:
  - Compute capability: 121
  - Expected variant: SM121 (precise math for compatibility)
Please ensure sgl_kernel is properly installed with:
  pip install --upgrade sgl_kernel
Error details from previous import attempts:
  - ImportError: libnvrtc.so.12: cannot open shared object file: No such file or directory
  - ModuleNotFoundError: No module named 'common_ops'
```
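Note that the `sm100/common_ops.abi3.so` file is found; the load itself fails with `libnvrtc.so.12: cannot open shared object file`. That suggests the prebuilt sm100 kernel binary was linked against the CUDA 12 NVRTC, while this environment (PyTorch 2.9.1+cu130, NVCC 13.0) only provides the CUDA 13 runtime. A small illustrative helper (not sglang code) for classifying which NVRTC majors a set of shared objects provides:

```python
import re

def nvrtc_majors(filenames):
    """Pick out the CUDA major version from libnvrtc soname strings,
    e.g. 'libnvrtc.so.12' -> 12. Illustrative helper, not sglang code."""
    return {int(m.group(1))
            for name in filenames
            if (m := re.search(r"libnvrtc\.so\.(\d+)", name))}

# A cu130 PyTorch stack bundles the CUDA 13 NVRTC, but the failing
# sm100 binary asks the linker for libnvrtc.so.12:
print(nvrtc_majors(["libnvrtc.so.13", "libnvrtc-builtins.so.13.0"]))  # {13}
```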
Related resources
I found that https://pytorch.org/get-started/locally offers builds that support CUDA 13.0.
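To confirm which NVRTC sonames the dynamic linker can actually resolve on the affected machine, a quick ctypes probe works (the library names below are taken from the error message; results will differ per system):

```python
import ctypes

def can_load(soname):
    """True if the dynamic linker can resolve and open the shared object."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

# On the failing machine, libnvrtc.so.12 presumably comes back False,
# while the CUDA 13 stack's libnvrtc.so.13 may resolve:
for lib in ("libnvrtc.so.12", "libnvrtc.so.13"):
    print(lib, can_load(lib))
```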