Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
I enabled TBO (two-batch overlap), but the profiler timeline shows no overlap at all (2× H100). Maybe more GPUs are needed? Or is my launch command incorrect?
Reproduction
PYTHONUNBUFFERED=1 SGLANG_TORCH_PROFILER_DIR=/root python -m sglang.launch_server \
--model-path /root/model_config/DeepSeek-V3 --load-format dummy \
--tp 4 --dp 4 --enable-dp-attention --disable-cuda-graph \
--moe-a2a-backend deepep --chunked-prefill-size 65536 \
--host 0.0.0.0 --port 32123 --decode-log-interval 1 \
--enable-two-batch-overlap
python3 -m sglang.bench_serving --backend sglang --host 127.0.0.1 --port 32123 \
--dataset-name random --num-prompt 10 --random-input 10 \
--random-output 1 --random-range-ratio 1 --max-concurrency 1024 --profile
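As a sanity check on the traces the `--profile` run produces, one can parse the Chrome-trace JSON that `torch.profiler` exports and look for kernel events whose time windows intersect on different lanes. This is a hypothetical helper, not part of sglang; it assumes the standard Chrome trace format (a top-level `traceEvents` list of complete events with `ph == "X"` and microsecond `ts`/`dur` fields):

```python
import gzip
import json
from itertools import combinations

def overlapping_kernel_pairs(trace_path):
    """Return name pairs of complete events ("ph" == "X") on *different*
    tid lanes whose [ts, ts + dur) windows intersect.

    Assumes the Chrome-trace JSON layout torch.profiler exports: a
    top-level "traceEvents" list with microsecond "ts"/"dur" fields.
    O(n^2) pairwise scan -- fine for a sanity check, not for huge traces.
    """
    opener = gzip.open if str(trace_path).endswith(".gz") else open
    with opener(trace_path, "rt") as f:
        events = json.load(f).get("traceEvents", [])
    complete = [e for e in events if e.get("ph") == "X" and "dur" in e]
    pairs = []
    for a, b in combinations(complete, 2):
        if a.get("tid") == b.get("tid"):
            continue  # same lane: cannot indicate two-batch overlap
        # half-open interval intersection test
        if a["ts"] < b["ts"] + b["dur"] and b["ts"] < a["ts"] + a["dur"]:
            pairs.append((a["name"], b["name"]))
    return pairs
```

If TBO were taking effect, this should report compute kernels overlapping with communication (e.g. DeepEP dispatch/combine) events; an empty result on the attached traces would match what the timeline shows.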
Environment
(sglang_env) root@zyhuang2-0:~/zyhuang/temp_can/sglang/test/srt# python3 -m sglang.check_env
Python: 3.10.19 | packaged by conda-forge | (main, Oct 22 2025, 22:29:10) [GCC 14.3.0]
CUDA available: True
GPU 0,1: NVIDIA H100 80GB HBM3
GPU 0,1 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda-12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.54.15
PyTorch: 2.8.0+cu128
sglang: 0.5.4.post3
sgl_kernel: 0.3.16.post4
flashinfer_python: 0.5.0
flashinfer_cubin: 0.5.0
flashinfer_jit_cache: Module Not Found
triton: 3.4.0
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.2.6
aiohttp: 3.13.2
fastapi: 0.121.0
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.31.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.4
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.25
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.72.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology:
GPU0 GPU1 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1 N/A
GPU1 NV18 X SYS SYS SYS SYS SYS SYS PIX 48-95,144-191 1 N/A
NIC0 SYS SYS X SYS SYS SYS SYS SYS SYS
NIC1 SYS SYS SYS X SYS SYS SYS SYS SYS
NIC2 SYS SYS SYS SYS X PXB PXB SYS SYS
NIC3 SYS SYS SYS SYS PXB X PIX SYS SYS
NIC4 SYS SYS SYS SYS PXB PIX X SYS SYS
NIC5 SYS SYS SYS SYS SYS SYS SYS X SYS
NIC6 SYS PIX SYS SYS SYS SYS SYS SYS X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
ulimit soft: 1048576
Attached profiler traces:
1763222210.2774773-TP-1.trace.json.gz
1763222210.2774773-TP-2.trace.json.gz
1763222210.2774773-TP-3.trace.json.gz
1763222210.2774773-TP-0.trace.json.gz
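For inspection, the gzipped traces above can be decompressed and loaded into chrome://tracing or ui.perfetto.dev. A minimal sketch using the TP-0 filename from the attachments; the `printf` line fabricates a stand-in trace only so the snippet runs without the real attachment, and should be skipped when the actual file is present:

```shell
trace='1763222210.2774773-TP-0.trace.json.gz'
# stand-in trace so the sketch is self-contained; omit with the real file
printf '{"traceEvents": []}' | gzip > "$trace"
gunzip -kf "$trace"   # keeps the .gz, writes the .json next to it
```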