Replies: 1 comment
We have seen similar issues with high CPU utilization when running containerized ML workloads. The problem might be related to the Docker configuration and environment variables.
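One way to rule out a pass-through problem is to check which of those variables the serving process actually sees, e.g. by running a snippet like this inside the container (`docker exec -it <container> python3 ...`). The variable names below match the `--env` flags in the original script; `<unset>` is just a display default, not a real value:

```python
import os

# Threading/NCCL knobs that commonly drive CPU spin in containerized
# ML workloads; names match the --env flags in the docker run script.
knobs = ["OMP_NUM_THREADS", "TORCH_NCCL_BLOCKING_WAIT", "NCCL_ASYNC_ERROR_HANDLING"]
settings = {name: os.environ.get(name, "<unset>") for name in knobs}
for name, value in settings.items():
    print(f"{name}={value}")
```

If `OMP_NUM_THREADS` comes back `<unset>` inside the container, OpenMP will typically default to one thread per core, which can look exactly like pegged CPUs.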
I am a beginner trying to run LLMs locally using `sglang`, but unfortunately I haven't been successful despite trying for the past two weeks.

First, I tried running it with Docker. I eventually got it to start, but the model is unusable, and 2 of my CPU cores are pegged at 100%.

I don't have the CUDA Toolkit installed locally, so I assume I cannot use `uv` (is my understanding correct?).

I also tried using `conda`, but the process was incredibly difficult and resulted in endless error messages.

For context, I have successfully run several models using the `vllm` Docker image. While setting that up wasn't perfectly smooth either, it runs without any issues now.

I would really love to use `sglang` and am hoping someone here can help me out.

My system specs:

My questions:

Should I use Docker, `uv`, or `conda`? What is everyone usually using?

My script:

```shell
docker run --rm --gpus all \
  --shm-size 32g \
  -p 8000:8000 \
  --env "TORCH_NCCL_BLOCKING_WAIT=1" \
  --env "NCCL_ASYNC_ERROR_HANDLING=1" \
  --env "OMP_NUM_THREADS=4" \
  -v ~/models:/models \
  --ipc=host \
  docker.io/lmsysorg/sglang:nightly-dev-20260416-a4cf2ea1 \
  python -m sglang.launch_server \
    --model-path /models/qwen3.5-27b-fp8 \
    --host 0.0.0.0 \
    --port 8000 \
    --tp-size 2 \
    --mem-fraction-static 0.8 \
    --context-length 32768 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder
```
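Once the container is up, a quick way to verify the server is actually serving (rather than just spinning CPU) is to hit the OpenAI-compatible endpoint that `sglang.launch_server` exposes. A minimal sketch using only the standard library; the model name here simply reuses the `--model-path` from the script above and is an assumption — check what `GET /v1/models` reports on your server:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for a local sglang server."""
    payload = {
        "model": model,  # assumption: model id matches the --model-path
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8000", "/models/qwen3.5-27b-fp8", "Say hi")
# With the container running, actually send it with:
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       print(json.load(resp))
print(req.full_url)
```

If this request hangs or times out while the CPU cores stay pegged, the server is likely stuck in startup or busy-waiting rather than serving, which narrows the problem down from "model is unusable" to something diagnosable in the container logs.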