Replies: 1 comment
We have seen similar issues with high CPU utilization when running containerized ML workloads. The problem might be related to the Docker configuration and environment variables.
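One way to rule out a pass-through problem is to check which of those variables the serving process actually sees, e.g. by running a snippet like this inside the container (`docker exec -it <container> python3 ...`). The variable names below match the `--env` flags in the original script; `<unset>` is just a display default, not a real value:

```python
import os

# Threading/NCCL knobs that commonly drive CPU spin in containerized
# ML workloads; names match the --env flags in the docker run script.
knobs = ["OMP_NUM_THREADS", "TORCH_NCCL_BLOCKING_WAIT", "NCCL_ASYNC_ERROR_HANDLING"]
settings = {name: os.environ.get(name, "<unset>") for name in knobs}
for name, value in settings.items():
    print(f"{name}={value}")
```

If `OMP_NUM_THREADS` comes back `<unset>` inside the container, OpenMP will typically default to one thread per core, which can look exactly like pegged CPUs.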
I am a beginner trying to run LLMs locally using `sglang`, but unfortunately I haven't been successful despite trying for the past two weeks.

First, I tried running it with Docker. I eventually got it to start, but the model is unusable, and 2 of my CPU cores are pegged at 100%.

I don't have the CUDA Toolkit installed locally, so I assume I cannot use `uv` (is my understanding correct?).

I also tried using `conda`, but the process was incredibly difficult and resulted in endless error messages.

For context, I have successfully run several models using the `vllm` Docker image. While setting that up wasn't perfectly smooth either, it runs without any issues now.

I would really love to use `sglang` and am hoping someone here can help me out.

My system specs:

My questions:

Should I use Docker, `uv`, or `conda`? What is everyone usually using?

My script:

```shell
docker run --rm --gpus all \
  --shm-size 32g \
  -p 8000:8000 \
  --env "TORCH_NCCL_BLOCKING_WAIT=1" \
  --env "NCCL_ASYNC_ERROR_HANDLING=1" \
  --env "OMP_NUM_THREADS=4" \
  -v ~/models:/models \
  --ipc=host \
  docker.io/lmsysorg/sglang:nightly-dev-20260416-a4cf2ea1 \
  python -m sglang.launch_server \
    --model-path /models/qwen3.5-27b-fp8 \
    --host 0.0.0.0 \
    --port 8000 \
    --tp-size 2 \
    --mem-fraction-static 0.8 \
    --context-length 32768 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder
```
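Once the container is up, a quick way to verify the server is actually serving (rather than just spinning CPU) is to hit the OpenAI-compatible endpoint that `sglang.launch_server` exposes. A minimal sketch using only the standard library; the model name here simply reuses the `--model-path` from the script above and is an assumption — check what `GET /v1/models` reports on your server:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for a local sglang server."""
    payload = {
        "model": model,  # assumption: model id matches the --model-path
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8000", "/models/qwen3.5-27b-fp8", "Say hi")
# With the container running, actually send it with:
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       print(json.load(resp))
print(req.full_url)
```

If this request hangs or times out while the CPU cores stay pegged, the server is likely stuck in startup or busy-waiting rather than serving, which narrows the problem down from "model is unusable" to something diagnosable in the container logs.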