[Bug] illegal memory access with flashmla and tp 16 #5516

@ICENacl

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

An illegal memory access error occurs when using FlashMLA with tensor parallelism (TP) set to 16. By enabling CUDA_LAUNCH_BLOCKING=1, I was able to pinpoint that the error originates from the FlashMLA kernel.
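For reference, the debugging run simply prefixes the launch command with the environment variable so kernel errors are reported synchronously at the failing call site (a sketch; the flags are the same as in the Reproduction section below):

CUDA_LAUNCH_BLOCKING=1 python3 -m sglang.launch_server <same flags as in Reproduction below>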

Additionally, I observed that when using FlashMLA with TP8, setting --cuda-graph-max-bs to a small value (e.g., 9) also triggers the illegal memory access error.
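A single-node sketch of that TP8 variant (assuming one 8-GPU node; the small --cuda-graph-max-bs value is what triggers the error):

python3 -m sglang.launch_server --model-path <path/to/deepseek-r1> --tp 8 --trust-remote-code --enable-flashmla --page-size 64 --cuda-graph-max-bs 9 --port 8888 --mem-fraction-static 0.9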

Reproduction

Running the following commands results in an illegal memory access:

node 0:
python3 -m sglang.launch_server --model-path <path/to/deepseek-r1> --tp 16 --dist-init-addr :6006 --nnodes 2 --node-rank 0 --trust-remote-code --enable-flashmla --page-size 64 --port 8888 --mem-fraction-static 0.9

node 1:
python3 -m sglang.launch_server --model-path <path/to/deepseek-r1> --tp 16 --dist-init-addr :6006 --nnodes 2 --node-rank 1 --trust-remote-code --enable-flashmla --page-size 64 --port 8888 --mem-fraction-static 0.9
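
The error surfaces once a request reaches the model; for illustration, a minimal request against sglang's native /generate endpoint (assuming the server above is reachable on localhost:8888):

curl http://localhost:8888/generate -H "Content-Type: application/json" -d '{"text": "Hello, how are you?", "sampling_params": {"max_new_tokens": 32, "temperature": 0}}'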

Environment

2 nodes, each with 8 x NVIDIA H20

Python: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.6, V12.6.77
CUDA Driver Version: 535.183.06
PyTorch: 2.5.1+cu124
sglang: 0.4.4.post3
sgl_kernel: 0.0.5.post4
flashinfer: Module Not Found
triton: 3.1.0
transformers: 4.51.0
torchao: 0.10.0
numpy: 1.24.4
aiohttp: 3.10.5
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.30.2
interegular: 0.3.3
modelscope: 1.25.0
orjson: 3.10.16
outlines: 0.1.11
packaging: 23.2
psutil: 6.0.0
pydantic: 2.9.2
multipart: Module Not Found
zmq: Module Not Found
uvicorn: 0.34.1
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.17
openai: 1.74.0
tiktoken: 0.9.0
anthropic: 0.49.0
litellm: 1.66.1
decord: 0.6.0
