[bug fix] fix ima with get_mla_kv_buffer_kernel overflow#14224
[bug fix] fix ima with get_mla_kv_buffer_kernel overflow#14224Fridge003 merged 1 commit intosgl-project:mainfrom
Conversation
Reported-by: ybyang <ybyang7@iflytek.com> Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Summary of ChangesHello @XucSh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly fixes a critical illegal memory access error in the get_mla_kv_buffer_kernel Triton kernel, which was caused by an integer overflow during pointer arithmetic. Casting the loc variable to tl.int64 is the correct solution. I appreciate that you also proactively applied the same fix to set_mla_kv_buffer_kernel.
While reviewing, I noticed that set_mla_kv_scale_buffer_kernel in the same file (python/sglang/srt/mem_cache/utils.py at line 112) follows a similar pattern for calculating dst_ptr. It might be vulnerable to the same overflow issue. I recommend applying the same .to(tl.int64) cast to the loc variable in that kernel for consistency and to prevent potential future errors.
|
/tag-and-rerun-ci |
|
Lgtm |
ShangmingCai
left a comment
There was a problem hiding this comment.
@xiezhq-hermann @BBuf @Fridge003 do you have time to check this PR? A bug we find when refactoring PP.
…#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
…#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
…#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
…#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
…#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
…#14224) Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Motivation
when test with below command on H20:
server:
SGLANG_PP_LAYER_PARTITION="3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,3" CUDA_LAUNCH_BLOCKING=1 python3 -m sglang.launch_server --model-path /work/models/DeepSeek3.1/ --nnodes 2 --port 30000 --dist-init-addr 26.5.27.241:62001 --node-rank 0 --tp 1 --pp-size 16 --trust-remote-code --disable-radix-cache --mem-fraction-static 0.8 --max-running-requests 512 --chunked-prefill-size 6144 --attention-backend fa3 --watchdog-timeout 3600 --host 0.0.0.0SGLANG_PP_LAYER_PARTITION="3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,3" CUDA_LAUNCH_BLOCKING=1 python3 -m sglang.launch_server --model-path /work/models/DeepSeek3.1/ --nnodes 2 --port 30000 --dist-init-addr 26.5.27.241:62001 --node-rank 1 --tp 1 --pp-size 16 --trust-remote-code --disable-radix-cache --mem-fraction-static 0.8 --max-running-requests 512 --chunked-prefill-size 6144 --attention-backend fa3 --watchdog-timeout 3600client:
python3 -m sglang.bench_serving --host 127.0.0.1 --port 30000 --dataset-path /root/.cache/modelscope/hub/datasets/gliang1001/ShareGPT_V3_unfiltered_cleaned_split/ShareGPT_V3_unfiltered_cleaned_split.json --num-prompt 512 --random-input 13107 --random-output 1 --request-rate 10 --max-concurrency 512 --warmup-requests 0 --backend sglang --dataset-name random --random-range-ratio 1An ima will show up with below stack:
Reported by @whybeyoung
Modifications
Fix it by extend loc to int64.
Checklist