
fix radix cache match #7

Merged
merrymercy merged 1 commit into main from ls-fix on Jan 15, 2024

Conversation

@hnyls2002
Collaborator

No description provided.

@merrymercy merrymercy merged commit 01ca82d into main Jan 15, 2024
@merrymercy merrymercy deleted the ls-fix branch January 15, 2024 17:42
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 11, 2025
…TP size (sgl-project#7)

* support the case where num_attention_heads can't be divided evenly by tp_size

* refactor

* move CPU-specific logic to cpu_utils.py

* only set padded weights to zero
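
A minimal sketch of the padding idea summarized in this commit message: when num_attention_heads does not divide evenly by tp_size, grow the weight to the next multiple of tp_size heads and leave only the padded rows zero. The function and argument names below are illustrative assumptions, not the actual sglang CPU utilities.

```python
import torch

def pad_heads_for_tp(weight: torch.Tensor, num_attention_heads: int,
                     head_dim: int, tp_size: int) -> torch.Tensor:
    """Pad a [num_heads * head_dim, hidden] weight so the head count
    becomes divisible by tp_size; only the padded rows stay zero."""
    remainder = num_attention_heads % tp_size
    if remainder == 0:
        return weight
    pad_heads = tp_size - remainder
    padded = torch.zeros(
        (num_attention_heads + pad_heads) * head_dim,
        weight.shape[1],
        dtype=weight.dtype,
    )
    padded[: num_attention_heads * head_dim] = weight  # copy original weights through
    return padded
```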
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
yanbing-j pushed a commit to yanbing-j/sglang that referenced this pull request May 12, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 28, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 29, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Jun 3, 2025
zhuyijie88 pushed a commit to zhuyijie88/sglang that referenced this pull request Jul 17, 2025
pkking pushed a commit to pkking/sglang1 that referenced this pull request Jul 23, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Jul 30, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 7, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 11, 2025
Xia-Weiwen pushed a commit to Xia-Weiwen/sglang that referenced this pull request Sep 5, 2025
* improve bfloat16 gemm performance for prefilling

before:
```
gemm_bf16(native): 4.772 ms, gemm_fp8(opt): 0.000 ms, gemm_int8(opt): 0.000 ms, gemm_bf16(opt): 15.328 ms
```

after:
```
gemm_bf16(native): 4.847 ms, gemm_fp8(opt): 0.000 ms, gemm_int8(opt): 0.000 ms, gemm_bf16(opt): 3.927 ms
```

* apply brgemm

* improve int8 gemm performance for prefilling

* apply brgemm to moe: part1

* apply brgemm to moe: part2

---------

Co-authored-by: mingfeima <mingfei.ma@intel.com>
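
As a rough, standalone illustration of how a bfloat16 prefill GEMM can be timed, here is a small sketch; the matrix shapes, iteration count, and measurement harness are assumptions, not the script that produced the figures quoted above.

```python
import time
import torch

def bench_gemm_bf16(m: int = 4096, n: int = 4096, k: int = 4096, iters: int = 20) -> float:
    """Crude CPU timing of a bfloat16 GEMM, returned as ms per call."""
    a = torch.randn(m, k, dtype=torch.bfloat16)
    b = torch.randn(k, n, dtype=torch.bfloat16)
    torch.matmul(a, b)  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    return (time.perf_counter() - start) / iters * 1e3

if __name__ == "__main__":
    print(f"gemm_bf16: {bench_gemm_bf16():.3f} ms")
```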
Xia-Weiwen pushed a commit to Xia-Weiwen/sglang that referenced this pull request Sep 9, 2025
* Revert "port prefill optimization (sgl-project#7)"

This reverts commit ea0d028.

* improve bfloat16 gemm performance for prefilling

before:
```
gemm_bf16(native): 4.772 ms, gemm_fp8(opt): 0.000 ms, gemm_int8(opt): 0.000 ms, gemm_bf16(opt): 15.328 ms
```

after:
```
gemm_bf16(native): 4.847 ms, gemm_fp8(opt): 0.000 ms, gemm_int8(opt): 0.000 ms, gemm_bf16(opt): 3.927 ms
```

* improve fp8 gemm performance with large M

* enable amx-int8 for gemm, fused moe, shared moe and qkv_proj kernels on PyTorch 2.7

* improve int8 gemm performance with large M

* improve bf16 and int8 moe performance with large nbatches

* update naming for nb0 and nb1 in fused gemm and silu_mul kernel

* improve fp8 moe performance with large nbatches

* remove hardcode numbers

---------

Co-authored-by: mingfeima <mingfei.ma@intel.com>
someoneexistsontheinternet pushed a commit to someoneexistsontheinternet/sglang that referenced this pull request Oct 23, 2025
kalyank007 pushed a commit to kalyank007/sglang that referenced this pull request Nov 7, 2025
… setting profiler env. (sgl-project#7)

Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
Co-authored-by: Polisetty V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
nithinsubbiah pushed a commit to nithinsubbiah/sglang that referenced this pull request Nov 21, 2025
Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Add wave extend attention kernel

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

[Wave] Adding logit_cap and layer scaling to API

Also add support for the wave backend to the model runner, and use Triton decode kernels for now.

[Wave] Run chunked prefill for perf comparison on Wave test

Need to rename the non-chunked/regular prefill version because otherwise rpd will treat it as the same kernel.

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Cache the function that loads the wave kernel

Also maintain a global kernel hash to avoid
recomputing the hash on every call.

[Wave] Don't specify block size and enable buffer ops

[Wave] Enable wave runtime and update scheduling API

[Wave] Update API to use wave_compile & WaveCompileOptions

[Wave] Update wave backend and extend attention to latest

[Wave] Add speculative decode kernel

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>

cache kernels using lru_cache
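
The caching commits above describe memoizing both the kernel hash and the compiled kernel so neither is recomputed on every call. A minimal sketch of that pattern follows, with purely hypothetical helper names (this is not the sglang Wave backend API).

```python
import hashlib
from functools import lru_cache

# Hashes are computed once per kernel source and reused afterwards.
_HASH_CACHE: dict[str, str] = {}

def _kernel_source(shape: tuple) -> str:
    # Stand-in for generating Wave kernel source for a given shape (hypothetical).
    return f"wave_extend_attention{shape}"

@lru_cache(maxsize=None)
def _compile(kernel_hash: str, source: str) -> str:
    # The expensive compile step runs at most once per (hash, source) pair.
    return f"compiled[{kernel_hash[:8]}] {source}"

def get_kernel(shape: tuple) -> str:
    source = _kernel_source(shape)
    kernel_hash = _HASH_CACHE.setdefault(
        source, hashlib.sha256(source.encode()).hexdigest())
    return _compile(kernel_hash, source)
```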

Update WaveBackend to use Wave Decode (sgl-project#6)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Revert "Update WaveBackend to use Wave Decode  (sgl-project#6)" (sgl-project#7)

This reverts commit eac4599.

Wave Backend decode (sgl-project#8)

* align shapes

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

* fix

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Wave backend fixes (sgl-project#10)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

More fixes to Wave decode (sgl-project#12)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

is_causal

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Enable the grok in3 model (sgl-project#14)

Set unique cache dir for each worker (sgl-project#16)

update kernel (sgl-project#18)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

updated spec decode test as per wave

Signed-off-by: xintin <gaurav.verma@amd.com>

fix extend (sgl-project#23)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Refactor paged decode intermediate arrays shapes (sgl-project#24)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

remove dyn symbols (sgl-project#26)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

cleanup shapes (sgl-project#27)

Some fields were removed from `paged_decode_attention_shape`.

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Remove `mha` param from Wave decode attention kernel (sgl-project#28)

Depends on iree-org/iree-turbine#1039

Signed-off-by: Paul Zhang <paul.zhang@amd.com>

nfc: fix problems reported by linting

update references from iree.turbine to wave_lang
apinge pushed a commit to apinge/sglang that referenced this pull request Nov 26, 2025
[FIX] fix fused shared expert in EP
yhyang201 pushed a commit that referenced this pull request Dec 13, 2025
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request Jan 1, 2026
# This is the 1st commit message:

rebase

# This is the commit message #2:

remove duplicated code

# This is the commit message #3:

add type hints

# This is the commit message #4:

add clear cache for benchmark alignment

# This is the commit message #5:

remove unused arg

# This is the commit message #6:

clear cache once

# This is the commit message #7:

simplified VAE cache logic for qwenimage and wan

# This is the commit message #8:

remove duplicated code
tpoisonooo pushed a commit to tpoisonooo/sglang that referenced this pull request Feb 12, 2026
@alisonshao alisonshao mentioned this pull request Mar 1, 2026