Optimization Items
- MoE TopK kernel fusion @BBuf
- Opt kimi_k2_thinking biased topk module #13150 @BBuf
- [opt kimi k2 1 / n] Add kimi k2 moe fused gate #13287 @BBuf
- [opt kimi k2 2/n] apply kimi k2 thinking moe_fused_gate #13332 @BBuf
- [opt kimi k2 3/n] opt kimi_k2 moe_fused_gate kernel #13374 @BBuf
- [opt kimi k2 4 / n] Delete useless pad kernel in sgl_moe_align_block_size #13587 @BBuf
- [kimi k2 thinking] Avoid useless torch.zeros_ #13596 @BBuf
- Fix MoE tuning bug @BBuf: [Kernel] Simplify fused_marlin_moe by removing config tuning logic #13723; Apply new moe wna16 marlin gemm #14125
- Fix IMA in large batch and long seq_length (sync vLLM Marlin MoE) @BBuf: [Hot fix] Fix Kimi k2 thinking ima #13717; [Opt Kimi k2 thinking] Fix shared memory allocation in Marlin MoE kernel for large block sizes #13902; Add new moe wna16 marlin gemm #14122; Apply new moe wna16 marlin gemm #14125
- Opt moe align block size @BBuf: Opt moe align block size kernel #14133; Apply new moe align block size kernel #14134
- Optimize reduce-sum kernel after MoE: Apply moe_reduce_sum kernel for fused_marlin_moe #12888; Apply back moe_sum_reduce for fused_marlin_moe #14829
- EP support for Marlin MoE @BBuf: Add Expert Parallelism (EP) support for kimi-k2-thinking #13725
- DeepEP MoE support (all-to-all) @BBuf: [DeepEP Support] Support kimi-k2-thinking deepep #13789; [DeepEP] Add SGLANG_DEEPEP_BF16_DISPATCH env var in Normal mode #13787
- Support piecewise CUDA graph @b8zhong: [Piecewise CUDA Graph] Support Kimi-K2 (non-Thinking) #13466
- FlashInfer TRT-LLM Marlin kernel for SM100 @b8zhong: feat: MxInt4 x Bf16 TRT-LLM Gen MoE support flashinfer-ai/flashinfer#2159 (WIP: depends on FlashInfer bump)
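For context on the MoE TopK kernel fusion items above, here is a minimal unfused reference of a biased top-k gate. This is an illustrative sketch only: it assumes sigmoid scoring with a selection-only expert bias (DeepSeek-V3-style routing) and omits details such as grouped expert selection and routed scaling. It is not the SGLang moe_fused_gate kernel, which collapses the scoring, bias add, top-k, and renormalization steps below into a single launch.

```python
import torch

def biased_topk_gate_reference(hidden, gate_weight, expert_bias, top_k):
    """Unfused reference of a biased top-k router (illustrative names/shapes).

    hidden:      [num_tokens, hidden_dim]
    gate_weight: [num_experts, hidden_dim]
    expert_bias: [num_experts] load-balancing bias, used for selection only
    """
    # Per-expert routing scores (sigmoid scoring is an assumption here).
    scores = torch.sigmoid(hidden.float() @ gate_weight.float().t())  # [tokens, experts]

    # Pick experts with the bias added, but keep the unbiased scores as combine weights.
    _, topk_idx = torch.topk(scores + expert_bias, top_k, dim=-1)
    topk_weights = torch.gather(scores, dim=1, index=topk_idx)

    # Renormalize the selected weights to sum to 1 per token.
    topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_idx
```

A fused gate kernel avoids materializing the intermediate [tokens, experts] tensors and the extra kernel launches between these steps, which is where the wins in the PRs above come from.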
Related resources
https://huggingface.co/moonshotai/Kimi-K2-Thinking
Profiling command example:
export SGLANG_TORCH_PROFILER_DIR=/sgl-workspace/sglang/profile/
python -m sglang.launch_server --model-path moonshotai/Kimi-K2-Thinking --tp 8 --trust-remote-code --tool-call-parser kimi_k2 --reasoning-parser kimi_k2
# bs1
python3 -m sglang.bench_serving --model moonshotai/Kimi-K2-Thinking --dataset-name random --backend sglang-oai --random-range-ratio 1 --random-input-len 1200 --random-output-len 20 --max-concurrency 1 --num-prompts 5 --profile
# bs32
python3 -m sglang.bench_serving --model moonshotai/Kimi-K2-Thinking --dataset-name random --backend sglang-oai --random-range-ratio 1 --random-input-len 1200 --random-output-len 20 --max-concurrency 32 --num-prompts 32 --profile
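With SGLANG_TORCH_PROFILER_DIR set as above and --profile passed to bench_serving, the server records torch profiler traces for the benchmarked requests into that directory; the resulting chrome-trace files can typically be inspected in Perfetto or chrome://tracing.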