Optimize topk sigmoid in minimax_m2 #14047

rogeryoungh · 2025-11-27T07:38:05Z

Motivation

This PR optimizes the topk sigmoid in minimax_m2, using the topk_sigmoid kernel implementation from #13049.

Modifications

Make TopK support the scoring_func parameter.
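For readers unfamiliar with the routing step, here is a minimal pure-Python sketch of what a scoring_func switch on top-k expert selection could look like. It is a reference illustration only, not the fused topk_sigmoid kernel from #13049 and not the actual TopK class in SGLang: the function name `select_topk`, the `renormalize` flag, and the tie-breaking rule are assumptions made for this example, and any model-specific terms (for example a correction bias) are left out.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(xs: Sequence[float]) -> List[float]:
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_topk(
    router_logits: Sequence[float],
    top_k: int,
    scoring_func: str = "softmax",  # hypothetical default, for illustration
    renormalize: bool = True,       # hypothetical flag, for illustration
) -> Tuple[List[int], List[float]]:
    """Return (expert_ids, expert_weights) for one token."""
    if scoring_func == "softmax":
        scores = softmax(router_logits)
    elif scoring_func == "sigmoid":
        # Each expert is scored independently; scores do not sum to 1.
        scores = [sigmoid(x) for x in router_logits]
    else:
        raise ValueError(f"unsupported scoring_func: {scoring_func!r}")

    # Pick the k highest-scoring experts; ties go to the lower expert index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:top_k]
    weights = [scores[i] for i in order]

    if renormalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return order, weights


if __name__ == "__main__":
    ids, weights = select_topk([0.0, 2.0, -1.0, 1.0], top_k=2, scoring_func="sigmoid")
    print(ids, weights)
```

A fused kernel would perform the scoring and the selection in a single pass on the GPU; the sketch only shows the logic that such a kernel would need to reproduce.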

Accuracy Tests

We validated the correctness of this change on MiniMax-M2, where it reaches an accuracy of 0.9249 on GSM8K (gsm8k_cot, 5-shot, flexible-extract). For comparison, the earlier PR #13049 reported 0.93 on GSM8K and 0.803 on AIME2025; the original AIME2025 score was 0.78.

lm_eval --model local-completions \
    --model_args base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model \
    --tasks gsm8k_cot  \
    --batch_size 128 \
    --num_fewshot 5
local-completions (base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 128
|  Tasks  |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot|      3|flexible-extract|     5|exact_match|↑  |0.9249|±  |0.0073|
|         |       |strict-match    |     5|exact_match|↑  |0.9121|±  |0.0078|

Benchmarking and Profiling

Original.

+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|    |   max_concurrency |   input_throughput |   output_throughput |   mean_ttft_ms |   median_ttft_ms |   p99_ttft_ms |   mean_tpot_ms |   median_tpot_ms |   p99_tpot_ms |   per_user_throughput |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
|  0 |             1.000 |            120.350 |              77.390 |        175.130 |          171.560 |       387.900 |         12.110 |           12.120 |        12.380 |                77.390 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  1 |            16.000 |            717.960 |             431.630 |        245.770 |          201.520 |       836.530 |         34.810 |           34.560 |        66.070 |                27.020 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  2 |            64.000 |           1156.730 |             764.030 |        344.830 |          267.910 |      1097.370 |         79.960 |           79.750 |       153.430 |                11.938 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+

Optimized.

+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|    |   max_concurrency |   input_throughput |   output_throughput |   mean_ttft_ms |   median_ttft_ms |   p99_ttft_ms |   mean_tpot_ms |   median_tpot_ms |   p99_tpot_ms |   per_user_throughput |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
|  0 |             1.000 |            134.770 |              86.660 |        153.430 |          149.650 |       264.620 |         10.790 |           10.830 |        11.080 |                86.660 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  1 |            16.000 |            792.260 |             476.300 |        220.980 |          178.230 |       843.790 |         31.350 |           31.350 |        56.030 |                29.769 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  2 |            64.000 |           1263.140 |             834.320 |        313.510 |          266.850 |      1007.350 |         72.280 |           72.390 |       128.450 |                13.036 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
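To make the two tables easier to compare, the short script below computes the relative change for a few columns. The numbers are copied from the Original and Optimized tables above; the helper name `pct_change` and the dictionary layout are just choices made for this sketch.

```python
# Values copied from the benchmark tables, keyed by max_concurrency.
# Each entry: (output_throughput, mean_ttft_ms, mean_tpot_ms)
ORIGINAL = {
    1: (77.390, 175.130, 12.110),
    16: (431.630, 245.770, 34.810),
    64: (764.030, 344.830, 79.960),
}
OPTIMIZED = {
    1: (86.660, 153.430, 10.790),
    16: (476.300, 220.980, 31.350),
    64: (834.320, 313.510, 72.280),
}
METRICS = ("output_throughput", "mean_ttft_ms", "mean_tpot_ms")


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def summarize() -> dict:
    """Map (max_concurrency, metric) to the percent change after the optimization."""
    result = {}
    for concurrency, old_row in ORIGINAL.items():
        new_row = OPTIMIZED[concurrency]
        for name, old, new in zip(METRICS, old_row, new_row):
            result[(concurrency, name)] = pct_change(old, new)
    return result


if __name__ == "__main__":
    for (concurrency, name), change in summarize().items():
        print(f"max_concurrency={concurrency:>2} {name:<18} {change:+.2f}%")
```

Positive values for output_throughput and negative values for the latency columns both indicate an improvement.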

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process


BBuf

LGTM.

BBuf · 2025-12-01T17:00:33Z

/tag-and-rerun-ci

Co-authored-by: xuebi <[email protected]>

update: use topk sigmoid in minimax_m2

d19e8dc

rogeryoungh requested review from BBuf, Edwardf0t1, Fridge003, HaiShaw, Ying1123, ch-wan, ispobock and merrymercy as code owners November 27, 2025 07:38

update: format

f83f936

BBuf approved these changes Dec 1, 2025

github-actions bot added the run-ci label Dec 1, 2025

BBuf merged commit 3dabd60 into sgl-project:main Dec 2, 2025
179 of 191 checks passed

harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025

Optimize topk sigmoid in minimax_m2 (sgl-project#14047)

37c6e69

Co-authored-by: xuebi <[email protected]>

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

Optimize topk sigmoid in minimax_m2 (sgl-project#14047)

50177e0

Co-authored-by: xuebi <[email protected]>

yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025

Optimize topk sigmoid in minimax_m2 (sgl-project#14047)

feb9598

Co-authored-by: xuebi <[email protected]>
