@rogeryoungh
Contributor

Motivation

This PR optimizes the topk sigmoid in minimax_m2, using the topk_sigmoid kernel implementation from #13049.

Modifications

Make TopK support the scoring_func parameter.
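
For readers unfamiliar with the scoring function, here is a minimal pure-Python sketch of what sigmoid top-k routing computes. It is a reference illustration only: it is neither the fused topk_sigmoid kernel from #13049 nor SGLang's TopK class, and the function name `topk_sigmoid_reference` and the `renormalize` flag are hypothetical names chosen for this example.

```python
import math


def topk_sigmoid_reference(logits, k, renormalize=True):
    """Reference (unfused) sigmoid top-k routing for one token.

    logits: router logits, one float per expert.
    Returns (weights, expert_ids) for the k highest-scoring experts.
    Hypothetical helper for illustration; not part of SGLang.
    """
    # Score each expert independently with a sigmoid (no softmax across experts).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Pick the k largest scores; ties go to the lower expert index.
    top_ids = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    top_weights = [scores[i] for i in top_ids]
    if renormalize:
        # Assumption: selected weights are rescaled to sum to 1.
        total = sum(top_weights)
        top_weights = [w / total for w in top_weights]
    return top_weights, top_ids


weights, ids = topk_sigmoid_reference([2.0, -1.0, 0.5, 3.0], k=2)
print(ids, weights)
```

A fused kernel would perform the scoring and selection in a single pass on the GPU; the sketch only shows the math being computed.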

Accuracy Tests

We validated the correctness of this change on MiniMax-M2, which scores 0.9249 on GSM8K (flexible-extract, 5-shot). For comparison, the previous PR #13049 reported 0.93 on GSM8K and 0.803 on AIME2025; the original AIME2025 score was 0.78.

lm_eval --model local-completions \
    --model_args base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model \
    --tasks gsm8k_cot  \
    --batch_size 128 \
    --num_fewshot 5
local-completions (base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 128
|  Tasks  |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot|      3|flexible-extract|     5|exact_match||0.9249|±  |0.0073|
|         |       |strict-match    |     5|exact_match||0.9121|±  |0.0078|

Benchmarking and Profiling

Original.

+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|    |   max_concurrency |   input_throughput |   output_throughput |   mean_ttft_ms |   median_ttft_ms |   p99_ttft_ms |   mean_tpot_ms |   median_tpot_ms |   p99_tpot_ms |   per_user_throughput |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
|  0 |             1.000 |            120.350 |              77.390 |        175.130 |          171.560 |       387.900 |         12.110 |           12.120 |        12.380 |                77.390 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  1 |            16.000 |            717.960 |             431.630 |        245.770 |          201.520 |       836.530 |         34.810 |           34.560 |        66.070 |                27.020 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  2 |            64.000 |           1156.730 |             764.030 |        344.830 |          267.910 |      1097.370 |         79.960 |           79.750 |       153.430 |                11.938 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+

Optimized.

+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|    |   max_concurrency |   input_throughput |   output_throughput |   mean_ttft_ms |   median_ttft_ms |   p99_ttft_ms |   mean_tpot_ms |   median_tpot_ms |   p99_tpot_ms |   per_user_throughput |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
|  0 |             1.000 |            134.770 |              86.660 |        153.430 |          149.650 |       264.620 |         10.790 |           10.830 |        11.080 |                86.660 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  1 |            16.000 |            792.260 |             476.300 |        220.980 |          178.230 |       843.790 |         31.350 |           31.350 |        56.030 |                29.769 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|  2 |            64.000 |           1263.140 |             834.320 |        313.510 |          266.850 |      1007.350 |         72.280 |           72.390 |       128.450 |                13.036 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
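
To make the comparison easier to read, the short script below computes the relative change between the two tables. The numbers are copied from the output_throughput and mean_tpot_ms columns above; the helper name `relative_change` is just for this example.

```python
# (max_concurrency, output_throughput, mean_tpot_ms), copied from the tables above
original = [(1, 77.39, 12.11), (16, 431.63, 34.81), (64, 764.03, 79.96)]
optimized = [(1, 86.66, 10.79), (16, 476.30, 31.35), (64, 834.32, 72.28)]


def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


rows = []
for (conc, tput_a, tpot_a), (_, tput_b, tpot_b) in zip(original, optimized):
    rows.append((conc, relative_change(tput_a, tput_b), relative_change(tpot_a, tpot_b)))

for conc, d_tput, d_tpot in rows:
    print(f"concurrency {conc:>2}: output throughput {d_tput:+.1f}%, mean TPOT {d_tpot:+.1f}%")
```

On these numbers, output throughput rises by roughly 9 to 12 percent and mean TPOT falls by roughly 10 percent at all three concurrency levels.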

Collaborator

@BBuf BBuf left a comment

LGTM.

@BBuf
Collaborator

BBuf commented Dec 1, 2025

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Dec 1, 2025
@BBuf BBuf merged commit 3dabd60 into sgl-project:main Dec 2, 2025
179 of 191 checks passed
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025