Add Logprobs unit test with a loose threshold #10230

Merged
hnyls2002 merged 37 commits into sgl-project:main from PrinsYin:logprobs
Sep 16, 2025

Conversation

@PrinsYin
Contributor

PrinsYin commented Sep 9, 2025

Add Logprobs Unittest

Motivation

We want to merge #6318, so this PR adds a dedicated unittest suite to verify that the changes do not break logprobs computation and interfaces.
The main goal is to catch indexing/corner-case issues and ensure stability/accuracy of logprobs against baseline data.


Modifications

  • Added test/srt/test_logprobs.py, which implements TestLogprobsDense (runs against DEFAULT_SMALL_MODEL_NAME_FOR_TEST) with:
    • Configurable batch sizes, sample counts, and temperatures
    • Random sampling of records per run for coverage
    • Comparison of input_top_logprobs and output_top_logprobs against baselines
    • Validation of both max diff and mean diff against thresholds
    • Per-config results written to $GITHUB_STEP_SUMMARY

  • Added the test to run_suite.py so it runs in CI

  • Added retry logic for downloading Hugging Face baselines

  • Enabled selective return_logprob within batches to check partial-request cases

  • Cleaned up unsafe access and added tolerance-based failure reporting
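The core comparison described above can be sketched as follows. This is a minimal illustration, not the actual test code: the helper name `check_logprobs` and its signature are hypothetical, and the default tolerances mirror the dense CUDA thresholds listed later in this PR.

```python
import numpy as np

# Hypothetical helper mirroring the check described above: compare the
# returned top logprobs of a record against the baseline values and
# validate both the max and the mean absolute difference.
def check_logprobs(returned, baseline, max_diff_tol=1.5, mean_diff_tol=0.1):
    returned = np.asarray(returned, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)
    diff = np.abs(returned - baseline)
    max_diff = float(diff.max())
    mean_diff = float(diff.mean())
    ok = max_diff <= max_diff_tol and mean_diff <= mean_diff_tol
    return ok, max_diff, mean_diff

ok, max_d, mean_d = check_logprobs([-1.00, -2.50], [-1.02, -2.48])
print(ok)  # True
```

The max-diff bound catches a single badly wrong token, while the mean-diff bound catches a systematic drift that stays under the max threshold per token.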


Baseline Data

We are not testing MoE in this PR; the MoE data below is for reference only.

  • Version: v0.5.1.post3
  • Dense baseline: sglang_baseline_2000.pkl
  • MoE baseline: sglang_baseline_moe.pkl
  • Generated ~1000 samples with v0.5.1.post3, uploaded to Hugging Face
  • Each test run samples a few hundred records based on config
  • Prompt length controlled to ~2000 tokens
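The retry logic for the baseline download mentioned in the Modifications can be sketched generically. The wrapper below is illustrative, not the test's actual implementation; in practice the callable would wrap something like a `huggingface_hub` download of the baseline pickle.

```python
import time

# Hypothetical retry wrapper: call a download function, retrying on
# transient failures with a simple linear backoff.
def download_with_retry(download_fn, retries=3, delay=1.0):
    last_exc = None
    for attempt in range(retries):
        try:
            return download_fn()
        except Exception as exc:  # network hiccups, rate limits, etc.
            last_exc = exc
            time.sleep(delay * (attempt + 1))
    raise RuntimeError(f"download failed after {retries} attempts") from last_exc
```

A flaky CI network is the motivating case: two transient failures followed by a success should still produce a green run instead of a spurious test failure.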

Tolerance Settings

Dense (CUDA):

  • Max diff ≤ 1.5
  • Mean diff ≤ 0.1

Dense (ROCm):

  • Max diff ≤ 1.4
  • Mean diff ≤ 0.1

MoE (NVIDIA):

  • Max diff ≤ 10
  • Mean diff ≤ 0.3

MoE (ROCm):

  • Max diff ≤ 9.0
  • Mean diff ≤ 0.5
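The four tolerance sets above can be collected into one lookup table. The keys and helper below are illustrative (the actual test may structure this differently); values are taken verbatim from the thresholds listed above.

```python
# Per-(model, platform) tolerances from this PR's description.
# The dict layout and lookup helper are hypothetical.
TOLERANCES = {
    ("dense", "cuda"): {"max_diff": 1.5, "mean_diff": 0.1},
    ("dense", "rocm"): {"max_diff": 1.4, "mean_diff": 0.1},
    ("moe", "cuda"): {"max_diff": 10.0, "mean_diff": 0.3},
    ("moe", "rocm"): {"max_diff": 9.0, "mean_diff": 0.5},
}

def tolerance_for(model_kind, platform):
    return TOLERANCES[(model_kind, platform)]
```

Note the asymmetry: MoE thresholds are roughly an order of magnitude looser on max diff, reflecting the larger numeric variance of expert routing.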

GitHub Summary Reporting

Each run appends a section like:
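The example summary was not preserved in this capture. A minimal sketch of how a per-config section could be appended to the GitHub Actions step summary is shown below; the function name and field layout are hypothetical, only the `GITHUB_STEP_SUMMARY` mechanism itself is standard GitHub Actions behavior.

```python
import os

# Illustrative sketch: append one markdown section per test config to the
# step summary file that GitHub Actions exposes via GITHUB_STEP_SUMMARY.
def append_step_summary(config_name, max_diff, mean_diff, n_samples):
    lines = [
        f"### Logprobs check: {config_name}",
        f"- max Δ = {max_diff:.6f}",
        f"- mean Δ = {mean_diff:.6f}",
        f"- samples compared: {n_samples}",
        "",
    ]
    text = "\n".join(lines)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:  # only set inside a GitHub Actions job
        with open(summary_path, "a") as f:
            f.write(text + "\n")
    return text
```

Outside CI the environment variable is absent, so the function degrades to returning the text without writing anywhere.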

PrinsYin marked this pull request as ready for review September 10, 2025 03:26
@yushengsu-thu
Collaborator

Add AMD baseline and thresholds

  • Dense:
    Max diff ≤ 1.4
    Mean diff ≤ 0.1 (per sample)

  • MoE:
    Max diff ≤ 9.0
    Mean diff ≤ 0.5 (per sample)

@zhaochenyang20
Collaborator

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive unittest suite for log probability correctness in SGLang, which is a great addition for ensuring model stability. The tests cover both dense and MoE models, with configurable parameters and comparisons against baseline data. My review focuses on improving code quality, robustness, and maintainability. I've identified some critical issues related to unsafe data access that could cause tests to crash, and I've also suggested refactoring to reduce significant code duplication between the test classes. Additionally, there are some medium-severity suggestions to improve exception handling and reporting consistency.


@PrinsYin
Contributor Author

@narutolhy @zhaochenyang20 can you take a moment to review this? thanks!

The review thread below discusses this line from the new test:

```python
    },
]

os.environ["RETURN_ORIGINAL_LOGPROB"] = "True"
```
Contributor

Why do we need to turn on this switch? After turning it on, only the original logprobs are returned, so the different temperatures added in TEST_CONFIGS will not affect the returned logprobs.

Contributor Author

I think we enable this so that the logprobs stay independent of temperature, and the test purely checks consistency across different batch/ratio cases. @zhaochenyang20

Collaborator

Well, I think we do not need 4 groups of tests; one group is enough: num_samples 1000, ratio 0.5.

Contributor

I feel that if this is added, there is no need to compare the effects of different temperatures, because different temperatures do not affect the return value.
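The point in this thread is that temperature rescales the logits before the softmax, so temperature-scaled logprobs differ from the originals; returning the original logprobs removes that dependence. A small self-contained sketch (not SGLang code) of temperature-scaled log-softmax:

```python
import math

# Numerically stable log-softmax with an optional temperature that
# rescales the logits before normalization.
def log_softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

logits = [2.0, 1.0, 0.1]
orig = log_softmax(logits, temperature=1.0)  # "original" logprobs
hot = log_softmax(logits, temperature=2.0)   # flatter distribution
print(orig[0] != hot[0])  # True
```

This is why, with RETURN_ORIGINAL_LOGPROB enabled, varying temperature across TEST_CONFIGS cannot change the returned values.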

zhaochenyang20 changed the title from "[WIP] Add Logprobs unittest" to "Add Logprobs unit test with a loose threshold" Sep 13, 2025
@zhaochenyang20
Copy link
Collaborator

Tested on H200 5 times:

```
max Δ=0.624243
mean Δ=0.0032411
logprobs returned for 500 samples (expected: 500)

max Δ=0.624243
mean Δ=0.00346705
logprobs returned for 500 samples (expected: 500)

max Δ=0.624243
mean Δ=0.00395164
logprobs returned for 500 samples (expected: 500)

max Δ=0.624243
mean Δ=0.00399972
logprobs returned for 500 samples (expected: 500)

max Δ=0.499887
mean Δ=0.00326239
logprobs returned for 500 samples (expected: 500)
```

hnyls2002 merged commit f1c692f into sgl-project:main Sep 16, 2025
35 of 39 checks passed
HanHan009527 pushed a commit to HanHan009527/sglang that referenced this pull request Oct 9, 2025
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Ryan <ryan@ryanmini.mynetworksettings.com>