
Add nightly performance test for GPT-OSS 4GPU models#12805

Merged
Fridge003 merged 4 commits into main from add-gpt-oss-4gpu-perf-test on Nov 8, 2025

Conversation

@alisonshao (Collaborator)

  • Create test_nightly_gpt_oss_4gpu_perf.py with profiling support
  • Test bf16 and mxfp4 quantizations of gpt-oss-120b
  • Add to nightly-4-gpu-b200 suite

@gemini-code-assist (Contributor)

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the testing infrastructure by adding a dedicated nightly performance test for GPT-OSS 4GPU models. This new test focuses on evaluating the performance of gpt-oss-120b using both bf16 and mxfp4 quantizations, ensuring continuous performance monitoring within the nightly-4-gpu-b200 suite. Additionally, it includes a refactoring of existing hicache tests to streamline the test suite.

Highlights

  • New Nightly Performance Test: Introduced test_nightly_gpt_oss_4gpu_perf.py to benchmark GPT-OSS 4GPU models.
  • Model Quantization Testing: The new test specifically evaluates gpt-oss-120b with bf16 and mxfp4 quantizations.
  • Integration into Nightly Suite: The performance test has been added to the nightly-4-gpu-b200 test suite for continuous monitoring.
  • Hicache Test Refactoring: Several hicache related tests were consolidated into a single test_hicache_variants.py for improved organization.

@gemini-code-assist (Contributor) left a comment:
Code Review

This pull request introduces a new nightly performance test for GPT-OSS 4GPU models and refactors some test suites. The new test file is well-structured, but I've identified a few areas for improvement related to test configuration, filename generation, and code style. Additionally, a change in run_suite.py appears to unintentionally leave a test file untracked, which would likely cause a CI failure. My review provides specific suggestions to address these points.

@@ -236,7 +235,6 @@ class TestFile:
TestFile("ep/test_moe_deepep_eval_accuracy_large.py"),
TestFile("function_call/test_unknown_tool_name.py"),
TestFile("hicache/test_disaggregation_hicache.py"),
TestFile("hicache/test_hicache_page.py"),
@gemini-code-assist (severity: high)

The test file test_hicache_page.py has been removed from the __not_in_ci__ suite, but the file itself has not been deleted from the repository. The sanity check in run_suite.py will likely fail because this file is now untracked by the test runner. If this file is still needed, it should be added back to a test suite (e.g., __not_in_ci__) to resolve the sanity check failure. Otherwise, it should be deleted.
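To illustrate the suggested fix, here is a minimal sketch of re-registering the orphaned file; the actual `TestFile` class and suite layout in `run_suite.py` may differ, so treat the names below as assumptions.

```python
from dataclasses import dataclass

# Minimal stand-in for run_suite.py's TestFile; the real class
# may carry extra fields (e.g. an estimated runtime).
@dataclass
class TestFile:
    name: str

# Hypothetical: add the removed file back to the __not_in_ci__ suite
# so the runner's sanity check (every test file must belong to some
# suite) passes again.
not_in_ci_suite = [
    TestFile("hicache/test_hicache_page.py"),
]
```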

),
]
cls.base_url = DEFAULT_URL_FOR_TEST
cls.batch_sizes = [1, 1, 8, 16, 64]
@gemini-code-assist (severity: medium)

The batch_sizes list contains a duplicate value 1. This will cause the benchmark for batch size 1 to run twice, which is redundant and consumes unnecessary CI resources. Please remove the duplicate entry.

Suggested change:
- cls.batch_sizes = [1, 1, 8, 16, 64]
+ cls.batch_sizes = [1, 8, 16, 64]

Comment on lines +57 to +61
profile_filename = (
f"{model_path.replace('/', '_')}_{int(time.time())}"
)
profile_path_prefix = os.path.join(PROFILE_DIR, profile_filename)
json_output_file = f"results_{model_path.replace('/', '_')}_{int(time.time())}.json"
@gemini-code-assist (severity: medium)

The timestamp from int(time.time()) is generated separately for profile_filename and json_output_file. If these calls happen to cross a second boundary, the filenames will have different timestamps, which could be confusing. It's better to generate a single timestamp and reuse it for both filenames to ensure consistency. This also provides an opportunity to reduce code duplication by creating a slug for the model name.

Suggested change:
- profile_filename = (
-     f"{model_path.replace('/', '_')}_{int(time.time())}"
- )
- profile_path_prefix = os.path.join(PROFILE_DIR, profile_filename)
- json_output_file = f"results_{model_path.replace('/', '_')}_{int(time.time())}.json"
+ timestamp = int(time.time())
+ model_name_slug = model_path.replace('/', '_')
+ profile_filename = f"{model_name_slug}_{timestamp}"
+ profile_path_prefix = os.path.join(PROFILE_DIR, profile_filename)
+ json_output_file = f"results_{model_name_slug}_{timestamp}.json"


# Load and deserialize JSON results
if os.path.exists(json_output_file):
import json
@gemini-code-assist (severity: medium)

The import json statement is located inside a method. According to the PEP 8 style guide, imports should be placed at the top of the file. This improves readability and makes it easier to see the file's dependencies at a glance. Please move this import to the top of the file with the other imports.
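A sketch of the suggested refactor, with `json` imported at module level; the function name and signature here are illustrative, not taken from the PR.

```python
import json
import os


def load_results(json_output_file):
    """Load and deserialize JSON benchmark results, if the file exists.

    Returns the parsed object, or None when the results file is missing
    (e.g. the benchmark run failed before writing output).
    """
    if os.path.exists(json_output_file):
        with open(json_output_file) as f:
            return json.load(f)
    return None
```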

@alisonshao force-pushed the add-gpt-oss-4gpu-perf-test branch from a9b18d0 to ff2a7b5 on November 7, 2025 at 03:31.
@Fridge003 mentioned this pull request on Nov 7, 2025.
@Fridge003 merged commit 0b88d52 into main on Nov 8, 2025.
62 of 78 checks passed.
@Fridge003 deleted the add-gpt-oss-4gpu-perf-test branch on November 8, 2025 at 00:54.