
Add nightly performance test for GPT-OSS 4GPU models#12805

Merged
Fridge003 merged 4 commits into main from add-gpt-oss-4gpu-perf-test on Nov 8, 2025

Conversation

@alisonshao (Collaborator)

  • Create test_nightly_gpt_oss_4gpu_perf.py with profiling support
  • Test bf16 and mxfp4 quantizations of gpt-oss-120b
  • Add to nightly-4-gpu-b200 suite

@gemini-code-assist (Contributor)

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the testing infrastructure by adding a dedicated nightly performance test for GPT-OSS 4GPU models. This new test focuses on evaluating the performance of gpt-oss-120b using both bf16 and mxfp4 quantizations, ensuring continuous performance monitoring within the nightly-4-gpu-b200 suite. Additionally, it includes a refactoring of existing hicache tests to streamline the test suite.

Highlights

  • New Nightly Performance Test: Introduced test_nightly_gpt_oss_4gpu_perf.py to benchmark GPT-OSS 4GPU models.
  • Model Quantization Testing: The new test specifically evaluates gpt-oss-120b with bf16 and mxfp4 quantizations.
  • Integration into Nightly Suite: The performance test has been added to the nightly-4-gpu-b200 test suite for continuous monitoring.
  • Hicache Test Refactoring: Several hicache related tests were consolidated into a single test_hicache_variants.py for improved organization.

@gemini-code-assist (Contributor) left a comment:
Code Review

This pull request introduces a new nightly performance test for GPT-OSS 4GPU models and refactors some test suites. The new test file is well-structured, but I've identified a few areas for improvement related to test configuration, filename generation, and code style. Additionally, a change in run_suite.py appears to unintentionally leave a test file untracked, which would likely cause a CI failure. My review provides specific suggestions to address these points.

@@ -236,7 +235,6 @@ class TestFile:
TestFile("ep/test_moe_deepep_eval_accuracy_large.py"),
TestFile("function_call/test_unknown_tool_name.py"),
TestFile("hicache/test_disaggregation_hicache.py"),
TestFile("hicache/test_hicache_page.py"),
@gemini-code-assist (severity: high)

The test file test_hicache_page.py has been removed from the __not_in_ci__ suite, but the file itself has not been deleted from the repository. The sanity check in run_suite.py will likely fail because this file is now untracked by the test runner. If this file is still needed, it should be added back to a test suite (e.g., __not_in_ci__) to resolve the sanity check failure. Otherwise, it should be deleted.
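To illustrate the suggested fix, here is a minimal sketch of re-registering the orphaned file; the actual `TestFile` class and suite layout in `run_suite.py` may differ, so treat the names below as assumptions.

```python
from dataclasses import dataclass

# Minimal stand-in for run_suite.py's TestFile; the real class
# may carry extra fields (e.g. an estimated runtime).
@dataclass
class TestFile:
    name: str

# Hypothetical: add the removed file back to the __not_in_ci__ suite
# so the runner's sanity check (every test file must belong to some
# suite) passes again.
not_in_ci_suite = [
    TestFile("hicache/test_hicache_page.py"),
]
```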

),
]
cls.base_url = DEFAULT_URL_FOR_TEST
cls.batch_sizes = [1, 1, 8, 16, 64]
@gemini-code-assist (severity: medium)

The batch_sizes list contains a duplicate value 1. This will cause the benchmark for batch size 1 to run twice, which is redundant and consumes unnecessary CI resources. Please remove the duplicate entry.

Suggested change:
- cls.batch_sizes = [1, 1, 8, 16, 64]
+ cls.batch_sizes = [1, 8, 16, 64]

Comment on lines +57 to +61
profile_filename = (
f"{model_path.replace('/', '_')}_{int(time.time())}"
)
profile_path_prefix = os.path.join(PROFILE_DIR, profile_filename)
json_output_file = f"results_{model_path.replace('/', '_')}_{int(time.time())}.json"
@gemini-code-assist (severity: medium)

The timestamp from int(time.time()) is generated separately for profile_filename and json_output_file. If these calls happen to cross a second boundary, the filenames will have different timestamps, which could be confusing. It's better to generate a single timestamp and reuse it for both filenames to ensure consistency. This also provides an opportunity to reduce code duplication by creating a slug for the model name.

Suggested change:
- profile_filename = (
-     f"{model_path.replace('/', '_')}_{int(time.time())}"
- )
- profile_path_prefix = os.path.join(PROFILE_DIR, profile_filename)
- json_output_file = f"results_{model_path.replace('/', '_')}_{int(time.time())}.json"
+ timestamp = int(time.time())
+ model_name_slug = model_path.replace('/', '_')
+ profile_filename = f"{model_name_slug}_{timestamp}"
+ profile_path_prefix = os.path.join(PROFILE_DIR, profile_filename)
+ json_output_file = f"results_{model_name_slug}_{timestamp}.json"


# Load and deserialize JSON results
if os.path.exists(json_output_file):
import json
@gemini-code-assist (severity: medium)

The import json statement is located inside a method. According to the PEP 8 style guide, imports should be placed at the top of the file. This improves readability and makes it easier to see the file's dependencies at a glance. Please move this import to the top of the file with the other imports.
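A sketch of the suggested refactor, with `json` imported at module level; the function name and signature here are illustrative, not taken from the PR.

```python
import json
import os


def load_results(json_output_file):
    """Load and deserialize JSON benchmark results, if the file exists.

    Returns the parsed object, or None when the results file is missing
    (e.g. the benchmark run failed before writing output).
    """
    if os.path.exists(json_output_file):
        with open(json_output_file) as f:
            return json.load(f)
    return None
```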

@alisonshao force-pushed the add-gpt-oss-4gpu-perf-test branch from a9b18d0 to ff2a7b5 on November 7, 2025 at 03:31.
@Fridge003 mentioned this pull request on Nov 7, 2025.
@Fridge003 merged commit 0b88d52 into main on Nov 8, 2025.
62 of 78 checks passed.
@Fridge003 deleted the add-gpt-oss-4gpu-perf-test branch on November 8, 2025 at 00:54.