
[sgl-kernel] fix b200 kernel ci #13907

Merged

Fridge003 merged 9 commits into sgl-project:main from FlamingoPg:b200-ci on Nov 30, 2025

Conversation

@FlamingoPg (Collaborator)

Motivation

Fixes the failing B200 kernel CI; partly adapted from #13731.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Co-authored-by: HydraQYH <qyh820@outlook.com>
@gemini-code-assist (Contributor)

Summary of Changes

Hello @FlamingoPg, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves continuous integration (CI) failures on B200 systems by refining the device capability checks within the kernel tests. The changes ensure that specific FP8-related kernel operations and FlashMLA tests are only executed on hardware that supports the required SM90 architecture, preventing failures on incompatible systems and improving the reliability of the CI pipeline.
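
For context (not part of the diff): PyTorch exposes the CUDA compute capability as a (major, minor) tuple, and the gating described below keys on the major version. A minimal illustration, assuming a CUDA-capable build with an attached GPU:

```python
# Illustrative only, not from this PR: the capability tuple the tests check.
import torch

major, minor = torch.cuda.get_device_capability()
# SM90 (e.g. H100) reports (9, 0); SM100 (e.g. B200) reports (10, 0).
print(f"compute capability: {major}.{minor}")
```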

Highlights

  • Corrected Device Support for FP8 MoE Tests: The test_es_fp8_blockwise_moe.py test now correctly specifies that es_fp8_blockwise_scaled_grouped_mm is exclusively supported on SM90 architectures, removing SM100 from the skip condition.
  • Introduced SM90 Support Utility: A new helper function, is_sm90_supported, has been added to test_flashmla.py to programmatically check for SM90 device capability and a CUDA version of 12.3 or higher.
  • Applied SM90 Requirement to FlashMLA Tests: FP8-related FlashMLA tests, specifically test_flashmla_prefill and test_flash_mla_decode, are now conditionally skipped if the running environment does not support SM90, ensuring these tests only execute on compatible hardware (a sketch of the gating pattern follows this list).
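
A minimal sketch of that gating pattern, assuming pytest; the marker name requires_sm90 and the skip message are illustrative, while the capability and CUDA-version thresholds mirror the helper quoted in the review below:

```python
# Hedged sketch, not the exact diff: skip FP8 FlashMLA tests off SM90.
import pytest
import torch

requires_sm90 = pytest.mark.skipif(
    not torch.cuda.is_available()
    or torch.cuda.get_device_capability()[0] != 9  # SM90 = compute capability 9.x
    or torch.version.cuda < "12.3",  # string comparison, as in the PR's helper
    reason="FP8 FlashMLA kernels require SM90 and CUDA >= 12.3",
)

@requires_sm90
def test_flashmla_prefill():
    ...  # body elided; runs only on SM90-capable hardware
```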

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request aims to fix CI failures on B200 (SM100) hardware. The changes involve disabling some tests for FP8 kernels on SM100 architecture by restricting them to SM90, likely as a temporary measure to get the CI pipeline passing. Additionally, a helper function to check for SM90 support has been added to test_flashmla.py, which is a duplicate of an existing function. My feedback focuses on addressing this code duplication to improve maintainability.

Comment on lines +41 to +44
def is_sm90_supported(device=None) -> bool:
    return (torch.cuda.get_device_capability(device)[0] == 9) and (
        torch.version.cuda >= "12.3"
    )

Severity: medium

This is_sm90_supported function is a duplicate of the one in sgl-kernel/tests/test_es_fp8_blockwise_moe.py. To follow the DRY (Don't Repeat Yourself) principle and improve maintainability, this function should be defined in a single, shared location, such as a test utilities file (e.g., sgl-kernel/tests/utils.py or sgl-kernel/tests/conftest.py), and imported into both test modules.
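
A minimal sketch of that refactor, assuming a shared sgl-kernel/tests/utils.py (the file name follows the reviewer's example; it is not part of this PR):

```python
# sgl-kernel/tests/utils.py -- hypothetical shared location, per the review.
import torch

def is_sm90_supported(device=None) -> bool:
    # True on compute-capability-9.x (SM90) devices with CUDA >= 12.3.
    return (torch.cuda.get_device_capability(device)[0] == 9) and (
        torch.version.cuda >= "12.3"
    )
```

Both test_flashmla.py and test_es_fp8_blockwise_moe.py could then import the helper from one place, so a future change (for example, extending support to SM100) only needs to land once.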

@FlamingoPg added the format (Auto Format Code) label on Nov 25, 2025

Fridge003 merged commit 412160f into sgl-project:main on Nov 30, 2025
61 of 86 checks passed
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request on Dec 4, 2025
Co-authored-by: HydraQYH <qyh820@outlook.com>
tonyluj pushed a commit to openanolis/sglang that referenced this pull request on Dec 5, 2025
Co-authored-by: HydraQYH <qyh820@outlook.com>
