[NVIDIA] FA3/FA4 Fix #11606

Merged
zhyncs merged 48 commits into sgl-project:main from johnnynunez:patch-4
Oct 20, 2025
Conversation

@johnnynunez (Contributor) commented Oct 14, 2025

Motivation

Fix FA3/FA4 with latest changes

@johnnynunez mentioned this pull request Oct 14, 2025
@johnnynunez marked this pull request as ready for review October 14, 2025 11:52
@johnnynunez changed the title from "[WIP] FA4 Fix" to "[NVIDIA] FA4 Fix" Oct 14, 2025
@johnnynunez changed the title from "[NVIDIA] FA4 Fix" to "[NVIDIA] FA4 Fix (Under Testing)" Oct 14, 2025
@johnnynunez changed the title from "[NVIDIA] FA4 Fix (Under Testing)" to "[NVIDIA] FA3/FA4 Fix (Under Testing)" Oct 14, 2025
@ishandhanani changed the title from "[NVIDIA] FA3/FA4 Fix (Under Testing)" to "[NVIDIA] FA3/FA4 Fix" Oct 15, 2025
@Fridge003 mentioned this pull request Oct 17, 2025
4 tasks

@Fridge003 (Collaborator) commented Oct 17, 2025

Result of sgl-kernel/tests/test_flash_attention_4.py on B200:

/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
============================= test session starts ==============================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /sgl-workspace/sglang/sgl-kernel
configfile: pyproject.toml
plugins: typeguard-4.4.4, anyio-4.11.0
collected 1968 items

sgl-kernel/tests/test_flash_attention_4.py ............................. [  1%]
........................................................................ [  5%]
........................................................................ [  8%]
.........................ssssssssssssssssss......ssssssssssssssssss..... [ 12%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 16%]
........................................................................ [ 19%]
........................................................................ [ 23%]
........................................................................ [ 27%]
.................................................ssssssssssssssssss..... [ 30%]
.ssssssssssssssssss......ssssssssssssssssss......ssssssssssssssssss..... [ 34%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 38%]
........................................................................ [ 41%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 45%]
........................................................................ [ 49%]
........................................................................ [ 52%]
........................................................................ [ 56%]
........................................................................ [ 60%]
........................................................................ [ 63%]
........................................................................ [ 67%]
........................................................................ [ 70%]
........................................................................ [ 74%]
........................................................................ [ 78%]
........................................................................ [ 81%]
........................................................................ [ 85%]
........................................................................ [ 89%]
.......................................................ssssssssssss..... [ 92%]
...............................ssssssssssss............................. [ 96%]
.......ssssssssssss....................................ssssssssssss      [100%]

=============================== warnings summary ===============================
tests/test_flash_attention_4.py: 51414 warnings
  /usr/local/lib/python3.12/dist-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/_mlir_helpers/op.py:31: DeprecationWarning: cute.arch.exp2 is deprecated, use cute.math.exp2 with `fastmath=True` instead
    res_or_list = opFunc(*args, **kwargs, loc=loc)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======== 1704 passed, 264 skipped, 51414 warnings in 713.71s (0:11:53) =========
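As a quick sanity check on the log above, the counts in the pytest summary line can be compared against the "collected 1968 items" line. The following is a minimal sketch that only parses the two strings copied from this log; the regular expression and variable names are illustrative, not part of the test suite.

```python
import re

# Summary line and collected count copied from the pytest log above.
summary = "======== 1704 passed, 264 skipped, 51414 warnings in 713.71s (0:11:53) ========="
collected = 1968

# Extract "<count> <label>" pairs such as "1704 passed".
counts = {
    label: int(number)
    for number, label in re.findall(r"(\d+) (passed|skipped|failed|warnings)", summary)
}

# Every collected item should be accounted for as passed, skipped, or failed.
ran = counts.get("passed", 0) + counts.get("skipped", 0) + counts.get("failed", 0)
assert ran == collected

skip_rate = counts["skipped"] / collected
print(f"skipped: {skip_rate:.1%}")  # skipped: 13.4%
```

The passed and skipped counts add up to the collected total, so no test failed or errored in this run.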

@Fridge003 (Collaborator) commented:

Result of test/srt/test_flash_attention_4.py on B200:

...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 64.57it/s]
Accuracy: 0.730
Invalid: 0.010
Latency: 1.587 s
Output throughput: 4754.717 token/s
{'accuracy': np.float64(0.73), 'invalid': np.float64(0.01), 'latency': 1.587055522017181, 'output_throughput': 4754.717081610904}
.
----------------------------------------------------------------------
Ran 1 test in 62.719s

OK

@johnnynunez (Contributor Author) commented:

@zhyncs @Fridge003 is this ready to merge now?

        TestFile("test_disaggregation_pp.py", 140),
    ],
    "per-commit-4-gpu-b200": [
        # TestFile("test_flash_attention_4.py"),
Collaborator commented:

We should enable the FA4 unit test on B200.

Collaborator commented:

We can add it back after the sgl-kernel version bump.
The test passes locally; see #11606 (comment).
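For reference, re-enabling the test after the bump would amount to uncommenting the entry in the suite list quoted above. The sketch below is illustrative only: the `TestFile` dataclass here is a hypothetical stand-in with an assumed default for the second field (the real definition lives in the repository), and the `suites` dictionary name is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TestFile:
    # Hypothetical stand-in for the repository's TestFile helper.
    name: str
    estimated_time: float = 60  # assumed default, in seconds


# Assumed dictionary name; only the suite key and file name come from the diff.
suites = {
    "per-commit-4-gpu-b200": [
        TestFile("test_flash_attention_4.py"),  # entry uncommented
    ],
}

enabled = [test.name for test in suites["per-commit-4-gpu-b200"]]
print(enabled)  # ['test_flash_attention_4.py']
```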

@zhyncs zhyncs merged commit 252dc4e into sgl-project:main Oct 20, 2025
60 of 70 checks passed
@b8zhong b8zhong mentioned this pull request Dec 10, 2025