[NVIDIA] FA3/FA4 Fix by johnnynunez · Pull Request #11606 · sgl-project/sglang

johnnynunez · 2025-10-14T10:32:54Z

Motivation

Fix the FA3/FA4 (FlashAttention 3/4) integration so that it works with the latest changes.

johnnynunez · 2025-10-16T22:15:27Z

> @johnnynunez @Fridge003 @ishandhanani the unit tests failed: https://github.com/sgl-project/sglang/actions/runs/18574101676/job/52960178302?pr=11606

The failure seems to come from the test itself.

johnnynunez · 2025-10-16T23:37:38Z

> @johnnynunez @Fridge003 @ishandhanani the unit tests failed: https://github.com/sgl-project/sglang/actions/runs/18574101676/job/52960178302?pr=11606

The unit test is fixed now.

Fridge003 · 2025-10-17T22:43:55Z

Result of sgl-kernel/tests/test_flash_attention_4.py on B200:

/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
============================= test session starts ==============================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /sgl-workspace/sglang/sgl-kernel
configfile: pyproject.toml
plugins: typeguard-4.4.4, anyio-4.11.0
collected 1968 items

sgl-kernel/tests/test_flash_attention_4.py ............................. [  1%]
........................................................................ [  5%]
........................................................................ [  8%]
.........................ssssssssssssssssss......ssssssssssssssssss..... [ 12%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 16%]
........................................................................ [ 19%]
........................................................................ [ 23%]
........................................................................ [ 27%]
.................................................ssssssssssssssssss..... [ 30%]
.ssssssssssssssssss......ssssssssssssssssss......ssssssssssssssssss..... [ 34%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 38%]
........................................................................ [ 41%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 45%]
........................................................................ [ 49%]
........................................................................ [ 52%]
........................................................................ [ 56%]
........................................................................ [ 60%]
........................................................................ [ 63%]
........................................................................ [ 67%]
........................................................................ [ 70%]
........................................................................ [ 74%]
........................................................................ [ 78%]
........................................................................ [ 81%]
........................................................................ [ 85%]
........................................................................ [ 89%]
.......................................................ssssssssssss..... [ 92%]
...............................ssssssssssss............................. [ 96%]
.......ssssssssssss....................................ssssssssssss      [100%]

=============================== warnings summary ===============================
tests/test_flash_attention_4.py: 51414 warnings
  /usr/local/lib/python3.12/dist-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/_mlir_helpers/op.py:31: DeprecationWarning: cute.arch.exp2 is deprecated, use cute.math.exp2 with `fastmath=True` instead
    res_or_list = opFunc(*args, **kwargs, loc=loc)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======== 1704 passed, 264 skipped, 51414 warnings in 713.71s (0:11:53) =========
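
As a quick sanity check on the log above, the summary counts can be reconciled with the number of collected items. The snippet below is only a sketch: the helper name `parse_summary` is made up for illustration, and the strings are copied from this run rather than read from pytest itself.

```python
import re

# Summary line and item count copied from the pytest output above.
SUMMARY = "======== 1704 passed, 264 skipped, 51414 warnings in 713.71s (0:11:53) ========="
COLLECTED = 1968  # from the "collected 1968 items" line


def parse_summary(line: str) -> dict[str, int]:
    """Extract outcome counts (passed, skipped, ...) from a pytest summary line.

    Hypothetical helper for illustration; it is not part of pytest or sglang.
    """
    pattern = r"(\d+) (passed|skipped|failed|errors?|warnings?)"
    return {name: int(count) for count, name in re.findall(pattern, line)}


counts = parse_summary(SUMMARY)

# Every collected item should be accounted for as either passed or skipped.
accounted = counts["passed"] + counts["skipped"]
skip_ratio = counts["skipped"] / COLLECTED

print(f"accounted for {accounted} of {COLLECTED} items")
print(f"skipped share: {skip_ratio:.1%}")
```

Here the passed and skipped counts add up to the collected total, so no test failed or errored in this run; roughly one test in seven was skipped.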

Fridge003 · 2025-10-17T22:47:23Z

Result of test/srt/test_flash_attention_4.py on B200:

...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 64.57it/s]
Accuracy: 0.730
Invalid: 0.010
Latency: 1.587 s
Output throughput: 4754.717 token/s
{'accuracy': np.float64(0.73), 'invalid': np.float64(0.01), 'latency': 1.587055522017181, 'output_throughput': 4754.717081610904}
.
----------------------------------------------------------------------
Ran 1 test in 62.719s

OK
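
The printed metrics can be cross-checked with a little arithmetic. This is a sketch under two assumptions that the log does not confirm: that the `100/100` progress bar counts evaluation questions, and that `output_throughput` is the number of output tokens divided by `latency`.

```python
# Values copied from the result dictionary printed above.
latency_s = 1.587055522017181
output_throughput = 4754.717081610904  # tokens per second
accuracy = 0.73
invalid = 0.01

# Assumption: the "100/100" progress bar counts evaluation questions.
num_questions = 100

# Assumption: throughput = output tokens / latency, so the product
# recovers the total number of generated tokens.
total_output_tokens = output_throughput * latency_s

correct_answers = round(accuracy * num_questions)
invalid_answers = round(invalid * num_questions)
tokens_per_question = total_output_tokens / num_questions

print(f"total output tokens: about {total_output_tokens:.0f}")
print(f"correct answers: {correct_answers}, invalid answers: {invalid_answers}")
print(f"average output tokens per question: {tokens_per_question:.1f}")
```

Under these assumptions the run generated about 7546 output tokens, with 73 correct answers and 1 invalid answer out of 100.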

johnnynunez · 2025-10-18T16:19:10Z

@zhyncs @Fridge003 is this ready to merge now?

zhyncs · 2025-10-19T20:21:47Z

test/srt/run_suite.py

        TestFile("test_disaggregation_pp.py", 140),
    ],
    "per-commit-4-gpu-b200": [
+        # TestFile("test_flash_attention_4.py"),


We should enable the FA4 unit test on B200.

Fridge003 · Oct 19, 2025

We can add it back after sgl-kernel is bumped.
This test passes locally; see #11606 (comment).

zhyncs · Oct 19, 2025

OK. Can you help me look into this issue? https://github.com/sgl-project/sglang/actions/runs/18628168698/job/53126330363?pr=11606
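
For readers unfamiliar with the suite file, the change under discussion can be pictured roughly as follows. This is a simplified sketch, not the contents of `test/srt/run_suite.py`: the `TestFile` dataclass, its default `estimated_time`, and the `enable` helper are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class TestFile:
    """Hypothetical stand-in for the TestFile helper used in run_suite.py."""

    name: str
    estimated_time: float = 60  # assumed default, in seconds


# The B200 suite as it looks in this PR: the FA4 test is listed but commented out.
suites: dict[str, list[TestFile]] = {
    "per-commit-4-gpu-b200": [
        # TestFile("test_flash_attention_4.py"),
    ],
}


def enable(suite: str, test: TestFile) -> None:
    """Add a test to a suite unless a test with the same name is already listed."""
    if all(existing.name != test.name for existing in suites[suite]):
        suites[suite].append(test)


# Re-enabling the test later amounts to restoring the entry.
enable("per-commit-4-gpu-b200", TestFile("test_flash_attention_4.py"))
print([t.name for t in suites["per-commit-4-gpu-b200"]])
```

Keeping the entry as a comment leaves a visible marker in the suite, so the test is easy to restore once the dependency is updated.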

[WIP] FA4 Fix

51a9874

johnnynunez mentioned this pull request Oct 14, 2025

[NVIDIA] BUMP FA3 #11444

Merged

johnnynunez added 4 commits October 14, 2025 13:11

Refactor FA4 interface: update version, enhance tensor handling, and remove unused code

56a9645

Refactor FA4 interface: update version, enhance tensor handling, and remove unused code

3080e28

Enhance FA4 tests: update parameterization, improve skip conditions, and refine output validation

4b489c5

Refactor FA4 interface: improve import order, enhance assertion readability, and format code for consistency

47a1624

johnnynunez marked this pull request as ready for review October 14, 2025 11:52

johnnynunez requested review from BBuf, FlamingoPg, HaiShaw, ispobock, merrymercy, yizhang2077 and zhyncs as code owners October 14, 2025 11:52

johnnynunez changed the title ~~[WIP] FA4 Fix~~ [NVIDIA] FA4 Fix Oct 14, 2025

johnnynunez and others added 2 commits October 14, 2025 17:53

Update CMakeLists.txt

e07af94

Merge branch 'sgl-project:main' into patch-4

fd24e89

johnnynunez changed the title ~~[NVIDIA] FA4 Fix~~ [NVIDIA] FA4 Fix (Under Testing) Oct 14, 2025

Merge branch 'main' into patch-4

474457c

Fridge003 added the run-ci label Oct 14, 2025

fa3 fixes

27d9474

johnnynunez changed the title ~~[NVIDIA] FA4 Fix (Under Testing)~~ [NVIDIA] FA3/FA4 Fix (Under Testing) Oct 14, 2025

johnnynunez and others added 6 commits October 14, 2025 22:45

Merge branch 'sgl-project:main' into patch-4

ed3d33e

Merge branch 'main' into patch-4

8ceb048

fa3 fixes

757e172

Merge remote-tracking branch 'origin/patch-4' into patch-4

bfead9d

fix error

6003d2f

Merge branch 'main' into patch-4

b437e96

ishandhanani changed the title ~~[NVIDIA] FA3/FA4 Fix (Under Testing)~~ [NVIDIA] FA3/FA4 Fix Oct 15, 2025

Merge branch 'main' into patch-4

08e3a8b

johnnynunez added 2 commits October 17, 2025 00:19

fix test

3c53ed9

Merge branch 'main' into patch-4

41b9cef

Fridge003 and others added 8 commits October 17, 2025 00:04

fix page size

df303ba

Merge branch 'main' into patch-4

3167033

Merge branch 'main' into patch-4

e6d5b00

fix timeout

8c29d91

Merge branch 'main' into patch-4

a95029d

fix timeout

0fb5817

fix FA4 Grouped Attention changes

2d49b45

upd

dfbe35f

Fridge003 mentioned this pull request Oct 17, 2025

[Doc] Update documents for FA4 #11778

Merged

4 tasks

Fridge003 added 2 commits October 17, 2025 19:54

upd

0cfbaa2

fix

7460959

Fridge003 added 2 commits October 18, 2025 01:39

clean tests

7c19e9f

Merge branch 'main' into patch-4

ec85cc3

Fridge003 added 2 commits October 18, 2025 14:37

Merge branch 'main' into patch-4

615bf47

skip sgl-kernel test

386976b

Fridge003 force-pushed the patch-4 branch from 3a92ef4 to 386976b October 19, 2025 03:20

Merge branch 'main' into patch-4

b9688ec

zhyncs reviewed Oct 19, 2025

Merge branch 'main' into patch-4

bf9e1a2

zhyncs approved these changes Oct 20, 2025

zhyncs merged commit 252dc4e into sgl-project:main Oct 20, 2025
60 of 70 checks passed

b8zhong mentioned this pull request Dec 10, 2025

fix b200 fa4 ci #14788

Merged

zhyncs merged 48 commits into sgl-project:main from johnnynunez:patch-4

4 participants
