Conversation
Commits (messages truncated):
- …remove unused code
- …and refine output validation
- …bility, and format code for consistency
Seems the test itself …
Unit test fixed.
Result of the unit test run:

/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
============================= test session starts ==============================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /sgl-workspace/sglang/sgl-kernel
configfile: pyproject.toml
plugins: typeguard-4.4.4, anyio-4.11.0
collected 1968 items
sgl-kernel/tests/test_flash_attention_4.py ............................. [ 1%]
........................................................................ [ 5%]
........................................................................ [ 8%]
.........................ssssssssssssssssss......ssssssssssssssssss..... [ 12%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 16%]
........................................................................ [ 19%]
........................................................................ [ 23%]
........................................................................ [ 27%]
.................................................ssssssssssssssssss..... [ 30%]
.ssssssssssssssssss......ssssssssssssssssss......ssssssssssssssssss..... [ 34%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 38%]
........................................................................ [ 41%]
.ssssssssssssssssss......ssssssssssssssssss............................. [ 45%]
........................................................................ [ 49%]
........................................................................ [ 52%]
........................................................................ [ 56%]
........................................................................ [ 60%]
........................................................................ [ 63%]
........................................................................ [ 67%]
........................................................................ [ 70%]
........................................................................ [ 74%]
........................................................................ [ 78%]
........................................................................ [ 81%]
........................................................................ [ 85%]
........................................................................ [ 89%]
.......................................................ssssssssssss..... [ 92%]
...............................ssssssssssss............................. [ 96%]
.......ssssssssssss....................................ssssssssssss [100%]
=============================== warnings summary ===============================
tests/test_flash_attention_4.py: 51414 warnings
/usr/local/lib/python3.12/dist-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/_mlir_helpers/op.py:31: DeprecationWarning: cute.arch.exp2 is deprecated, use cute.math.exp2 with `fastmath=True` instead
res_or_list = opFunc(*args, **kwargs, loc=loc)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======== 1704 passed, 264 skipped, 51414 warnings in 713.71s (0:11:53) =========
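A minimal sketch of reproducing the run above from the sgl-kernel directory, assuming the tests/ layout shown in the session header; the quiet flag is a convenience here, not necessarily how CI invokes it.

```python
# Reproduce the FA4 kernel test session locally (run from sgl-kernel/).
# Path and flags are assumptions based on the pytest header above.
import sys

import pytest

if __name__ == "__main__":
    sys.exit(pytest.main(["-q", "tests/test_flash_attention_4.py"]))
```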
Result of ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 64.57it/s]
Accuracy: 0.730
Invalid: 0.010
Latency: 1.587 s
Output throughput: 4754.717 token/s
{'accuracy': np.float64(0.73), 'invalid': np.float64(0.01), 'latency': 1.587055522017181, 'output_throughput': 4754.717081610904}
.
----------------------------------------------------------------------
Ran 1 test in 62.719s
OK
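For context, a hypothetical sketch of the kind of threshold check behind the single passing unittest above; the metrics dict mirrors the one printed, while run_benchmark and the threshold values are illustrative stand-ins rather than the actual test code.

```python
# Illustrative accuracy/invalid-rate check over metrics shaped like the dict
# printed above; run_benchmark is a hypothetical stand-in, not an sglang helper.
import unittest


def run_benchmark():
    # Stand-in returning metrics in the printed format.
    return {
        "accuracy": 0.73,
        "invalid": 0.01,
        "latency": 1.587,
        "output_throughput": 4754.7,
    }


class TestAccuracy(unittest.TestCase):
    def test_accuracy_threshold(self):
        metrics = run_benchmark()
        # Thresholds are illustrative, not the values used in CI.
        self.assertGreaterEqual(metrics["accuracy"], 0.70)
        self.assertLessEqual(metrics["invalid"], 0.05)


if __name__ == "__main__":
    unittest.main()
```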
@zhyncs @Fridge003 is this ready to merge now?
        TestFile("test_disaggregation_pp.py", 140),
    ],
    "per-commit-4-gpu-b200": [
        # TestFile("test_flash_attention_4.py"),
We should enable the FA4 unit test on B200.
We can add it back after the sgl-kernel version bump.
This test passes locally: #11606 (comment)
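A minimal sketch of what re-enabling the entry could look like once the sgl-kernel bump lands; TestFile and the suite registry below are modeled on the quoted hunk, and the field names and surrounding structure are assumptions, not the real suite file.

```python
# Illustrative suite registry; only the suite name and test file name come from
# the quoted hunk, everything else is assumed for the sake of a runnable example.
from dataclasses import dataclass


@dataclass
class TestFile:
    name: str
    estimated_time: int = 60  # seconds; field name is an assumption


suites = {
    "per-commit-4-gpu-b200": [
        # Re-enable once the sgl-kernel bump lands:
        TestFile("test_flash_attention_4.py"),
    ],
}

if __name__ == "__main__":
    for suite, files in suites.items():
        print(suite, [t.name for t in files])
```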
OK. Can you help me look into this CI issue? https://github.com/sgl-project/sglang/actions/runs/18628168698/job/53126330363?pr=11606
Motivation
Fix FA3/FA4 with the latest changes.