
Support FP8 Per Token Quant Piecewise #13272

Merged
merrymercy merged 1 commit into main from bhe/fp8_piecewise on Nov 14, 2025
Conversation

@hebiao064 (Collaborator)

Motivation

Enable FP8 per-token quantization models to run under piecewise compilation. Previously, compilation failed because the custom FP8 GEMM op could not be traced with fake tensors.

Before:

torch._dynamo.exc.Unsupported: Operator does not support running with fake tensors
  Explanation: 
  Hint: see https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.64r4npvq0w0 for how to fix

  Developer debug context: unsupported operator: sgl_kernel.fp8_scaled_mm.default
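
The error above indicates that the custom FP8 GEMM op has no fake (meta) implementation, so Dynamo cannot trace it during piecewise compilation. A minimal sketch of how such a registration generally looks, using a hypothetical op rather than the actual sgl_kernel.fp8_scaled_mm signature:

import torch

# Hypothetical stand-in for an FP8 scaled GEMM; NOT the real sgl_kernel op.
@torch.library.custom_op("demo::fp8_scaled_mm", mutates_args=())
def fp8_scaled_mm(a: torch.Tensor, b: torch.Tensor,
                  scale_a: torch.Tensor, scale_b: torch.Tensor) -> torch.Tensor:
    # Reference eager path; a real kernel would dispatch an FP8 GEMM.
    return (a.float() * scale_a) @ (b.float() * scale_b)

# The fake (meta) registration is what lets torch.compile trace the op:
# it only describes the output's shape and dtype, no real computation.
@fp8_scaled_mm.register_fake
def _(a, b, scale_a, scale_b):
    return a.new_empty((a.shape[0], b.shape[1]), dtype=torch.float32)

@torch.compile
def f(a, b, sa, sb):
    return fp8_scaled_mm(a, b, sa, sb)

f(torch.randn(4, 8), torch.randn(8, 16), torch.ones(4, 1), torch.ones(1, 16))

With a fake implementation registered for the op, the tracing failure above goes away and the op can participate in piecewise compilation.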

After:

Accuracy: 0.950
Invalid: 0.000
Latency: 5.825 s
Output throughput: 4113.731 token/s
[Screenshot: benchmark run, 2025-11-13]
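
For background on the "per token quant" part of the title: per-token quantization computes one FP8 scale per token (row) of the activation matrix instead of a single tensor-wide scale. A small reference sketch, assuming standard torch float8 support (not the fused SGLang CUDA kernel):

import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def per_token_quant_fp8(x: torch.Tensor):
    # One scale per token: row-wise absolute maximum mapped onto the FP8 range.
    amax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = amax / FP8_MAX
    x_fp8 = (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale  # dequantize via x_fp8.float() * scale

x_fp8, scale = per_token_quant_fp8(torch.randn(4, 128))

The per-token scales then feed the scaled GEMM, which is presumably why that op must be traceable for the piecewise path to work end to end.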

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor)

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@BBuf (Collaborator) left a comment

LGTM.

@hebiao064 enabled auto-merge (squash) on November 14, 2025, 08:48
@merrymercy disabled auto-merge on November 14, 2025, 19:40
@merrymercy merged commit 0997c78 into main on Nov 14, 2025
160 of 170 checks passed
@merrymercy deleted the bhe/fp8_piecewise branch on November 14, 2025, 19:40
