
Support FP8 Per Token Quant Piecewise #13272

Merged
merrymercy merged 1 commit into main from bhe/fp8_piecewise on Nov 14, 2025
Conversation

@hebiao064 (Collaborator)

Motivation

Enable FP8 per-token quantization models to run under piecewise compilation. Previously, compilation failed because the custom FP8 GEMM op could not be traced with fake tensors.

Before:

torch._dynamo.exc.Unsupported: Operator does not support running with fake tensors
  Explanation: 
  Hint: see https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.64r4npvq0w0 for how to fix

  Developer debug context: unsupported operator: sgl_kernel.fp8_scaled_mm.default
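
The error above indicates that the custom FP8 GEMM op has no fake (meta) implementation, so Dynamo cannot trace it during piecewise compilation. A minimal sketch of how such a registration generally looks, using a hypothetical op rather than the actual sgl_kernel.fp8_scaled_mm signature:

import torch

# Hypothetical stand-in for an FP8 scaled GEMM; NOT the real sgl_kernel op.
@torch.library.custom_op("demo::fp8_scaled_mm", mutates_args=())
def fp8_scaled_mm(a: torch.Tensor, b: torch.Tensor,
                  scale_a: torch.Tensor, scale_b: torch.Tensor) -> torch.Tensor:
    # Reference eager path; a real kernel would dispatch an FP8 GEMM.
    return (a.float() * scale_a) @ (b.float() * scale_b)

# The fake (meta) registration is what lets torch.compile trace the op:
# it only describes the output's shape and dtype, no real computation.
@fp8_scaled_mm.register_fake
def _(a, b, scale_a, scale_b):
    return a.new_empty((a.shape[0], b.shape[1]), dtype=torch.float32)

@torch.compile
def f(a, b, sa, sb):
    return fp8_scaled_mm(a, b, sa, sb)

f(torch.randn(4, 8), torch.randn(8, 16), torch.ones(4, 1), torch.ones(1, 16))

With a fake implementation registered for the op, the tracing failure above goes away and the op can participate in piecewise compilation.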

After:

Accuracy: 0.950
Invalid: 0.000
Latency: 5.825 s
Output throughput: 4113.731 token/s
[Screenshot: benchmark run, 2025-11-13]
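
For background on the "per token quant" part of the title: per-token quantization computes one FP8 scale per token (row) of the activation matrix instead of a single tensor-wide scale. A small reference sketch, assuming standard torch float8 support (not the fused SGLang CUDA kernel):

import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def per_token_quant_fp8(x: torch.Tensor):
    # One scale per token: row-wise absolute maximum mapped onto the FP8 range.
    amax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = amax / FP8_MAX
    x_fp8 = (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale  # dequantize via x_fp8.float() * scale

x_fp8, scale = per_token_quant_fp8(torch.randn(4, 128))

The per-token scales then feed the scaled GEMM, which is presumably why that op must be traceable for the piecewise path to work end to end.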

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor)

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@BBuf (Collaborator) left a comment

LGTM.

@hebiao064 enabled auto-merge (squash) on November 14, 2025, 08:48
@merrymercy disabled auto-merge on November 14, 2025, 19:40
@merrymercy merged commit 0997c78 into main on Nov 14, 2025
160 of 170 checks passed
@merrymercy deleted the bhe/fp8_piecewise branch on November 14, 2025, 19:40
