【New Feature】W4afp8 supports per group quantization #4272

Merged
Jiang-Jia-Jun merged 27 commits into PaddlePaddle:develop from yangjianfengo1:w4afp8
Nov 5, 2025
@yangjianfengo1 (Contributor) commented Sep 25, 2025

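Description:
This PR adds dynamic per-token quantization for activations and per-group quantization for weights in w4afp8. For a MoE w4afp8 GEMM with token=256, m=1792, k=8192, the activation shape is [256, 8192] and the weight shape is [1792, 8192] (the expert dimension is omitted for simplicity).

  • Before: activations were quantized statically per tensor, so the activation scale had shape [1]; weights were quantized per channel, so the weight scale had shape [1792].
  • Now: activations are quantized dynamically per token, so the activation scale has shape [256]; weights can be quantized per group along the channel dimension, where the group size must be a multiple of 128, so the weight scale has shape [1792, 8192 / 128 = 64].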
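
The scale shapes above can be checked with a small NumPy sketch. This is not the kernel code from this PR: the helper names `activation_scales` and `weight_scales` are hypothetical, the scales are computed as a plain abs-max divided by the format's maximum (an assumption), and the "static" per-tensor scale is derived from the data here purely for illustration, whereas a real static scale would come from offline calibration.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 format
INT4_MAX = 7.0        # assumed symmetric signed 4-bit range [-7, 7]


def activation_scales(x, dynamic_per_token):
    """Return activation scales: shape [tokens] per token, else shape [1]."""
    if dynamic_per_token:
        amax = np.abs(x).max(axis=1)       # one abs-max per token (row)
    else:
        amax = np.abs(x).max().reshape(1)  # a single abs-max for the whole tensor
    return amax / FP8_E4M3_MAX


def weight_scales(w, group_size=None):
    """Return weight scales: [m] per channel, or [m, k // group_size] per group."""
    m, k = w.shape
    if group_size is None:
        return np.abs(w).max(axis=1) / INT4_MAX
    if group_size % 128 != 0 or k % group_size != 0:
        raise ValueError("group_size must be a multiple of 128 that divides k")
    # Split each output channel's k inputs into contiguous groups.
    grouped = np.abs(w).reshape(m, k // group_size, group_size)
    return grouped.max(axis=2) / INT4_MAX


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 8192)).astype(np.float32)   # activations [token, k]
w = rng.standard_normal((1792, 8192)).astype(np.float32)  # weights [m, k]

print(activation_scales(x, dynamic_per_token=False).shape)  # (1,)
print(activation_scales(x, dynamic_per_token=True).shape)   # (256,)
print(weight_scales(w).shape)                               # (1792,)
print(weight_scales(w, group_size=128).shape)               # (1792, 64)
```

Per-group scales track local weight ranges more closely than a single per-channel scale, at the cost of storing k / 128 times as many scale values per output channel.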

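Usage

  • To enable per-group quantization for weights, the produced weight scale must have shape [num_export, K/128, M].
  • To enable dynamic quantization for activations, add "moe_dynamic_quant": true to the quantization_config field of the weight's config.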
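
As a rough illustration of the two steps above, the sketch below reorders per-group scales into the [num_export, K/128, M] layout and sets the `moe_dynamic_quant` flag in a config dict. The helpers `to_export_layout` and `enable_moe_dynamic_quant` are hypothetical and not part of FastDeploy. The starting layout [experts, M, K/128] is also an assumption, and any other keys that a real `quantization_config` contains are not shown.

```python
import json

import numpy as np


def to_export_layout(scales_emg):
    """Reorder per-group scales from [experts, M, K/128] to [experts, K/128, M]."""
    return np.ascontiguousarray(np.transpose(scales_emg, (0, 2, 1)))


def enable_moe_dynamic_quant(config):
    """Return a copy of a config dict with moe_dynamic_quant set to true."""
    updated = dict(config)
    quant_cfg = dict(updated.get("quantization_config", {}))
    quant_cfg["moe_dynamic_quant"] = True
    updated["quantization_config"] = quant_cfg
    return updated


# The expert count of 2 is illustrative; m and k match the PR description.
num_experts, m, k, group = 2, 1792, 8192, 128
scales = np.ones((num_experts, m, k // group), dtype=np.float32)
print(to_export_layout(scales).shape)  # (2, 64, 1792)

config = enable_moe_dynamic_quant({"quantization_config": {}})
print(json.dumps(config, indent=2))
```

The config helper copies the dict instead of mutating it, so the original config stays available for comparison.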

Performance change:
[image]

@yangjianfengo1 changed the title from "w4afp8 supports per group" to "【New Feature】W4afp8 supports per group quantization" Sep 26, 2025

@carryyu (Collaborator) left a comment

LGTM

@Jiang-Jia-Jun merged commit 93fcf7e into PaddlePaddle:develop Nov 5, 2025
12 of 14 checks passed
EmmonsCurse added a commit that referenced this pull request Nov 6, 2025
4 participants