Skip to content

Comments

[BugFix] fix VL fp8 bug when moe token_num is 0#4928

Merged
EmmonsCurse merged 6 commits intoPaddlePaddle:developfrom
ming1753:dev_fp8
Nov 12, 2025
Merged

[BugFix] fix VL fp8 bug when moe token_num is 0#4928
EmmonsCurse merged 6 commits intoPaddlePaddle:developfrom
ming1753:dev_fp8

Conversation

@ming1753
Copy link
Collaborator

@ming1753 ming1753 commented Nov 10, 2025

Motivation

多模态模型可能存在仅有文本无图片,或者仅有图片无文本的情况,此时图像Moe或文本Moe会收到一个token_num=0的输入。Triton算子不支持。

Modifications

一种方法是拼接一个长度为1的token vector,不跳过kernel执行,并且在计算完成后切回真实长度。优点是这样理论上Prefill可以跑Cuda Graph,但是缺点是会造成轻微显存增长。
考虑到短期内prefill不需要跑Cuda Graph,现采用token_num=0时跳过执行的修复方式。

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot
Copy link

paddle-bot bot commented Nov 10, 2025

Thanks for your contribution!

@EmmonsCurse EmmonsCurse merged commit 3148dbc into PaddlePaddle:develop Nov 12, 2025
19 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants