Fix vanilla and torch attention cu_seqlens handling #310

Open
Mr-Neutr0n wants to merge 1 commit into Tencent-Hunyuan:main from Mr-Neutr0n:fix/attention-cu-seqlens-handling
Conversation

@Mr-Neutr0n

Summary

  • Vanilla attention (mode="vanilla") completely ignores cu_seqlens_q/cu_seqlens_kv when provided, allowing attention to bleed across segment boundaries (valid tokens attend to padding tokens and vice versa). This causes the quality degradation reported in [BUG] attention quality is much worse with the vanilla version #296. Fixed by building a block-diagonal attention mask from cu_seqlens that prevents cross-segment attention.
  • Torch attention (mode="torch") hardcodes cu_seqlens_q[1] as a single split point, which only works for batch_size=1. For batch_size > 1, the remaining segment boundaries in cu_seqlens are silently ignored. Fixed by iterating over all batch items using the full cu_seqlens array.

Details

flash_attn_varlen_func in the flash mode correctly handles variable-length sequences via cu_seqlens, separating valid (image + text) tokens from padding tokens. The vanilla and torch code paths did not replicate this behavior:
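As a point of reference for the discussion below, a minimal sketch of how such a cu_seqlens array could be laid out (the variable names and the two-samples-padded-to-8 example are illustrative assumptions, not code from this PR):

```python
import torch

# Hypothetical example: batch_size=2 with valid lengths [5, 3], each
# sample padded to max_len=8. Alternating valid/padding segments per
# batch item give 2 * batch_size + 1 cumulative boundaries.
valid_lens = torch.tensor([5, 3])
max_len = 8
segments = torch.stack([valid_lens, max_len - valid_lens], dim=1).flatten()
cu_seqlens = torch.cat(
    [torch.zeros(1, dtype=torch.long), segments.cumsum(0)]
)
print(cu_seqlens)  # tensor([ 0,  5,  8, 11, 16])
```

Here entries [0, 5) and [5, 8) are the valid/padding segments of sample 0, and [8, 11) and [11, 16) those of sample 1.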

Vanilla mode computed attention over the entire concatenated sequence with no masking at all. When cu_seqlens encodes a valid segment [0, s) and a padding segment [s, max_len), all tokens could freely attend to each other, corrupting the output.
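The masking fix can be sketched roughly as follows (a hedged illustration of the approach, not the exact code from the PR; `block_diag_mask` is a hypothetical helper name):

```python
import torch

def block_diag_mask(cu_seqlens, seq_len):
    # Hypothetical sketch: True exactly where query and key fall inside
    # the same cu_seqlens segment, so valid tokens never attend to
    # padding tokens and vice versa.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    bounds = cu_seqlens.tolist()
    for start, end in zip(bounds[:-1], bounds[1:]):
        mask[start:end, start:end] = True
    return mask

# valid segment [0, 5) and padding segment [5, 8)
mask = block_diag_mask(torch.tensor([0, 5, 8]), 8)
```

Passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention` (where True means "may attend"), this block-diagonal mask confines every query to its own segment.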

Torch mode used cu_seqlens_q[1] to split into exactly two segments. The cu_seqlens array has 2 * batch_size + 1 entries (a valid/padding pair per batch item), so the two-segment split is only correct when batch_size == 1.

Test plan

  • Verify vanilla attention output matches flash attention output (within numerical tolerance) for batch_size=1 with padding tokens
  • Verify torch attention output matches flash attention output for batch_size > 1
  • Confirm generation quality with mode="vanilla" is comparable to mode="flash" (addresses [BUG] attention quality is much worse with the vanilla version #296)

The vanilla attention mode completely ignored cu_seqlens_q/cu_seqlens_kv
when they were provided, computing attention over the entire concatenated
sequence including padding tokens. This caused valid image/text tokens to
attend to padding tokens and vice versa, leading to severe quality
degradation compared to flash attention (see Tencent-Hunyuan#296).

The torch attention mode hardcoded cu_seqlens_q[1] as a single split
point, which only worked correctly for batch_size=1. For batch_size > 1
the cu_seqlens array contains multiple segment boundaries that were
silently ignored, producing incorrect attention outputs.

Changes:
- vanilla mode: build a block-diagonal attention mask from cu_seqlens
  that prevents cross-segment attention between valid and padding tokens
- torch mode: iterate over all batch items using the full cu_seqlens
  array to correctly split valid/padding segments per sample
@tencent-adm

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@Mr-Neutr0n
Author

I have read the CLA Document and I hereby sign the CLA
