Add joint tensor and KV cache support to USP method #586
Merged
feifeibear merged 9 commits into xdit-project:main from Nov 12, 2025
Collaborator
/gemini review
Contributor
Code Review
This pull request is a great step towards unifying attention mechanisms by adding joint tensor and KV-cache support to the USP method. The goal of deprecating the yunchang path to improve torch.compile compatibility is well-motivated. The changes in xfuser/model_executor/layers/usp.py are substantial and well-supported by new unit tests that verify equivalence with the old implementation. The modifications in attention_processor.py and transformer_flux.py to adopt the new USP interface are consistent and correct. Overall, the changes are well-executed. I have one suggestion to improve the structure of the new tests for better maintainability.
What?
Adds KV-cache and joint tensor support to the USP method.
Why?
Currently, some models, such as Hunyuanvideo, follow diverging code paths depending on the input parameters. The two paths (Yunchang / USP) implement communication differently. The Yunchang path uses certain features from torch.distributed that are not compatible with torch.compile. Because the USP method can be fully compiled, this PR makes USP support the features the Yunchang path provides, so that only USP needs to be used.
This PR is the first step towards deprecating yunchang in the long term, as discussed in #579, but does not aim to fully remove it in the short term.
How?
Ported the Yunchang features directly to the USP method. This includes joint tensors as well as the KV cache for pipeline parallelism. Hunyuanvideo and Flux were also changed to use only USP rather than the Yunchang path.
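For readers unfamiliar with joint tensors, here is a minimal sketch of the idea and not the code from this PR. It assumes the joint key/value tensors act as one extra K/V block that every query attends to, placed after the sequence-parallel blocks, and that partial results are merged with the usual running-max/running-sum softmax trick used by ring-style attention. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random

def partial_attention(q_row, k_block, v_block):
    """Attend one query row to one K/V block.

    Returns the unnormalised output, the block's max score, and the sum of
    exponentiated scores, so that blocks can be merged later.
    """
    scale = 1.0 / math.sqrt(len(q_row))
    scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k_block]
    block_max = max(scores)
    weights = [math.exp(s - block_max) for s in scores]
    acc = [sum(w * v_row[d] for w, v_row in zip(weights, v_block))
           for d in range(len(v_block[0]))]
    return acc, block_max, sum(weights)

def merge(state_a, state_b):
    """Combine two partial results by rescaling both to a shared max."""
    acc_a, max_a, sum_a = state_a
    acc_b, max_b, sum_b = state_b
    new_max = max(max_a, max_b)
    scale_a = math.exp(max_a - new_max)
    scale_b = math.exp(max_b - new_max)
    acc = [scale_a * x + scale_b * y for x, y in zip(acc_a, acc_b)]
    return acc, new_max, scale_a * sum_a + scale_b * sum_b

def blockwise_joint_attention(q, kv_blocks, joint_k, joint_v):
    """Hypothetical stand-in: the joint K/V is appended as one more block."""
    blocks = list(kv_blocks) + [(joint_k, joint_v)]
    out = []
    for q_row in q:
        state = partial_attention(q_row, *blocks[0])
        for k_block, v_block in blocks[1:]:
            state = merge(state, partial_attention(q_row, k_block, v_block))
        acc, _, total = state
        out.append([x / total for x in acc])
    return out

def reference_attention(q, k, v):
    """Plain softmax attention over the full, concatenated K/V."""
    out = []
    for q_row in q:
        acc, _, total = partial_attention(q_row, k, v)
        out.append([x / total for x in acc])
    return out

def random_rows(n, dim, rng):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
dim = 4
q, k, v = (random_rows(6, dim, rng) for _ in range(3))
joint_k, joint_v = random_rows(2, dim, rng), random_rows(2, dim, rng)

# Two K/V blocks play the role of two sequence-parallel ranks.
kv_blocks = [(k[:3], v[:3]), (k[3:], v[3:])]
blockwise = blockwise_joint_attention(q, kv_blocks, joint_k, joint_v)
reference = reference_attention(q, k + joint_k, v + joint_v)
max_err = max(abs(a - b)
              for row_a, row_b in zip(blockwise, reference)
              for a, b in zip(row_a, row_b))
print(f"max abs difference: {max_err:.2e}")
```

The blockwise result should agree with attention over the concatenated K/V up to floating-point rounding, which is the property that lets a joint block be attached to a sharded sequence without changing the output.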
Tests
Output:
Hunyuanvideo:
Tested both Ring/Ulysses.
Hunyuanvideo already uses USP by default if the input prompt is of a specific shape. The command and output below come from the other code path, which previously used yunchang / hybrid_seq_parallel_attn.
Run command:
hunyuan_test_usp.mp4
Flux:
Flux already uses standard USP by default. It only used yunchang when pipeline parallelism was enabled:
Run command:
Perf
Hunyuanvideo
Swapping the yunchang code path for the USP code path improves performance, because torch.compile can now be used for the attention call as well. The figures below are the average of three consecutive runs:
Before: 193.2s
After: 188.3s
This now matches the perf of the original USP path.
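To put the two averages in relative terms, the arithmetic can be sketched as follows. Only the two timings come from this PR; the variable and function names are illustrative.

```python
def relative_reduction(before_s, after_s):
    """Fraction of wall-clock time saved relative to the baseline."""
    return (before_s - after_s) / before_s

before_s = 193.2  # average of three runs on the yunchang path
after_s = 188.3   # average of three runs on the USP path

saved_s = before_s - after_s
reduction = relative_reduction(before_s, after_s)
print(f"saved {saved_s:.1f} s ({reduction:.1%} less wall-clock time)")
# prints "saved 4.9 s (2.5% less wall-clock time)"
```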
Automatic tests
Added two new unit tests that compare the output of the Yunchang path against the USP path.
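The general shape of such an equivalence check can be sketched as below. This is not the test code added in this PR: the helper names and tolerances are assumptions, and nested Python lists stand in for tensors.

```python
import math

def flatten(values):
    """Yield the scalars of an arbitrarily nested list or tuple."""
    for item in values:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-3):
    """Return True if both outputs have the same size and close values."""
    ref = list(flatten(reference))
    cand = list(flatten(candidate))
    if len(ref) != len(cand):
        return False
    return all(math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
               for r, c in zip(ref, cand))

# Made-up values standing in for the outputs of the two attention paths.
old_path_output = [[0.1000, 0.2000], [0.3000, 0.4000]]
new_path_output = [[0.1001, 0.1999], [0.3000, 0.4002]]
print(outputs_match(old_path_output, new_path_output))
```

A tolerance is needed because two implementations that reorder floating-point reductions rarely agree bit for bit; the appropriate value depends on the dtype under test.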
Other
This doesn't change the standard USP method behaviour, so other models using USP won't be affected. Some models still use Yunchang, so they would need to be changed in future PRs.
For ease of comparison, here's the original USP method: