Add LongCat T2V (Base, Distillation and Refinement) Support to FastVideo by alexzms · Pull Request #883 · hao-ai-lab/FastVideo

alexzms · 2025-11-18T01:42:50Z

Summary

This PR integrates LongCat-Video into FastVideo as a first-class text-to-video (T2V) pipeline, including:

- Native LongCat DiT model config and registration
- A LongCat pipeline that supports base 480p generation
- Distillation via LoRA
- 480p → 720p refinement (two-stage pipeline)
- Optional Block Sparse Attention (BSA) and 3D RoPE support
- Utilities for checkpoint conversion

Key Changes

1. Model & Pipeline Configuration

fastvideo/configs/models/dits/longcat.py
- Defines LongCatVideoArchConfig and LongCatVideoConfig for the native LongCat DiT.
- Adds parameter name mappings to convert official LongCat weights to FastVideo naming (embedders, AdaLN, self/cross-attn, FFN, final layer).
- Exposes BSA-related fields and video-specific settings (3D patches, caption channels, etc.).
fastvideo/configs/pipelines/longcat.py
- Adds LongCatDiTArchConfig for Phase 1 wrapper compatibility.
- Adds LongCatT2V480PConfig (base 480p pipeline) and LongCatT2V704PConfig (704p refinement with BSA enabled).
fastvideo/configs/models/dits/__init__.py / fastvideo/models/registry.py / fastvideo/configs/pipelines/registry.py / fastvideo/pipelines/pipeline_registry.py
- Wires LongCat configs and models into the existing registry:
  - Registers LongCat DiT classes and LongCatPipeline.
  - Adds pipeline detection and fallback under the "longcat" key.
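
For readers unfamiliar with name-mapping tables, here is a minimal sketch of how regex-based parameter renaming can work. The patterns, target names, and both helper functions are hypothetical and only illustrate the idea; the actual table in fastvideo/configs/models/dits/longcat.py uses its own names.

```python
import re

# Hypothetical (pattern, replacement) pairs for illustration only; the real
# table in fastvideo/configs/models/dits/longcat.py uses its own names.
PARAM_NAME_MAPPING = [
    (r"^blocks\.(\d+)\.attn\.qkv\.(weight|bias)$", r"blocks.\1.self_attn.qkv_proj.\2"),
    (r"^blocks\.(\d+)\.ffn\.w1\.(weight|bias)$", r"blocks.\1.mlp.fc_in.\2"),
    (r"^final_layer\.linear\.(weight|bias)$", r"final_layer.proj_out.\1"),
]


def rename_key(key: str) -> str:
    """Return the first matching rewrite of `key`, or `key` unchanged."""
    for pattern, replacement in PARAM_NAME_MAPPING:
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key


def convert_state_dict(state_dict: dict) -> dict:
    """Rename every key, refusing to silently overwrite on collisions."""
    converted = {}
    for key, value in state_dict.items():
        new_key = rename_key(key)
        if new_key in converted:
            raise ValueError(f"duplicate target key: {new_key}")
        converted[new_key] = value
    return converted
```

Keys that match no pattern pass through unchanged, which makes it easy to spot unmapped parameters by diffing the key sets before and after conversion.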

2. LongCat Pipeline & Stages (Base + Refinement)

fastvideo/pipelines/basic/longcat/longcat_pipeline.py
- Implements LongCatPipeline as a composed pipeline with LoRA support.
- Assembles stages for:
  - text encoding, timestep prep, latent prep, denoising, decoding
  - LongCat-specific refine stages (LongCatRefineInitStage, LongCatRefineTimestepStage).
- Enables runtime BSA configuration from pipeline config / CLI and propagates parameters to transformer blocks.
fastvideo/pipelines/stages/longcat_refine_init.py
- Initializes 480p → 720p refinement:
  - Loads stage1 video (path or in-memory frames).
  - Upsamples spatially/temporally, applies temporal padding compatible with VAE/BSA.
  - VAE-encodes, normalizes latents, and mixes with noise according to t_thresh.
  - Stores padding metadata in batch for later cropping.
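
As a rough illustration of the noise-mixing step above, the sketch below assumes the usual flow-matching linear interpolation between the encoded latents and Gaussian noise. The function name and the flat-list representation are assumptions made for brevity; the actual stage operates on tensors.

```python
import random


def mix_with_noise(latents, t_thresh, rng):
    """Blend clean latents with Gaussian noise: (1 - t) * x + t * eps."""
    if not 0.0 <= t_thresh <= 1.0:
        raise ValueError("t_thresh must lie in [0, 1]")
    return [(1.0 - t_thresh) * x + t_thresh * rng.gauss(0.0, 1.0) for x in latents]


# Example: a seeded generator keeps the result reproducible.
rng = random.Random(0)
noisy = mix_with_noise([0.2, -1.0, 0.7], t_thresh=0.5, rng=rng)
```

Under this assumption, t_thresh = 0 keeps the stage-1 latents untouched and t_thresh = 1 discards them entirely, so the value controls how much of the 480p result survives into refinement.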
fastvideo/pipelines/stages/longcat_refine_timestep.py
- Builds LongCat refinement timesteps starting at t_thresh and updates scheduler timesteps/sigmas accordingly.
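
One way to picture a schedule that starts at t_thresh is sketched below. It assumes linearly spaced sigmas and a 1000-step training range; both are assumptions for illustration, and the function names are hypothetical rather than taken from the stage.

```python
def refine_sigmas(t_thresh, num_steps):
    """Return num_steps + 1 sigmas running linearly from t_thresh down to 0."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 < t_thresh <= 1.0:
        raise ValueError("t_thresh must lie in (0, 1]")
    sigmas = [t_thresh * (1.0 - i / num_steps) for i in range(num_steps)]
    sigmas.append(0.0)  # terminal sigma: fully denoised
    return sigmas


def refine_timesteps(sigmas, num_train_timesteps=1000):
    """Scale every sigma except the terminal 0 into scheduler timestep units."""
    return [s * num_train_timesteps for s in sigmas[:-1]]
```

Starting the schedule at t_thresh rather than 1.0 matches the noise level injected during refinement initialization, so the denoiser only has to remove the noise that was actually added.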
fastvideo/pipelines/stages/longcat_denoising.py
- LongCat-specific denoising loop:
  - Batched CFG with CFG-zero optimal guidance scale.
  - Negates noise_pred to match flow-matching scheduler convention.
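
The guidance step can be sketched as follows, assuming the CFG-Zero* formulation in which the unconditional prediction is rescaled by its projection coefficient onto the conditional one before guidance is applied. The function names and flat-list inputs are hypothetical; the actual loop works per sample on batched tensors.

```python
def cfg_zero_star(cond, uncond, guidance_scale, eps=1e-8):
    """Combine predictions using the optimal scale <cond, uncond> / ||uncond||^2."""
    dot = sum(c * u for c, u in zip(cond, uncond))
    norm_sq = sum(u * u for u in uncond)
    st_star = dot / (norm_sq + eps)  # eps guards against an all-zero uncond
    return [
        u * st_star + guidance_scale * (c - u * st_star)
        for c, u in zip(cond, uncond)
    ]


def to_scheduler_convention(noise_pred):
    """Flip the sign so the prediction matches the scheduler's direction."""
    return [-v for v in noise_pred]
```

When the two predictions agree, the coefficient is close to 1 and the result collapses to the conditional prediction regardless of the guidance scale.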
fastvideo/pipelines/pipeline_batch_info.py / fastvideo/configs/sample/base.py
- Extends ForwardBatch and SamplingParam with LongCat refine fields: refine_from, t_thresh, spatial_refine_only, num_cond_frames, stage1_video.
- Adds corresponding CLI args (--refine-from, --t-thresh, etc.).
fastvideo/pipelines/stages/latent_preparation.py / fastvideo/pipelines/stages/decoding.py
- Adjusts latent scaling for pre-initialized latents in refine mode (no double init_noise_sigma).
- Crops extra padded frames after decoding using refine padding metadata.
fastvideo/pipelines/stages/utils.py
- Adds aspect-ratio bucket tables and get_bucket_config() used to select resolutions for 480p/720p LongCat.
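
Bucket selection can be pictured with the sketch below. The table values and the `closest_bucket` helper are hypothetical and are not the signature or contents of get_bucket_config(); they only show the nearest-aspect-ratio idea.

```python
# Hypothetical (height, width) buckets; the real tables live in
# fastvideo/pipelines/stages/utils.py and hold different values.
BUCKETS_480P = [(480, 832), (544, 736), (640, 640), (736, 544), (832, 480)]


def closest_bucket(height, width, buckets):
    """Pick the bucket whose height/width ratio is nearest to the request."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    target = height / width
    return min(buckets, key=lambda hw: abs(hw[0] / hw[1] - target))
```

For example, a 720x1280 (height x width) request maps to the widest landscape bucket in this table, while a square request maps to (640, 640).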

3. RoPE, LoRA & BSA Triton Kernel Support

fastvideo/layers/rotary_embedding_3d.py
- Adds 3D RoPE implementation for video transformers, splitting head dim into (T, H, W) components.
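
To make the (T, H, W) split concrete, here is a small sketch that computes the rotation angles for one token at grid position (t, h, w). The split rule (roughly one third per axis, remainder to time) and the base of 10000 are assumptions borrowed from common practice, not necessarily what rotary_embedding_3d.py does.

```python
def split_head_dim(head_dim):
    """Split head_dim into even-sized (T, H, W) parts; the remainder goes to T."""
    if head_dim % 2:
        raise ValueError("head_dim must be even")
    dim_hw = 2 * (head_dim // 6)
    dim_t = head_dim - 2 * dim_hw
    return dim_t, dim_hw, dim_hw


def axis_angles(pos, dim, theta=10000.0):
    """Standard RoPE angles for one axis: pos / theta^(2i / dim)."""
    return [pos / theta ** (2 * i / dim) for i in range(dim // 2)]


def rope_3d_angles(t, h, w, head_dim):
    """Concatenate per-axis angles; each angle rotates one pair of channels."""
    dim_t, dim_h, dim_w = split_head_dim(head_dim)
    return axis_angles(t, dim_t) + axis_angles(h, dim_h) + axis_angles(w, dim_w)
```

With head_dim = 128 this rule gives 44 channels to time and 42 each to height and width, for 64 rotation angles in total.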
fastvideo/layers/lora/linear.py / fastvideo/pipelines/lora_pipeline.py / fastvideo/fastvideo_args.py
- Implements LoRA alpha scaling in merge logic (alpha = lora_alpha / rank).
- Teaches LoRAPipeline to parse/store *.lora_alpha alongside lora_A, lora_B.
- Introduces CLI flags for loading LoRA adapters (--lora-path, --lora-nickname, --lora-target-modules).
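
The alpha scaling in the merge can be sketched as W' = W + (lora_alpha / rank) * B @ A. The helper names and the nested-list matrices below are for illustration only, and falling back to a scale of 1.0 when no lora_alpha is stored is an assumption rather than documented behavior.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_A, lora_B, lora_alpha=None):
    """Merge a LoRA update into `weight`.

    Shapes: weight (out, in), lora_A (rank, in), lora_B (out, rank).
    """
    rank = len(lora_A)
    # Assumed fallback: no stored alpha means no extra scaling.
    scale = lora_alpha / rank if lora_alpha is not None else 1.0
    delta = matmul(lora_B, lora_A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

Storing lora_alpha next to lora_A and lora_B matters because adapters trained with alpha different from rank would otherwise be merged at the wrong strength.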
fastvideo/third_party/longcat_video/block_sparse_attention/*
- Integrates LongCat’s Triton-based Block Sparse Attention kernels and helper utilities (including p2p communication for context parallelism).

4. Checkpoint Conversion, Inference Scripts

scripts/checkpoint_conversion/longcat_to_fastvideo.py
- Converts official LongCat-Video checkpoints into a FastVideo-compatible layout using the new param name mappings.
scripts/inference/v1_inference_longcat*.sh
- Example scripts for:
  - 480p LongCat T2V generation
  - 480p distillation with LoRA
  - 480p → 720p refinement from an existing video (with BSA + refinement LoRA)

Tests

- Base LongCat T2V: Ran generation on all prompts under assets/ and verified successful 480p video outputs for each prompt.
- Distill + Refine pipeline: Used a representative prompt to test the full two-stage flow:
  1. 480p distilled generation, and
  2. 480p → 720p refinement from the generated video.

Known limitations:

- Distillation and refinement are currently run by two separate scripts and must be chained manually. If there is demand for it, we plan to add a follow-up patch that unifies both stages into a single end-to-end command.
- 2-GPU generation is currently slower than 1-GPU generation for the LongCat pipeline; this performance issue is under investigation.

alexzms · 2025-12-16T05:12:59Z

Implemented the SSIM tests and verified them on the 4x L40S platform. The thresholds are currently set to 0.93 for distill/base and 0.90 for refine. Please let me know whether these values match expectations or need tuning.

alexzms · 2025-12-23T06:17:55Z

The SSIM tests are still timing out (>60 minutes) even with reduced steps. I suggest merging LongCat T2V without the SSIM tests for now, and I'll handle the SSIM efficiency optimization in a separate PR.

alexzms and others added 30 commits November 2, 2025 22:49

process crumb

b3ba003

todo list. ready to merge longcat

c5555c1

longcat pipeline config and pipeline config registeration

1d90e17

Merge branch 'hao-ai-lab:main' into main

9a27cde

pipeline detector

3cb071c

reduce prompt number for faster testing

770bf81

todo update

6e30e88

minor

c0e26a5

adding scheduling_flow_match_euler_discrete

727fae4

move file. change import

4c316f5

FlowMatchEulerDiscreteScheduler done

3e40d68

third party - add longcat video core codes

15f95e6

ensure import correctness. progress recorded in todolist.md

37783f4

verify that current modification does not interfere with scripts/inference/v1_inference_wan.sh

e308109

Marked done in TODO list

2e11178

ready to migrate umt5, modifying todolist

e0af37e

umt5 done

c505084

vae loader done

bcadae7

mark progress

29e5225

ready to write longcat pipeline

d6f289b

ready to refactor step 3

619a399

ready to unroll imports

b609410

marking progress

3b68c74

this is actually lots of work. rewrite LongCatTransformer3D in FastVideo way

1092cb6

two stage

cddc0f7

two stage todo list

79b869e

longcat pipeline build

39c54ff

build pipeline framework done

a6205a1

added wrapper pipelines and code for longcat model loading and inference.

bea9bea

first phase of longcat integration (wrapper) complete

adf85cd

alexzms added 8 commits December 15, 2025 05:00

2 gpu l40s to avoid cuda oom

9aef5a2

4 l40 gpu

c3f3e9f

try solve oom problem in refinement

bccc522

yapf

14349fd

getattr removal

9775755

no fps refinement

627051a

no vae offloading

5d26475

l40 longcat ssim videos

5e43fc3

alexzms added 6 commits December 16, 2025 05:13

yapf pre commit

104d1c8

Merge branch 'hao-ai-lab:main' into main

e077f3d

ci

785b5b4

Merge branch 'hao-ai-lab:main' into main

2266d38

Merge branch 'hao-ai-lab:main' into main

811c4a0

ssim volume

ccbfbbd

alexzms force-pushed the main branch from 7c0b1bd to ccbfbbd on December 22, 2025 04:25

alexzms added 9 commits December 22, 2025 05:14

ssim takes very long time

78f1f3c

longer ssim

3a0fbea

smaller num infer steps and resol

72b7664

only smaller infer steps and update video file

21c2dfe

4gpu is necessary

9eacf7e

ssim minor change

9cd73da

remove ssim in this pr

e932bf8

Merge remote-tracking branch 'upstream/main'

be73ed5

precommit

9259b60

SolitaryThinker approved these changes Dec 23, 2025

SolitaryThinker merged commit 8f1e6c3 into hao-ai-lab:main Dec 23, 2025
1 check passed

shijiew555 pushed a commit to Gary-ChenJL/FastVideo that referenced this pull request Apr 8, 2026

Add LongCat T2V (Base, Distillation and Refinement) Support to FastVideo (hao-ai-lab#883) Co-authored-by: Shao Duan <shaoxiongduan@gmail.com>

9481c38

RandNMR73 pushed a commit that referenced this pull request Apr 8, 2026

Add LongCat T2V (Base, Distillation and Refinement) Support to FastVideo (#883) Co-authored-by: Shao Duan <shaoxiongduan@gmail.com>

9211fa8

SolitaryThinker merged 134 commits into hao-ai-lab:main from FoundationResearch:main

5 participants
