Great work! Thanks for releasing this codebase!
I noticed that the current implementation mainly adopts a distillation-style setup (distilling from a 14B teacher into a smaller model with the Self-Forcing framework). I'm wondering whether the framework could also be used to post-train a smaller model that has already been pre-trained with Diffusion Forcing, i.e., small → small self-forcing post-training rather than large → small distillation. Did you choose distillation because it makes better use of the larger model's pretrained knowledge?
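To make the contrast concrete, here is a minimal, hypothetical sketch of the two setups. All names (`TinyVideoModel`, `self_forcing_step`) are toy placeholders, not the repo's actual API; the only point is that the frozen score model supplying the training signal could either be a large teacher (setup A, as in the release) or a frozen copy of the Diffusion-Forcing-pretrained small model itself (setup B, what this question proposes).

```python
# Toy sketch only; assumes PyTorch. None of these names come from the Self-Forcing repo.
import copy
import torch
import torch.nn as nn

class TinyVideoModel(nn.Module):
    """Stand-in for a causal video diffusion model."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def self_forcing_step(student: nn.Module, score_model: nn.Module,
                      x: torch.Tensor, opt: torch.optim.Optimizer) -> float:
    """One toy update: the student denoises its own input, and a frozen
    score model provides the matching target (a stand-in for a
    DMD/GAN-style distribution-matching objective)."""
    pred = student(x)
    with torch.no_grad():
        target = score_model(x)  # frozen model supplies the training signal
    loss = torch.mean((pred - target) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

student = TinyVideoModel()
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
x = torch.randn(8, 64)

# Setup A (as released): large -> small distillation from a frozen 14B teacher.
teacher_14b = TinyVideoModel()  # placeholder for the 14B teacher
teacher_14b.requires_grad_(False)
loss_a = self_forcing_step(student, teacher_14b, x, opt)

# Setup B (this question): small -> small post-training; the "teacher" is a
# frozen copy of the same Diffusion-Forcing-pretrained small model.
pretrained_small = copy.deepcopy(student)
pretrained_small.requires_grad_(False)
loss_b = self_forcing_step(student, pretrained_small, x, opt)

print(loss_a, loss_b)
```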