Question about training #79

@ninghaolu

Description

Great work! Thanks for releasing this codebase!

I noticed that the current implementation mainly adopts a distillation-style setup (distilling from a 14B model to a smaller model within the Self-Forcing framework). I'm wondering whether this framework can also be used to post-train a smaller model that has already been pre-trained with Diffusion Forcing, i.e., small → small self-forcing post-training rather than large → small distillation. Did you choose distillation because it makes better use of the pretrained knowledge?
