Hi, I don't want to do any distillation or ODE training. I only want to train a causal self-forcing model using my own data. What should I do?