Description
Dear author, while carefully reading your paper, I noticed that with δ_T fixed at 50, there is a case where δ_S = 25. This confuses me, because according to the paper's explanation, δ_S is the accelerated step size, while δ_T is the fixed step size. My understanding is that this is analogous to climbing stairs: δ_T is a fixed step (one stair), and δ_S, being an accelerated step, could be, for example, twice δ_T, meaning you climb two stairs at a time. In this case, however, δ_S = 25 is half of δ_T, which feels like climbing "half a stair" and does not seem to make sense. Similarly, since the paper describes δ_S as the accelerated step size, I would expect it to be n times δ_T, where n is an integer starting from 1. Therefore, it seems that δ_S should never be smaller than δ_T. I am very puzzled by this.
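To make the first question concrete, here is a minimal sketch of the timestep schedule as I currently read the paper. It assumes that s = t − δ_T, that x_s is reached from x_0 by multi-step DDIM inversion with stride δ_S, and that x_t is then reached from x_s in one step of size δ_T. The function `ism_timesteps` and these assumptions are my own, not code from the repository, so I am not sure it matches your implementation.

```python
from typing import List


def ism_timesteps(t: int, delta_t: int, delta_s: int) -> List[int]:
    """Hypothetical helper: timesteps visited when inverting x_0 -> x_s -> x_t.

    Assumes s = t - delta_t, that x_s is reached from x_0 by DDIM inversion
    with stride delta_s, and that x_t is reached from x_s in a single step.
    """
    if delta_s <= 0 or delta_t <= 0:
        raise ValueError("step sizes must be positive")
    s = t - delta_t
    if s < 0:
        raise ValueError("t must be at least delta_t")
    steps = list(range(0, s, delta_s))  # strided inversion points strictly below s
    steps.append(s)  # always land exactly on s, even if s is not a multiple of delta_s
    steps.append(t)  # one final inversion step of size delta_t from s to t
    return steps


print(ism_timesteps(250, 50, 25))
print(ism_timesteps(250, 50, 100))
```

With δ_T = 50 and δ_S = 25, this sketch visits 0, 25, ..., 200 and then jumps to 250, so under this reading the two step sizes act on different segments of the trajectory.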
My second question concerns the design in which x_s is fed into the diffusion model to predict its noise without the text condition, while the noise prediction for x_t includes the text condition, and the difference between these two noise predictions is then used as the gradient to update the Gaussians. Why is it designed this way, with one prediction conditioned on text and the other not? I may not have fully understood the Interval Score Matching (ISM) method yet.
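For the second question, this is a minimal sketch of the update direction as I understand it from the paper: weight × (ε(x_t, t, y) − ε(x_s, s, ∅)). The noise predictor `toy_eps`, the function `ism_direction`, and the scalar `weight` (standing in for a timestep weighting ω(t)) are hypothetical, and the chain-rule term that carries this direction back through the renderer to the Gaussians is omitted.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical signature: (latent, timestep, prompt or None for unconditional) -> predicted noise
NoisePredictor = Callable[[Vector, int, Optional[str]], Vector]


def ism_direction(
    eps: NoisePredictor,
    x_t: Vector,
    t: int,
    x_s: Vector,
    s: int,
    prompt: str,
    weight: float = 1.0,
) -> Vector:
    """Return weight * (eps(x_t, t, prompt) - eps(x_s, s, None)), element-wise.

    `weight` stands in for a timestep weighting such as omega(t); the
    back-propagation through the renderer to the Gaussians is not modeled.
    """
    cond = eps(x_t, t, prompt)  # text-conditioned prediction at the later timestep t
    uncond = eps(x_s, s, None)  # unconditional prediction at the earlier timestep s
    return [weight * (c - u) for c, u in zip(cond, uncond)]


def toy_eps(x: Vector, timestep: int, prompt: Optional[str]) -> Vector:
    """Hypothetical stand-in for the diffusion model: adds a constant offset when a prompt is given."""
    offset = 0.5 if prompt is not None else 0.0
    return [0.1 * xi + offset for xi in x]


direction = ism_direction(
    toy_eps, x_t=[1.0, 2.0], t=250, x_s=[0.5, 1.0], s=200, prompt="a cat", weight=2.0
)
print(direction)
```

The asymmetry I am asking about is visible in the two calls inside `ism_direction`: only the prediction at x_t receives the prompt.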
If you have time, could you please help me with these two questions? This is very important to me, and I sincerely thank you for taking the time to read them.