
Why does the phenomenon of 𝛿𝑆 = 25 while 𝛿𝑇 = 50 occur? #50

@2741913295

Dear author, while carefully reading your paper, I noticed that when 𝛿𝑇 is fixed at 50, there is a case where 𝛿𝑆 = 25. This confuses me because, according to the original explanation, 𝛿𝑆 is the accelerated step size while 𝛿𝑇 is the fixed step size. My understanding is that this is analogous to climbing stairs: 𝛿𝑇 is one fixed step (one stair), and 𝛿𝑆, being an accelerated step, could be, for example, twice 𝛿𝑇, meaning you climb two stairs at a time. Here, however, 𝛿𝑆 = 25 is half of 𝛿𝑇, which feels like climbing "half a stair" and doesn't seem to make sense. Since the paper describes 𝛿𝑆 as the accelerated step size, I would expect it to be 𝑛 times 𝛿𝑇, where 𝑛 is an integer starting from 1, so 𝛿𝑆 should never be smaller than 𝛿𝑇. I am very puzzled by this.

My second question concerns the design in which X𝑠 is fed into the diffusion model to predict its noise without text, while the noise prediction for X𝑑 includes text, and the difference between these two noise predictions is then used as the gradient to update the Gaussian. Why is it designed this way, with one prediction conditioned on text and the other not? I might not have fully understood the Interval Score Matching (ISM) method yet.
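
To make the second question concrete, here is a minimal sketch of how I currently read that update. It is only my interpretation, not the paper's implementation: `toy_noise_pred` is a hypothetical stand-in for the diffusion model's noise predictor, `interval_update_direction` is a name I made up, and the relation `s = t - delta_T` is my assumption about how the two timesteps are related.

```python
import numpy as np


def toy_noise_pred(x, t, text_emb=None):
    """Hypothetical stand-in for the diffusion model's noise predictor.

    This is NOT a real model: it is a deterministic toy function so the
    sketch runs end to end.
    """
    out = 0.1 * x + 0.001 * t
    if text_emb is not None:
        # Text conditioning shifts the prediction (toy behaviour only).
        out = out + 0.05 * text_emb
    return out


def interval_update_direction(x_s, x_d, s, t, text_emb):
    """Difference of the two noise predictions, as described in my question."""
    eps_s = toy_noise_pred(x_s, s)            # prediction for X_s: no text
    eps_d = toy_noise_pred(x_d, t, text_emb)  # prediction for X_d: with text
    return eps_d - eps_s


delta_T = 50
t = 200
s = t - delta_T  # assumption: the two timesteps are delta_T apart

rng = np.random.default_rng(0)
x_s = rng.standard_normal(4)               # toy latent at timestep s
x_d = x_s + 0.1 * rng.standard_normal(4)   # toy latent at timestep t
text_emb = np.ones(4)                      # toy text embedding

direction = interval_update_direction(x_s, x_d, s, t, text_emb)
print(direction)
```

In this reading, `direction` is what would be backpropagated to the Gaussian parameters; my question is why `eps_s` drops the text while `eps_d` keeps it.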

If you have time, could you please help me address these two questions? This is very important to me, and I sincerely thank you for taking the time to read my questions.
