Inference Speed Claims #24

@Leo-Mooney

@DjKesu
Looking at the claims made, the model can run at 16 FPS on an iPhone 15 Pro Max. To be clear, what is the definition of the model running at 16 FPS? Intuitively, this would mean that a 16 FPS video could be segmented in approximately real time. After looking through the benchmarking code, it appears that the image encoding step is outside the timing loop, so the 'FPS' is really measuring the number of prompt update + mask decoding steps that can be made per second on a static image. When the image encoding step is included in the benchmark (as a real video would require, since every frame has to be encoded), the FPS drops by ~15x.
Have I misunderstood something? As it stands, the existing FPS definition seems extremely misleading.
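
To make the two definitions concrete, here is a minimal sketch of the two ways of timing. It is not the repository's benchmark code: `fake_encode`, `fake_decode`, and their sleep durations are hypothetical stand-ins chosen only to illustrate the structure, and the function names are my own.

```python
import time
from typing import Any, Callable, Iterable


def decode_only_fps(frame: Any, encode: Callable, decode: Callable, n_steps: int) -> float:
    """Encode one static image outside the timed region, then time only the decode steps."""
    embedding = encode(frame)  # NOT timed: this is the step left out of the loop
    start = time.perf_counter()
    for _ in range(n_steps):
        decode(embedding)
    return n_steps / (time.perf_counter() - start)


def end_to_end_fps(frames: Iterable[Any], encode: Callable, decode: Callable) -> float:
    """Time encode + decode for every frame, as segmenting a real video would require."""
    frames = list(frames)
    start = time.perf_counter()
    for frame in frames:
        decode(encode(frame))
    return len(frames) / (time.perf_counter() - start)


# Hypothetical stand-ins with arbitrary costs (assumption: encoding is the expensive step).
def fake_encode(frame: Any) -> Any:
    time.sleep(0.010)
    return frame


def fake_decode(embedding: Any) -> Any:
    time.sleep(0.001)
    return embedding


if __name__ == "__main__":
    print("decode-only FPS:", decode_only_fps(0, fake_encode, fake_decode, n_steps=20))
    print("end-to-end FPS: ", end_to_end_fps(range(20), fake_encode, fake_decode))
```

With any encoder that costs noticeably more than the decoder, the first number will be much higher than the second, which is the gap I am asking about. Only the second corresponds to what I would intuitively call video FPS.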
