@DjKesu
Looking at the claims made, the model can run at 16 FPS on an iPhone 15 Pro Max. To be clear, what is the definition of the model running at 16 FPS? Intuitively, this would mean a 16 FPS video could be segmented in approximately real time. After looking through the benchmarking code, it appears that the image encoding step is outside of the timing loop, so the 'FPS' is really measuring the number of prompt update + mask decoding steps that can be made per second on a static image. When the image encoding step is included in the benchmark (as would be required for a real video, since each frame requires encoding), the FPS drops ~15x.
Have I misunderstood something? As it stands, the existing FPS definition seems extremely misleading.
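To make the distinction concrete, here is a minimal sketch of the two measurements I am comparing. `encode` and `decode` are hypothetical placeholder callables standing in for the image encoder and the prompt + mask decoder; they are not the repo's actual function names, and the real benchmark script may be structured differently.

```python
import time


def decode_only_fps(encode, decode, frame, prompts):
    """Encode one static image outside the timed region, then time only
    the prompt update + mask decoding steps (my reading of the current benchmark)."""
    embedding = encode(frame)  # not included in the timing
    start = time.perf_counter()
    for prompt in prompts:
        decode(embedding, prompt)
    elapsed = time.perf_counter() - start
    return len(prompts) / elapsed


def end_to_end_fps(encode, decode, frames, prompt):
    """Time encoding and decoding together for every frame,
    as would be needed to segment a real video."""
    start = time.perf_counter()
    for frame in frames:
        embedding = encode(frame)  # every new frame must be re-encoded
        decode(embedding, prompt)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

Only the second function corresponds to what I would intuitively call video FPS. The first measures how quickly prompts can be updated on an image that has already been encoded.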