long video #44

betterze · 2024-03-15T18:41:10Z

Dear V-JEPA team,

Thank you for sharing this great work; I really enjoyed it.

If I understand correctly, the model is trained only on 16-frame clips (after frame skipping, roughly 3 s of video). Does it work on longer videos with many more frames (>60 frames; >10 s or >30 s)? Or would I need to fine-tune it?

Thank you for your help.

Best Wishes,

Zongze

sumo43 · 2024-04-30T00:55:48Z

For longer videos, the authors split the video into several clips, each longer than 3 seconds, and randomly sample a 64-frame slice from each clip. So you would run the model on each clip, concatenate the clip-level latents, and use the concatenated latents downstream.
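The clip-and-concatenate idea above could be sketched roughly as follows. This is only an illustration, not the V-JEPA codebase: `sample_clip_indices`, `encode_long_video`, and the `encoder` callable are hypothetical names, and the stride of 4 is an assumption inferred from the numbers in this thread (a 64-frame slice reduced to 16 frames).

```python
import random


def sample_clip_indices(num_frames, num_clips, window=64, frames_per_clip=16, seed=None):
    """Split a video into equal segments and pick strided frame indices in each.

    Assumption: each clip is a random `window`-frame slice of its segment,
    subsampled down to `frames_per_clip` frames (stride 4 with the defaults).
    """
    if window % frames_per_clip != 0:
        raise ValueError("window must be a multiple of frames_per_clip")
    if num_frames < num_clips * window:
        raise ValueError("video too short for the requested number of clips")

    rng = random.Random(seed)
    stride = window // frames_per_clip
    segment = num_frames // num_clips  # length of each segment, in frames

    clips = []
    for c in range(num_clips):
        seg_start = c * segment
        # Random window start such that the window stays inside the segment.
        start = seg_start + rng.randint(0, segment - window)
        clips.append(list(range(start, start + window, stride)))
    return clips


def encode_long_video(frames, encoder, num_clips, **kwargs):
    """Run a (hypothetical) clip encoder per clip and concatenate the latents.

    `encoder` is a placeholder: any callable that maps a list of frames
    to a list of latent vectors.
    """
    latents = []
    for indices in sample_clip_indices(len(frames), num_clips, **kwargs):
        clip = [frames[i] for i in indices]
        latents.extend(encoder(clip))  # concatenate clip-level latents
    return latents
```

With a real model, `encoder` would wrap the pretrained video encoder's forward pass and the concatenation would typically be done along the token or clip dimension of a tensor; whether the concatenated latents are pooled or fed to a probe afterwards depends on the downstream task.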
