- [2024.9.1] 📈 Adding new datasets and supervision types.
- [2024.7.20] 👀 Added Accelerate and batched-inference support for faster generation.
- [2024.7.9] 🚀 Codebase released!
- [2024.7.9] 💫 VSTaR-1M released!
- [2024.7.9] 📄 arXiv preprint released.
- [2024.6.20] 🤗 Hugging Face demo released; you are welcome to explore the VSTaR-1M dataset.
- [2024.6.17] 🔥 README released.
Video-STaR can adapt LVLMs to diverse downstream tasks and datasets.
Models trained with Video-STaR show improved performance on visual understanding benchmarks such as Temporal Compass.
- Python >= 3.10
- PyTorch == 2.0.1
- CUDA >= 11.7
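A minimal sketch (assuming `torch` is already installed) to confirm your environment matches the requirements above:

```python
# Quick environment check against the requirements listed above.
# This is an optional sanity check, not part of the Video-STaR codebase.
import sys

import torch

assert sys.version_info >= (3, 10), "Python >= 3.10 is required"
assert torch.__version__.startswith("2.0.1"), f"Expected PyTorch 2.0.1, got {torch.__version__}"
assert torch.cuda.is_available(), "CUDA is not available"
print("CUDA version:", torch.version.cuda)  # should report >= 11.7
```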
- Install required packages:
git clone https://github.com/orrzohar/Video-STaR
cd Video-STaR
conda create -n videostar python=3.10 -y
conda activate videostar
pip install --upgrade pip # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install decord opencv-python git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
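After installation, a quick sanity check (a minimal sketch, assuming the commands above completed without errors) is to confirm the key dependencies import cleanly:

```python
# Optional post-install check: verifies imports only, not training or inference.
import cv2          # installed as opencv-python
import decord       # video decoding backend
import flash_attn   # requires a compatible GPU/CUDA setup
import pytorchvideo
import torch

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("decord:", decord.__version__)
print("flash-attn:", flash_attn.__version__)
print("pytorchvideo:", pytorchvideo.__version__)
```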
- LLaVA: the codebase we built upon.
- Video-ChatGPT: contributed the evaluation benchmark and the VI-100K dataset.
- Video-LLaVA: base model.
- LLaMA-VID: base model.
- More coming soon...
- Video-Agent
- The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
- The service is a research preview intended for non-commercial use only, subject to the model license of LLaMA, the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please contact us if you find any potential violation.
If you find our paper and code useful in your research, please consider giving us a star ⭐ and a citation 📝.
@article{zohar2024videostar,
    title   = {Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision},
    author  = {Zohar, Orr and Wang, Xiaohan and Bitton, Yonatan and Szpektor, Idan and Yeung-Levy, Serena},
    journal = {arXiv preprint arXiv:2407.06189},
    year    = {2024},
}