This is the official PyTorch implementation of the WACV 2022 paper "Strumming to the Beat: Audio-Conditioned Contrastive Video Textures". We provide the datasets and code for training and testing the contrastive video texture synthesis model and the baselines described in the paper.
If you find our repo useful in your research, please use the following BibTeX entry for citation.
@InProceedings{Narasimhan_2022_WACV,
author = {Narasimhan, Medhini and Ginosar, Shiry and Owens, Andrew and Efros, Alexei A. and Darrell, Trevor},
title = {Strumming to the Beat: Audio-Conditioned Contrastive Video Textures},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2022},
pages = {3761-3770}
}
Create the conda environment from the YAML file and activate it:
conda env create -f avgan.yml
conda activate avgan
Coming soon!
cd contrastive_video_textures
Train the model on a single video:
python main.py -vdata <path to video folder> -m 1 -w 20 -stride 4 -temp 0.1 -th 0.0 -bs 8 -negs 14 -vl <list of video names> -ea slowfast -lr 1e-4
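The -w and -stride flags above control how the source video is cut into overlapping clips. As an illustration only (the exact clip enumeration lives in the repo's data loader), a minimal sketch of how window starts could be derived from those two flags:

```python
# Hypothetical sketch (not taken from the repo): enumerate start indices of
# overlapping windows of `window` frames taken every `stride` frames,
# mirroring the -w 20 -stride 4 arguments above.
def clip_starts(num_frames, window=20, stride=4):
    """Return start indices of all full windows that fit in the video."""
    return list(range(0, num_frames - window + 1, stride))

print(clip_starts(32))  # -> [0, 4, 8, 12]
```

With -w 20 and -stride 4, consecutive clips share 16 frames, which is what gives the model densely overlapping positives and negatives to contrast.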
Synthesize a texture for the same video using the model trained above:
python main.py -vdata <path to video folder> -m 1 -w 20 -stride 4 -temp 0.1 -th 0.3 -bs 24 -vl <list of video names> -e -mbs 100
First, train a contrastive model for the video using the training command above. Ensure that the audio for the same video is in the audio folder as a WAV file with the same name. Then, to synthesize a new video conditioned on an audio:
python main.py -vdata <path to video folder> -adata <path to audio folder> -m 2 -w 20 -stride 4 -temp 0.1 -th 0.0 -bs 24 -negs 20 -e -vl <list of video names> -da <list of conditioning audios> -alpha 0.5
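The -alpha flag trades off how much the conditioning audio versus the video model drives transitions. As a hedged sketch of one plausible interpretation (the exact formulation is in the paper and in main.py), a convex combination of audio and video similarity scores:

```python
import numpy as np

# Illustrative only: one plausible way a scalar `alpha` could blend
# audio-based and video-based transition scores. The names and the formula
# here are assumptions, not the repo's actual implementation.
def blend_scores(video_sim: np.ndarray, audio_sim: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Convex combination of two similarity matrices of the same shape."""
    return alpha * audio_sim + (1.0 - alpha) * video_sim

v = np.array([[0.2, 0.8], [0.6, 0.4]])
a = np.array([[0.9, 0.1], [0.3, 0.7]])
print(blend_scores(v, a, alpha=0.5))
```

Under this reading, -alpha 0.5 weighs the conditioning audio and the learned video similarity equally; larger values would follow the target audio more closely.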
cd baselines
cd classic_video_textures
- Classic:
python video_textures.py -m 1 -vdata <source video folder> -vl <list of video names> -s -bs 48
- Classic+:
python video_textures.py -m 2 -vdata <source video folder> -vl <list of video names> -s -bs 48
- Classic++:
python video_textures.py -m 3 -vdata <source video folder> -vl <list of video names> -s -bs 48
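The Classic baselines follow the classic video textures idea: a transition from frame i to frame j is good when frame i+1 looks like frame j, and transition probabilities fall off exponentially with frame distance. A simplified sketch under those assumptions (the scripts above implement the full variants):

```python
import numpy as np

# Simplified sketch of classic video textures: score transition i -> j by
# the L2 distance between frame i+1 and frame j, then turn distances into
# row-normalized transition probabilities. `sigma` is a hypothetical
# temperature, not a flag of video_textures.py.
def transition_probs(frames: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """frames: (N, ...) array. Returns an (N-1, N) matrix whose row i gives
    the probability of jumping from frame i to each frame j."""
    flat = frames.reshape(len(frames), -1).astype(float)
    diff = flat[1:, None, :] - flat[None, :, :]   # (N-1, N, D)
    dist = np.sqrt((diff ** 2).sum(-1))           # D[i, j] = ||f[i+1] - f[j]||
    probs = np.exp(-dist / sigma)
    return probs / probs.sum(axis=1, keepdims=True)

frames = np.random.default_rng(0).random((6, 4, 4))
P = transition_probs(frames)
print(P.shape)  # -> (5, 6)
```

Synthesis then walks this matrix, sampling the next frame from row i, which is what lets the output loop indefinitely without visible seams.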
cd audio_baselines
- Random Clip:
python random_segment_baseline.py -vl <original_video_list> -tl <target_audio_list>
- Random Baseline:
python random_baseline.py -vl <original_video_list> -tl <target_audio_list>
- Random Shift:
python random_shift.py -vl <original_video_list> -tl <target_audio_list>
- Audio Nearest Neighbour:
python audio_nearestneighbour.py -vl <original_video_list> -dl <target_audio_list>
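The nearest-neighbour baseline presumably matches each target-audio segment to its closest source-audio segment and reuses the corresponding video clip. A minimal sketch with hypothetical feature arrays (the actual features and matching live in audio_nearestneighbour.py):

```python
import numpy as np

# Hypothetical sketch: for each target-audio feature vector, return the
# index of the nearest source-audio feature under L2 distance; those
# indices would then select which source video clips to play.
def nearest_neighbour(source_feats: np.ndarray, target_feats: np.ndarray) -> np.ndarray:
    dists = np.linalg.norm(
        target_feats[:, None, :] - source_feats[None, :, :], axis=-1
    )  # (num_target, num_source)
    return dists.argmin(axis=1)

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[0.9, 0.1], [0.1, 0.9]])
print(nearest_neighbour(src, tgt))  # -> [1 2]
```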