TräumerAI: Dreaming Music with StyleGAN

Demo page

This is the repository for TräumerAI: Dreaming Music with StyleGAN, submitted to the NeurIPS 2020 Workshop for Creativity and Design. The model automatically generates a music visualization video for a given audio input, using StyleGAN2 trained on WikiArt and an audio-visual mapping based on our manually labeled data pairs.

Requirements

The model was tested in the following environment:

  • PyTorch == 1.6.0
  • CUDA == 10.1

The following Python libraries are required (an install command is sketched after the list):

  • pydub==0.24.1
  • librosa==0.8.0
  • numba==0.48
  • torchaudio==0.6.0
  • ninja==1.10.0.post2
  • av==8.0.2
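If these packages are not already installed, one way to install the pinned versions with pip (assuming an environment that already provides the PyTorch 1.6.0 / CUDA 10.1 setup listed above) is:

$ pip install pydub==0.24.1 librosa==0.8.0 numba==0.48 torchaudio==0.6.0 ninja==1.10.0.post2 av==8.0.2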

Video generation uses the ffmpeg command. If ffmpeg is not installed, install it as follows:

$ sudo apt-get update 
$ sudo apt-get install ffmpeg 
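After installation, you can confirm that ffmpeg is available on your PATH with:

$ ffmpeg -version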

This repository uses a submodule for the music embedder. Initialize and fetch it with:

$ git submodule init
$ git submodule update

Usage

$ python3 generate.py --audio_path sample/song_a.mp3 sample/song_b.mp3 --fps=30 --audio_fps=5

  • --audio_path: input audio file(s) used to generate the video; several files can be given as inputs
  • --fps: frames per second of the generated video. default=15
  • --bitrate: bitrate (video quality) of the generated video. default=1e7
  • --audio_fps: frames per second of the audio embedding; fps must be divisible by audio_fps (see the sketch after this list). default=3
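The ratio between --fps and --audio_fps determines how many video frames correspond to each audio-embedding frame. A minimal sketch of the constraint, using the values from the example command above (the variable names are illustrative and not taken from generate.py):

fps = 30         # --fps from the example command
audio_fps = 5    # --audio_fps from the example command

# generate.py requires that fps is divisible by audio_fps
assert fps % audio_fps == 0

frames_per_embedding = fps // audio_fps
print(frames_per_embedding)  # 6 video frames per audio embedding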

The generated video will be saved in sample/.

We have tested input files in m4a and mp3 formats. Currently only 16-bit audio files are supported; audio with other bit depths will not be decoded correctly.

Pre-trained model

The weights for the WikiArt pre-trained model are available here. The original source is https://github.com/pbaylies/stylegan2, which was converted from TensorFlow to PyTorch.
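Once downloaded, a minimal way to peek inside the checkpoint, assuming it is a regular PyTorch checkpoint file (the filename below is hypothetical, and the exact key layout depends on the TensorFlow-to-PyTorch conversion):

import torch

ckpt = torch.load("wikiart_stylegan2.pt", map_location="cpu")  # hypothetical filename
print(type(ckpt))                 # typically a dict of sub-state-dicts
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))      # shows which components (e.g. generator weights) are stored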

Labeling Data

The npy data of 100 pairs, each consisting of a music clip and an image selected from the generated outputs, is available here.
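A minimal sketch of loading that file with NumPy (the filename is hypothetical and the exact array layout of the 100 pairs is not documented here, so inspect it after loading):

import numpy as np

pairs = np.load("labeled_pairs.npy", allow_pickle=True)  # hypothetical filename
print(pairs.shape, pairs.dtype)   # check how the music-clip / image pairs are arranged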

License

The pre-trained model is from https://github.com/pbaylies/stylegan2

The PyTorch implementation of StyleGAN2 and the following explanation are from https://github.com/rosinality/stylegan2-pytorch

Model details and custom CUDA kernel codes are from the official repository: https://github.com/NVlabs/stylegan2

Code for Learned Perceptual Image Patch Similarity (LPIPS) came from https://github.com/richzhang/PerceptualSimilarity

To match FID scores more closely to the official TensorFlow implementation, I have used the FID Inception V3 implementation in https://github.com/mseitzer/pytorch-fid
