Code accompanying the paper SampleRNN: An Unconditional End-to-End Neural Audio Generation Model (arXiv:1612.07837). Samples are available here.
Extensively tested with:
- cuDNN 5105
- Python 2.7.12
- Numpy 1.11.1
- Theano 0.8.2 (0.9 for WaveNet re-implementation)
- Lasagne 0.2.dev1
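To verify the environment before training, a quick version check can help (a minimal sketch; it assumes only that the packages listed above are importable):

import numpy, theano, lasagne
print("NumPy   %s" % numpy.__version__)    # expect 1.11.1
print("Theano  %s" % theano.__version__)   # expect 0.8.2 (0.9 for WaveNet)
print("Lasagne %s" % lasagne.__version__)  # expect 0.2.dev1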
The music dataset was created from all 32 of Beethoven's piano sonatas, publicly available on archive.org. The datasets/music directory contains scripts to preprocess and build this dataset; it is also available here for download. Extract the tar file and put all the numpy files in the datasets/music directory.
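After extraction, a quick sanity check of the numpy files is possible (a minimal sketch; the exact file names, e.g. music_train.npy, are an assumption -- verify against the contents of the tar file):

import numpy as np
train = np.load('datasets/music/music_train.npy')  # hypothetical file name
print(train.shape, train.dtype)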
To train a model on an existing dataset with accelerated GPU processing, run the following lines from the root of the sampleRNN_ICLR2017 folder; each command corresponds to the best set of hyper-parameters found for that model.
Mission control center:
$ pwd
/u/mehris/sampleRNN_ICLR2017
$ python models/two_tier/two_tier.py -h
usage: two_tier.py [-h] [--exp EXP] --n_frames N_FRAMES --frame_size
FRAME_SIZE --weight_norm WEIGHT_NORM --emb_size EMB_SIZE
--skip_conn SKIP_CONN --dim DIM --n_rnn {1,2,3,4,5}
--rnn_type {LSTM,GRU} --learn_h0 LEARN_H0 --q_levels
Q_LEVELS --q_type {linear,a-law,mu-law} --which_set
{ONOM,BLIZZ,MUSIC} --batch_size {64,128,256} [--debug]
[--resume]
two_tier.py No default value! Indicate every argument.
optional arguments:
-h, --help show this help message and exit
--exp EXP Experiment name
--n_frames N_FRAMES How many "frames" to include in each Truncated BPTT
pass
--frame_size FRAME_SIZE
How many samples per frame
--weight_norm WEIGHT_NORM
Adding learnable weight normalization to all the
linear layers (except for the embedding layer)
--emb_size EMB_SIZE Size of embedding layer (0 to disable)
--skip_conn SKIP_CONN
Add skip connections to RNN
--dim DIM Dimension of RNN and MLPs
--n_rnn {1,2,3,4,5} Number of layers in the stacked RNN
--rnn_type {LSTM,GRU}
GRU or LSTM
--learn_h0 LEARN_H0 Whether to learn the initial state of RNN
--q_levels Q_LEVELS Number of bins for quantization of audio samples.
Should be 256 for mu-law.
--q_type {linear,a-law,mu-law}
Quantization in linear scale, a-law companding, or mu-
law companding. With mu-/a-law, quantization levels
should be set to 256
--which_set {ONOM,BLIZZ,MUSIC}
ONOM, BLIZZ, or MUSIC
--batch_size {64,128,256}
size of mini-batch
--debug Debug mode
--resume Resume the same model from the last checkpoint. Order
of params is important. [for now]
To run:
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32 python -u models/two_tier/two_tier.py --exp BEST_2TIER --n_frames 64 --frame_size 16 --emb_size 256 --skip_conn False --dim 1024 --n_rnn 3 --rnn_type GRU --q_levels 256 --q_type linear --batch_size 128 --weight_norm True --learn_h0 True --which_set MUSIC
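A note on the quantization flags: --q_type selects how raw amplitudes are binned into --q_levels classes for the model's softmax output. The following rough NumPy sketch of mu-law companding followed by uniform binning illustrates why q_levels should be 256 for mu-law (mu = 255); it is an illustration, not the repo's exact implementation:

import numpy as np

def mu_law_quantize(audio, q_levels=256):
    # Compand a float waveform in [-1, 1] with mu = q_levels - 1 = 255,
    # then bin the result uniformly into q_levels integer classes.
    mu = float(q_levels - 1)
    companded = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    bins = ((companded + 1.0) / 2.0 * mu + 0.5).astype('int64')
    return np.clip(bins, 0, q_levels - 1)

x = np.sin(np.linspace(0.0, 100.0, 16000))
print(mu_law_quantize(x).min(), mu_law_quantize(x).max())  # bins lie in {0, ..., 255}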
$ python models/three_tier/three_tier.py -h
usage: three_tier.py [-h] [--exp EXP] --seq_len SEQ_LEN --big_frame_size
BIG_FRAME_SIZE --frame_size FRAME_SIZE --weight_norm
WEIGHT_NORM --emb_size EMB_SIZE --skip_conn SKIP_CONN
--dim DIM --n_rnn {1,2,3,4,5} --rnn_type {LSTM,GRU}
--learn_h0 LEARN_H0 --q_levels Q_LEVELS --q_type
{linear,a-law,mu-law} --which_set {ONOM,BLIZZ,MUSIC}
--batch_size {64,128,256} [--debug] [--resume]
three_tier.py No default value! Indicate every argument.
optional arguments:
-h, --help show this help message and exit
--exp EXP Experiment name
--seq_len SEQ_LEN How many samples to include in each Truncated BPTT
pass
--big_frame_size BIG_FRAME_SIZE
How many samples per big frame in tier 3
--frame_size FRAME_SIZE
How many samples per frame in tier 2
--weight_norm WEIGHT_NORM
Adding learnable weight normalization to all the
linear layers (except for the embedding layer)
--emb_size EMB_SIZE Size of embedding layer (> 0)
--skip_conn SKIP_CONN
Add skip connections to RNN
--dim DIM Dimension of RNN and MLPs
--n_rnn {1,2,3,4,5} Number of layers in the stacked RNN
--rnn_type {LSTM,GRU}
GRU or LSTM
--learn_h0 LEARN_H0 Whether to learn the initial state of RNN
--q_levels Q_LEVELS Number of bins for quantization of audio samples.
Should be 256 for mu-law.
--q_type {linear,a-law,mu-law}
Quantization in linear scale, a-law companding, or mu-
law companding. With mu-/a-law, quantization levels
should be set to 256
--which_set {ONOM,BLIZZ,MUSIC}
ONOM, BLIZZ, or MUSIC
--batch_size {64,128,256}
size of mini-batch
--debug Debug mode
--resume Resume the same model from the last checkpoint. Order
of params is important. [for now]
To run:
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32 python -u models/three_tier/three_tier.py --exp BEST_3TIER --seq_len 512 --big_frame_size 8 --frame_size 2 --emb_size 256 --skip_conn False --dim 1024 --n_rnn 1 --rnn_type GRU --q_levels 256 --q_type linear --batch_size 128 --weight_norm True --learn_h0 True --which_set MUSIC
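For reference, in the two-tier model each truncated-BPTT pass spans n_frames * frame_size samples, while in the three-tier model --seq_len is given directly in samples and should tile evenly into big frames, which in turn tile into frames. A small sanity check of the two best configurations above (an illustration, not code from the repo):

# BEST_2TIER: 64 frames of 16 samples each per TBPTT pass
print(64 * 16)  # 1024 samples

# BEST_3TIER: the tiers should tile evenly
seq_len, big_frame_size, frame_size = 512, 8, 2
assert seq_len % big_frame_size == 0     # 64 big frames per pass
assert big_frame_size % frame_size == 0  # 4 frames per big frame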
If you are using this code, please cite the paper:
@article{mehri2016samplernn,
  author  = {Soroush Mehri and Kundan Kumar and Ishaan Gulrajani and Rithesh Kumar and Shubham Jain and Jose Sotelo and Aaron Courville and Yoshua Bengio},
  title   = {SampleRNN: An Unconditional End-to-End Neural Audio Generation Model},
  year    = {2016},
  journal = {arXiv preprint arXiv:1612.07837},
}
Thanks to Richard Assar, a Torch implementation is now available:
https://github.com/richardassar/SampleRNN_torch
- Talk by Yoshua Bengio at CBMM, MIT: Deep Generative Models for Speech and Images
- Follow-up project: Char2Wav: End-To-End Speech Synthesis
If you need help or have interesting related projects/results, please don't hesitate to contact us.