This package provides training and evaluation code for the end-to-end baseline for the 2017 ComParE challenges.
The 1st challenge comprises recordings of child/adult and adult/adult conversations; the task is to determine the addressee, i.e. whether an adult speaker is addressing a child or another adult. Near-field recordings of 61 individuals (babies) were made with the LENA recording device in real homes. (The number of actual speakers is unknown.) Overall, ~11,000 (10,886) samples (segments) totalling about 288 minutes are included.
The 2nd challenge comprises recordings of individuals; the task is to determine whether the speaker has a cold. There are 630 speakers (382 male, 248 female), aged 12 to 84 years. Overall, the corpus consists of ~11,000 (11,283) audio recordings.
The 3rd challenge comprises recordings of snore sounds from individuals, categorised by their excitation location within the upper airways. The task is to classify four different types of snoring, defined by the VOTE scheme (Velum, Oropharyngeal lateral walls, Tongue base, Epiglottis). There are 843 snore events from 224 subjects.
For questions about these models, please contact George Trigeorgis or Panayiotis Tzirakis.
If you use this codebase in your experiments please cite:
[George Trigeorgis, Fabien Ringeval, Raymond Brückner, Erik Marchi, Mihalis Nicolaou, Björn Schuller, and Stefanos Zafeiriou, “Adieu Features? End-to-End Speech Emotion Recognition using a Deep Convolutional Recurrent Network,” in Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2016, (Shanghai, P. R. China), pp. 5200–5204, IEEE, 2016.](https://ibug.doc.ic.ac.uk/media/uploads/documents/learning_audio_paralinguistics_from_the_raw_waveform.pdf)
We highly recommend using conda as your Python distribution. Once you have downloaded and installed conda, this project can be installed as follows:
Step 1: Create a new conda environment and activate it:
$ conda create -n compare python=3.5
$ source activate compare
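You can confirm that the environment uses the expected interpreter before proceeding:

(compare)$ python --version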
Step 2: Install TensorFlow v0.12 following the official installation instructions. For example, for 64-bit Linux, installing the GPU-enabled, Python 3.5 build of TensorFlow involves:
(compare)$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.0-cp35-cp35m-linux_x86_64.whl
(compare)$ pip install --upgrade $TF_BINARY_URL
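To verify the installation, check that the expected version imports inside the environment:

(compare)$ python -c "import tensorflow as tf; print(tf.__version__)"

This should print 0.12.0.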
Step 3: Clone and install the compare project:
(compare)$ git clone git@github.com:trigeorgis/ComParE2017.git
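If the repository ships a requirements.txt (an assumption here; check your checkout), any remaining Python dependencies can then be installed inside the environment:

(compare)$ cd ComParE2017
(compare)$ pip install -r requirements.txt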
We use a convolutional-recurrent architecture: convolutional layers extract features from the raw waveform, and an LSTM network takes these features and classifies the whole sequence as one of the classes in the dataset.
The waveform is split into 40 ms chunks; for each chunk we extract features, and a recurrent network then traverses the whole sequence. At the end we are left with the final hidden state of the LSTM network, which we use for the classification.
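For concreteness, here is a minimal sketch of this idea using TF 0.12-era APIs. It is not the repository's actual model code: the filter sizes, LSTM width, sampling rate, and sequence length below are illustrative assumptions.

```python
import tensorflow as tf

# Minimal sketch of the conv-recurrent baseline (TF 0.12-era API).
# All sizes (filter width, LSTM units, sampling rate) are assumptions.
num_classes = 3      # depends on the task
chunk_size = 640     # 40 ms at an assumed 16 kHz sampling rate
num_chunks = 100     # number of 40 ms chunks per sequence (illustrative)

# Raw waveform, already split into 40 ms chunks.
waveform = tf.placeholder(tf.float32, [None, num_chunks, chunk_size])

# Convolve each chunk independently (1D conv expressed as a 2D conv).
chunks = tf.reshape(waveform, [-1, 1, chunk_size, 1])
kernel = tf.Variable(tf.truncated_normal([1, 40, 1, 40], stddev=0.1))
conv = tf.nn.relu(tf.nn.conv2d(chunks, kernel, [1, 1, 1, 1], 'SAME'))
pooled = tf.nn.max_pool(conv, [1, 1, 2, 1], [1, 1, 2, 1], 'SAME')

# One feature vector per 40 ms chunk, fed to an LSTM over the sequence.
features = tf.reshape(pooled, [-1, num_chunks, (chunk_size // 2) * 40])
cell = tf.nn.rnn_cell.LSTMCell(128, state_is_tuple=True)
_, state = tf.nn.dynamic_rnn(cell, features, dtype=tf.float32)

# Classify the whole sequence from the final hidden state of the LSTM.
softmax_w = tf.Variable(tf.truncated_normal([128, num_classes], stddev=0.1))
softmax_b = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(state.h, softmax_w) + softmax_b
```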
There are two options for using the input data to run experiments.
The first is to convert the original wave files into a format more suitable for TensorFlow using TF Records.
Addressee (First Challenge)
(compare)$ python data_generator.py --wave_folder=path/to/wave_folder --arff_path=ComParE2017_Addressee\* --tf_folder=tf_records
Cold (Second Challenge)
(compare)$ python data_generator.py --wave_folder=path/to/wave_folder --arff_path=ComParE2017_Cold\* --tf_folder=tf_records
Snore (Third Challenge)
(compare)$ python data_generator.py --wave_folder=path/to/wave_folder --arff_path=ComParE2017_Snore.ComParE\* --tf_folder=tf_records
By default the tfrecords will be generated in a folder called tf_records, which contains a file for each dataset split (train, devel, test).
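To sanity-check a generated split outside the training script, a record can be read back with the TF 0.12 queue-based input pipeline. The file name and the feature keys ('raw_audio', 'label') below are assumptions; check data_generator.py for the names it actually writes.

```python
import tensorflow as tf

# Sketch: read one example back from a generated split (TF 0.12 queue API).
# File name and feature keys are assumptions -- see data_generator.py.
filename_queue = tf.train.string_input_producer(
    ['tf_records/train.tfrecords'], num_epochs=1)
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)
features = tf.parse_single_example(serialized, features={
    'raw_audio': tf.FixedLenFeature([], tf.string),
    'label': tf.FixedLenFeature([], tf.int64),
})
audio = tf.decode_raw(features['raw_audio'], tf.float32)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # needed for num_epochs
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    waveform, label = sess.run([audio, features['label']])
    print(waveform.shape, label)
    coord.request_stop()
    coord.join(threads)
```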
Addressee (First Challenge)
(compare)$ python compare_train.py --task=addressee --train_dir=ckpt/train_addressee
Cold (Second Challenge)
(compare)$ python compare_train.py --task=cold --train_dir=ckpt/train_cold
Snore (Third Challenge)
(compare)$ python compare_train.py --task=snore --train_dir=ckpt/train_snore
The training script accepts the following arguments:
--initial_learning_rate INITIAL_LEARNING_RATE
                      Initial learning rate.
--batch_size BATCH_SIZE
                      The batch size to use.
--num_lstm_modules NUM_LSTM_MODULES
                      How many LSTM modules to use.
--train_dir TRAIN_DIR
                      Directory where to write event logs and checkpoints.
--pretrained_model_checkpoint_path PRETRAINED_MODEL_CHECKPOINT_PATH
                      If specified, restore this pretrained model before
                      beginning any training.
--max_steps MAX_STEPS
                      Number of batches to run.
--train_device TRAIN_DEVICE
                      Device to train on.
--model MODEL         Which model to use: audio, video, or both.
--dataset_dir DATASET_DIR
                      The tfrecords directory.
--task TASK           The task to execute: `addressee`, `cold`, or `snore`.
--portion PORTION     Dataset portion to use for training (train or devel).
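For example, to continue training from an existing checkpoint with a smaller learning rate, the flags combine as follows (the checkpoint path and values here are illustrative):

(compare)$ python compare_train.py --task=cold --dataset_dir=tf_records --train_dir=ckpt/train_cold --initial_learning_rate=0.0001 --batch_size=32 --pretrained_model_checkpoint_path=ckpt/train_cold/model.ckpt-10000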
While training the models it is useful to run an evaluator service that does continuous evaluation of the latest checkpoints:
(compare)$ python compare_eval.py --task=(addressee or cold or snore) --checkpoint_dir=ckpt/train
TensorBoard: You can run training and evaluation simultaneously and observe the results through TensorBoard. Simply run:
(compare)$ tensorboard --logdir=ckpt
This makes it easy to explore the graph, data, loss evolution and accuracy on the validation set. Once you have a model which performs well on the validation set (which can take between 10k and 70k steps depending on the dataset) you can stop the training process.