1st Place Public Leaderboard Solution for ERC2019

Overview

Although the competition is about recognizing emotion from audio, you can treat this codebase as a baseline for audio classification in general: I did not make any assumptions about the provided data. The main components of the pipeline are described below.

Key features

Preprocessing by converting the audio to Mel Spectrogram.

I used librosa with this config:

sampling_rate = 16000
duration = 2 # sec
hop_length = 125 * duration
n_mels = 128

With these settings, each 128x128 image (128 mel bins by roughly 128 frames) represents 2 seconds of audio.
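Here is a small sketch of the arithmetic behind this config. It assumes librosa's default centered STFT (`center=True`), under which the frame count is `1 + n_samples // hop_length`. The sketch shows why `hop_length` is tied to `duration`: the time axis comes out the same size for any clip length. This count gives 129 frames rather than 128, so a crop or resize to exactly 128 frames is assumed and is not shown here.

```python
SAMPLING_RATE = 16000
DURATION = 2                  # seconds
HOP_LENGTH = 125 * DURATION   # 250 samples between successive frames
N_MELS = 128                  # image height (mel bins)


def num_frames(n_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT (librosa's default center=True, even n_fft).

    Centering pads n_fft // 2 samples on each side, so the count reduces to
    1 + n_samples // hop_length and does not depend on n_fft.
    """
    return 1 + n_samples // hop_length


n_samples = SAMPLING_RATE * DURATION         # 32000 samples
frames = num_frames(n_samples, HOP_LENGTH)   # 1 + 32000 // 250 = 129
print(f"{N_MELS} mel bins x {frames} frames")

# Because hop_length scales with duration, the time axis has the same size
# for any clip length: (16000 * d) // (125 * d) == 128 for every d >= 1.
for d in range(1, 6):
    assert num_frames(SAMPLING_RATE * d, 125 * d) == frames
```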

Fully CNN for audio classification

We recently won a gold medal in Kaggle's Freesound Audio Tagging 2019 competition, so most of the architectures were borrowed from that solution.

The final submission was an ensemble of 4 models. Three of them were Classifier_M0, Classifier_M2 and Classifier_M3 from our technical report [1].

Here's what Classifier_M3 looks like:

The fourth model came from the 7th place solution.
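The README does not say how the four models were combined. The following is a minimal sketch that assumes two things: each `preds_*.npy` output holds per-class scores of shape (n_samples, n_classes), and the ensemble is an equal-weight average. Both are illustrative assumptions, not confirmed by the source, and the toy scores below are made up.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # n_samples x n_classes


def average_ensemble(preds_per_model: Sequence[Matrix]) -> Matrix:
    """Equal-weight mean of the per-class scores predicted by each model (assumed scheme)."""
    n_models = len(preds_per_model)
    n_samples = len(preds_per_model[0])
    n_classes = len(preds_per_model[0][0])
    return [
        [sum(preds[i][c] for preds in preds_per_model) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax(row: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(row)), key=lambda c: row[c])


# Made-up scores: 4 models, 2 test clips, 3 classes.
preds_m0 = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
preds_m2 = [[0.5, 0.4, 0.1], [0.1, 0.5, 0.4]]
preds_m3 = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
preds_dcase = [[0.5, 0.3, 0.2], [0.2, 0.4, 0.4]]

ensemble = average_ensemble([preds_m0, preds_m2, preds_m3, preds_dcase])
labels = [argmax(row) for row in ensemble]
print(labels)  # [0, 2]
```

In practice the arrays would be loaded with `numpy.load`. Averaging scores rather than voting on labels lets a confident model outweigh uncertain ones. For clip 0, model m3 alone prefers class 1, but the averaged scores still pick class 0.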

Mixup + SpecAugment (SpecMix)

The most important part of this solution is the augmentation method, as the dataset is very small and pretraining is not allowed.

Augmentation options for spectrograms are very limited due to the nature of the data: they are not ordinary images (e.g. rotating a spectrogram makes no sense). In this work I utilized Mixup [2] and SpecAugment [3]. A repo by Eric Bouteillon gives a nice explanation of the method.
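This is a minimal pure-Python sketch of the two ingredients, not the repo's implementation. The first is SpecAugment-style masking, which zeroes one random band of mel bins and one random band of frames. The second is Mixup, which blends two spectrograms and their one-hot labels with a weight drawn from Beta(alpha, alpha). The mask widths, `alpha=0.4`, zero as the fill value, and applying masking before mixing are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Spectrogram = List[List[float]]  # n_mels rows x n_frames columns


def spec_augment(spec: Spectrogram, max_freq_width: int = 16, max_time_width: int = 16,
                 rng: Optional[random.Random] = None) -> Spectrogram:
    """SpecAugment-style masking; returns a new spectrogram and leaves the input unchanged."""
    if rng is None:
        rng = random.Random()
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: rows [f0, f0 + f) set to zero (fill value is an assumption).
    f = rng.randint(0, min(max_freq_width, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_frames

    # Time mask: columns [t0, t0 + t) set to zero.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out


def mixup(spec_a: Spectrogram, label_a: Sequence[float],
          spec_b: Spectrogram, label_b: Sequence[float],
          alpha: float = 0.4, rng: Optional[random.Random] = None
          ) -> Tuple[Spectrogram, List[float], float]:
    """Mixup: convex combination of two inputs and their labels, lam ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    spec = [[lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(spec_a, spec_b)]
    label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return spec, label, lam
```

For a training pair one might call `mixup(spec_augment(x1, rng=rng), y1, spec_augment(x2, rng=rng), y2, rng=rng)`, where `x1`, `y1`, `x2`, `y2` are hypothetical inputs and one-hot labels. The soft label then matches the proportion in which the two inputs were blended.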

Training

Preprocessing

To reproduce the Mels data, run the following command:

$ python preprocess.py --train_df_path <path-to>/train_label.csv --train_dir <path-to>/Train --test_dir <path-to>/Public_Test --train_output_path ./data/mels_train.pkl --test_output_path ./data/mels_test.pkl

Training

To reproduce the models, run the following commands:

$ python train_pred.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m0 --logdir models_m0 --output_name preds_m0.npy

$ python train_pred.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m2 --logdir models_m2 --output_name preds_m2.npy

$ python train_pred.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m3 --logdir models_m3 --output_name preds_m3.npy

$ python train_pred.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model dcase --logdir models_dcase --output_name preds_dcase.npy
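The four training commands differ only in `--model`, `--logdir` and `--output_name`. Below is a small hypothetical driver that builds them in a loop. Replace the `<path-to>` placeholders with real paths before use. By default it only prints the commands; it runs them only when `dry_run=False`.

```python
import shlex
import subprocess
import sys
from typing import List

TRAIN_DF = "<path-to>/train_label.csv"  # placeholder: replace with the real path
TEST_DIR = "<path-to>/Public_Test/"     # placeholder: replace with the real path
MODELS = ["m0", "m2", "m3", "dcase"]


def build_command(model: str) -> List[str]:
    """Argument list for one train_pred.py run, mirroring the commands in this README."""
    return [
        sys.executable, "train_pred.py",
        "--train_df_path", TRAIN_DF,
        "--test_dir", TEST_DIR,
        "--model", model,
        "--logdir", f"models_{model}",
        "--output_name", f"preds_{model}.npy",
    ]


def main(dry_run: bool = True) -> None:
    for model in MODELS:
        cmd = build_command(model)
        print(shlex.join(cmd))  # shell-quoted form, for inspection
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    main(dry_run=True)
```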
