whatamilisteningto

This project classifies music from mel-spectrograms, targeting two tasks: genre recognition and mood/theme recognition. Spectrograms, visual representations of audio data, are fed as images into deep learning models. Four popular convolutional neural networks (CNNs) are compared: VGG19, EfficientNetV2-B0, Xception, and DenseNet-201. The project also explores the effectiveness of transfer learning with pre-trained models.

Dataset source

Bogdanov, D., Won, M., Tovstogan, P., Porter, A., & Serra, X. (2019). The MTG-Jamendo Dataset for Automatic Music Tagging. Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019).

Dependencies

  • TensorFlow
  • Keras
  • Matplotlib
  • NumPy
  • Pandas
  • Librosa

Track count

| Dataset | Train | Test | Validation | Total |
|---|---|---|---|---|
| Mood/Theme | 11,114 | 3,699 | 3,673 | 18,486 |
| Genre | 33,169 | 11,047 | 10,999 | 55,215 |
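The track counts imply roughly a 60/20/20 split for both tasks. A small sketch that checks the totals and computes each split's share from the numbers listed here (the variable and function names are illustrative):

```python
# Track counts copied from the track count table.
splits = {
    "Mood/Theme": {"train": 11114, "test": 3699, "validation": 3673},
    "Genre": {"train": 33169, "test": 11047, "validation": 10999},
}


def split_fractions(counts):
    """Return each split's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {name: round(n / total, 3) for name, n in counts.items()}


totals = {task: sum(counts.values()) for task, counts in splits.items()}
fractions = {task: split_fractions(counts) for task, counts in splits.items()}

for task in splits:
    print(task, totals[task], fractions[task])
```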

Genre Classification Metrics

| Model Architecture | Top-4 accuracy | ROC-AUC | PR-AUC | F-Score |
|---|---|---|---|---|
| VGG19 | 0.590024 | 0.886725 | 0.247987 | 0.108169 |
| EfficientNetV2-B0 | 0.568208 | 0.893232 | 0.216523 | 0.067312 |
| Xception | 0.604870 | 0.874622 | 0.274456 | 0.145906 |
| DenseNet-201 | 0.626052 | 0.902065 | 0.288641 | 0.119026 |
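The exact definition of top-4 accuracy used for these results is not stated here. One plausible reading for multi-label tagging, shown below as an assumption, counts a track as correct when at least one of its true tags is among the four highest-scored tags. The function name `top_k_hit_rate` and the toy arrays are made up for the example.

```python
import numpy as np


def top_k_hit_rate(y_true, y_score, k=4):
    """Fraction of tracks with at least one true tag among the k highest scores.

    Assumed definition for illustration; requires k <= number of tags.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # Indices of the k largest scores per row (order within the top k is irrelevant).
    top_k = np.argpartition(-y_score, k - 1, axis=1)[:, :k]
    hits = np.take_along_axis(y_true, top_k, axis=1).any(axis=1)
    return float(hits.mean())


# Toy data: 3 tracks, 6 tags, one true tag per track.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
]
y_score = [
    [0.9, 0.1, 0.2, 0.3, 0.4, 0.5],  # true tag ranked 1st: hit
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1],  # true tag ranked last: miss
    [0.1, 0.6, 0.9, 0.8, 0.7, 0.2],  # true tag ranked 4th: hit
]

rate = top_k_hit_rate(y_true, y_score, k=4)
print(rate)
```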

Mood/Theme Classification Metrics

| Model Architecture | Top-4 accuracy | ROC-AUC | PR-AUC | F-Score |
|---|---|---|---|---|
| VGG19 | 0.407948 | 0.815694 | 0.133618 | 0.100799 |
| EfficientNetV2-B0 | 0.273317 | 0.770294 | 0.060965 | 0.015179 |
| Xception | 0.396593 | 0.783322 | 0.131362 | 0.119081 |
| DenseNet-201 | 0.425520 | 0.806734 | 0.147263 | 0.116122 |
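For readers unfamiliar with ROC-AUC in a multi-label setting, the sketch below computes a per-tag AUC as the probability that a randomly chosen positive track is scored above a randomly chosen negative one, then averages over tags. Macro averaging is an assumption; the averaging scheme behind the reported numbers is not stated here. The pairwise formulation is chosen for clarity rather than speed, and `macro_roc_auc` is a hypothetical helper name.

```python
import numpy as np


def macro_roc_auc(y_true, y_score):
    """Macro-averaged ROC-AUC over tags, using pairwise comparisons.

    Tags with no positive or no negative examples are skipped.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    per_tag = []
    for tag in range(y_true.shape[1]):
        pos = y_score[y_true[:, tag], tag]
        neg = y_score[~y_true[:, tag], tag]
        if pos.size == 0 or neg.size == 0:
            continue  # AUC is undefined when only one class is present
        diff = pos[:, None] - neg[None, :]
        # Count positive-above-negative pairs, with ties counted as half.
        wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
        per_tag.append(wins / diff.size)
    return float(np.mean(per_tag)) if per_tag else float("nan")


# Toy data: 4 tracks, 2 tags.
y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.6], [0.5, 0.7]]

auc = macro_roc_auc(y_true, y_score)
print(auc)
```

In the toy data, tag 0 has three of four positive/negative pairs ordered correctly (0.75) and tag 1 has all four (1.0), giving a macro average of 0.875.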

Genre Predictions

[Images: preds_g1, preds_g2, preds_g3, preds_g4]

Mood/Theme Predictions

[Images: preds_m1, preds_m6, preds_m2, preds_m3]