This project classifies music from mel-spectrograms, targeting two tasks: music genre recognition and mood/theme recognition. The audio is converted to spectrograms, visual time-frequency representations of the signal, which are then fed into deep learning models for classification. The project compares four popular convolutional neural networks (CNNs), namely VGG19, EfficientNetV2-B0, Xception, and DenseNet-201, and explores how effective transfer learning with pre-trained models is on these tasks.
Bogdanov, D., Won, M., Tovstogan, P., Porter, A., & Serra, X. (2019). The MTG-Jamendo Dataset for Automatic Music Tagging. Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019).
- TensorFlow
- Keras
- Matplotlib
- NumPy
- Pandas
- Librosa
Dataset | Train | Test | Validation | Total |
---|---|---|---|---|
Mood/Theme | 11,114 | 3,699 | 3,673 | 18,486 |
Genre | 33,169 | 11,047 | 10,999 | 55,215 |
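
As a quick sanity check on the table above, the short sketch below recomputes the totals and the share of each split. The counts are copied from the table; the helper name `split_fractions` is only illustrative. Both tasks come out at roughly 60% train, 20% test, and 20% validation.

```python
# Split sizes copied from the table above.
splits = {
    "Mood/Theme": {"train": 11_114, "test": 3_699, "validation": 3_673},
    "Genre": {"train": 33_169, "test": 11_047, "validation": 10_999},
}


def split_fractions(counts):
    """Return each split's share of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for task, counts in splits.items():
    fractions = split_fractions(counts)
    summary = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
    print(f"{task}: {sum(counts.values()):,} examples ({summary})")
```
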
Model Architecture | Top-4 accuracy | ROC-AUC | PR-AUC | F-Score |
---|---|---|---|---|
VGG19 | 0.590024 | 0.886725 | 0.247987 | 0.108169 |
EfficientNetV2-B0 | 0.568208 | 0.893232 | 0.216523 | 0.067312 |
Xception | 0.604870 | 0.874622 | 0.274456 | 0.145906 |
DenseNet-201 | 0.626052 | 0.902065 | 0.288641 | 0.119026 |
Model Architecture | Top-4 accuracy | ROC-AUC | PR-AUC | F-Score |
---|---|---|---|---|
VGG19 | 0.407948 | 0.815694 | 0.133618 | 0.100799 |
EfficientNetV2-B0 | 0.273317 | 0.770294 | 0.060965 | 0.015179 |
Xception | 0.396593 | 0.783322 | 0.131362 | 0.119081 |
DenseNet-201 | 0.425520 | 0.806734 | 0.147263 | 0.116122 |
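
The tables report top-4 accuracy without defining it. One plausible reading for a tagging task, where a track may carry several labels, is that a sample counts as correct if at least one of its true labels is among the four highest-scoring predictions. The sketch below implements that reading with NumPy. The definition, the function name `top_k_accuracy`, and the toy data are assumptions for illustration, not the project's evaluation code.

```python
import numpy as np


def top_k_accuracy(y_true, y_score, k=4):
    """Fraction of samples with at least one true label among the k top scores.

    y_true:  (n_samples, n_classes) binary indicator matrix.
    y_score: (n_samples, n_classes) predicted scores; requires k <= n_classes.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Indices of the k highest scores per row (order within the top k is irrelevant).
    top_k = np.argpartition(-y_score, k - 1, axis=1)[:, :k]
    # Check whether any of those k positions is a true label.
    hits = np.take_along_axis(y_true, top_k, axis=1).any(axis=1)
    return float(hits.mean())


# Toy example: 3 samples, 6 classes.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
]
y_score = [
    [0.90, 0.10, 0.20, 0.30, 0.40, 0.05],  # true class 0 is ranked first: hit
    [0.50, 0.40, 0.30, 0.20, 0.10, 0.05],  # true class 5 is ranked last: miss
    [0.10, 0.05, 0.60, 0.20, 0.30, 0.40],  # true class 2 is ranked first: hit
]
result = top_k_accuracy(y_true, y_score, k=4)
print(result)  # 2 of 3 samples are hits
```

For single-label, one-hot targets, Keras provides `tf.keras.metrics.TopKCategoricalAccuracy(k=4)`, which may be closer to what was used here. The source does not say which variant produced the reported numbers.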