PyTorch implementation of cross-cultural music transfer learning using auto-tagging models.
"From West to East: Who can understand the music of the others better?", ISMIR 2023.
- Charilaos Papaioannou, Emmanouil Benetos, and Alexandros Potamianos
- MagnaTagATune
- audio: the files can be downloaded from the dataset webpage.
- metadata: the (binary) labels for the top-50 tags are stored in `data/magnatagatune/split/binary.npy`; they can also be found in the minzwon/sota-music-tagging-models repository.
- FMA
- audio: the files for the FMA-medium subset used in this study can be downloaded from the dataset repository.
- metadata: the top-20 hierarchically related genre labels (available in the dataset repository) were used to create the `data/fma/split/metadata.npy` file, which contains a dictionary whose keys are the track ids and whose values are the respective labels (see the loading sketch below).
- Lyra
- audio: the audio files are not publicly available but the mel-spectrograms can be downloaded from the dataset repository.
- metadata: the files have been copied from the dataset repository to the `data/lyra/split/` directory.
- Turkish-makam, Hindustani, Carnatic
- These datasets are part of the CompMusic Corpora. One should create an account on Dunya and request access to the audio files.
- The pycompmusic library has to be installed, and the user token must be set in the helper scripts under the `preprocessing/dunya/` directory. To fetch both audio and metadata, set the `dataset` option appropriately (one of `'makam'`, `'hindustani'`, `'carnatic'`) and execute the scripts, e.g.:
```
python preprocessing/dunya/get_audios.py --dataset 'makam'
python preprocessing/dunya/get_metadata.py --dataset 'carnatic'
```
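Under the hood, the token is registered via pycompmusic before any download call. A minimal sketch of the pattern (assuming the standard pycompmusic API, i.e. `dunya.set_token`, `get_recordings` and `download_mp3`; the actual logic lives in the helper scripts and may differ):

```python
from compmusic import dunya
from compmusic.dunya import makam  # or hindustani / carnatic

# Register the personal Dunya API token obtained from your account.
dunya.set_token('your-dunya-token')

# List the recordings of the corpus and fetch each one as mp3.
for recording in makam.get_recordings():
    makam.download_mp3(recording['mbid'], '/__path_to__/data/makam/audios/')
```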
The metadata are expected under each `data/{dataset}/split/` directory, in the files `binary.npy` for MagnaTagATune, `training.tsv` and `test.tsv` for Lyra, and `metadata.npy` for the rest of the datasets.
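As a quick sanity check, these metadata files can be inspected directly; a minimal sketch, assuming the file layouts described above (paths are illustrative):

```python
import numpy as np
import pandas as pd

# MagnaTagATune: binary label matrix for the top-50 tags.
binary = np.load('data/magnatagatune/split/binary.npy')
print(binary.shape)

# FMA and the Dunya datasets: a pickled dict mapping track ids to labels.
metadata = np.load('data/fma/split/metadata.npy', allow_pickle=True).item()
print(len(metadata), next(iter(metadata.items())))

# Lyra: tab-separated metadata files.
training = pd.read_csv('data/lyra/split/training.tsv', sep='\t')
print(training.head())
```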
Regarding splits, the ratios 0.7/0.1/0.2 for training, validation and test sets are used across all datasets. Specifically:
- MagnaTagATune: the publicly available split is used. It can be found at several repositories, e.g. the aforementioned minzwon/sota-music-tagging-models repository.
- Lyra: the publicly available split for training and test sets is used. The validation set is randomly split from the training set during training.
- For the rest of the datasets (FMA-medium, Turkish-makam, Hindustani and Carnatic) a random split was applied and, for reproducibility, the result was stored in the files `train.npy`, `valid.npy` and `test.npy` under each `data/{dataset}/split` directory. Each of those files contains a list of ids that is used, in turn, by the respective dataloader (see the sketch after this list).
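For illustration, a random 0.7/0.1/0.2 split like the stored ones could be produced as follows (a sketch only; the committed `.npy` files are what guarantee reproducibility, and both `track_ids` and the seed are hypothetical):

```python
import numpy as np

track_ids = np.arange(25000)          # hypothetical: all track ids of a dataset
rng = np.random.default_rng(seed=42)  # hypothetical seed
ids = rng.permutation(track_ids)

n = len(ids)
n_train, n_valid = int(0.7 * n), int(0.1 * n)
np.save('data/fma/split/train.npy', ids[:n_train])
np.save('data/fma/split/valid.npy', ids[n_train:n_train + n_valid])
np.save('data/fma/split/test.npy', ids[n_train + n_valid:])
```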
- Python 3.8 or later
- Create a virtual environment and install the requirements:

```
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
The audio (mp3) files of each dataset are expected to be found under an `audios/` directory at a specific path (let's call it `{data_dir}`), such as `/__path_to__/data/{dataset}/`.
To create the mel-spectrograms for all datasets except Lyra (for which they can be readily downloaded), use the following command, setting the `dataset` (one of `'magnatagatune'`, `'fma'`, `'makam'`, `'hindustani'`, `'carnatic'`) and `data_dir` (full path to the dataset directory) options appropriately, e.g.:

```
python preprocessing/create_mel_spectrograms.py --dataset 'magnatagatune' --data_dir '/__path_to__/magnatagatune'
```
The mel-spectrograms will be generated under the respective `{data_dir}/mel-spectrograms/` directory and will follow the `{id}.npy` naming convention.
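Conceptually, the script computes a log-scaled mel-spectrogram per track and saves it as `{id}.npy`. A hedged sketch of the idea (the actual parameters, e.g. sample rate, hop length and number of mel bands, are defined in `preprocessing/create_mel_spectrograms.py` and may differ):

```python
import librosa
import numpy as np

def save_mel_spectrogram(mp3_path, out_path, sr=16000, n_mels=128):
    # Load the audio, compute a mel-spectrogram and store it in dB scale.
    y, sr = librosa.load(mp3_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    np.save(out_path, librosa.power_to_db(mel))

# '000002' is a hypothetical track id.
save_mel_spectrogram('/__path_to__/fma/audios/000002.mp3',
                     '/__path_to__/fma/mel-spectrograms/000002.npy')
```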
The available deep learning models are VGG-ish, Musicnn and Audio Spectrogram Transformer (AST).
Specify:
- `{dataset}`: one of `'magnatagatune', 'fma', 'makam', 'lyra', 'hindustani', 'carnatic'`
- `{data_dir}`: where the `mel-spectrograms` and `split` dirs are expected to be found
- `{model_name}`: which will load the respective configuration from `MODELS_CONFIG` at `config.py`
  - use one of `'vgg_ish', 'musicnn', 'ast'` for models where no transfer learning takes place
  - use one of the transfer learning naming conventions described below for transferred models
- `{device}`: the device to be used (one of `'cpu'`, `'cuda:0'`, `'cuda:1'`, etc.)
Example:

```
python train.py --dataset 'magnatagatune' --data_dir '/__path_to__/magnatagatune' --model_name 'ast' --device 'cuda:0'
```
Once training of a model has been completed on a dataset, it can be used to apply transfer learning to another by using the following conventions for the `{model_name}` option:
- `{model}_from_{dataset}` when the whole network will be fine-tuned
- `{model}_from_{dataset}_f` when only the final layer will be fine-tuned
Assuming that a `vgg_ish` model is trained on `fma` and one wishes to transfer it to `hindustani` and fine-tune the whole network, the command will be:

```
python train.py --dataset 'hindustani' --data_dir '/__path_to__/hindustani' --model_name 'vgg_ish_from_fma' --device 'cuda:0'
```
To transfer a `musicnn` model from `lyra` to `makam` and fine-tune only the final layer, one would run:

```
python train.py --dataset 'makam' --data_dir '/__path_to__/makam' --model_name 'musicnn_from_lyra_f' --device 'cuda:0'
```
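The naming convention carries all the transfer information. A minimal, hypothetical parser (not the repository's actual code) showing how a `{model_name}` string decomposes:

```python
def parse_model_name(model_name: str):
    """Decompose a name such as 'musicnn_from_lyra_f' (illustrative only)."""
    final_layer_only = model_name.endswith('_f')
    if final_layer_only:
        model_name = model_name[: -len('_f')]  # strip the fine-tuning suffix
    if '_from_' in model_name:
        model, source_dataset = model_name.split('_from_')
    else:
        model, source_dataset = model_name, None  # single-domain model
    return model, source_dataset, final_layer_only

print(parse_model_name('vgg_ish_from_fma'))     # ('vgg_ish', 'fma', False)
print(parse_model_name('musicnn_from_lyra_f'))  # ('musicnn', 'lyra', True)
```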
For evaluating a model, the same options need to be specified, i.e. `{dataset}`, `{data_dir}`, `{model_name}` and `{device}`.
Example for a single-domain model:

```
python evaluate.py --dataset 'fma' --data_dir '/__path_to__/fma' --model_name 'ast' --device 'cuda:0'
```
or for a transfer learning model:

```
python evaluate.py --dataset 'lyra' --data_dir '/__path_to__/lyra' --model_name 'musicnn_from_magnatagatune' --device 'cuda:0'
```
The result will be stored under the `evaluation/{dataset}/` directory in a `{model_name}.txt` file. The evaluation results of the single-domain and the best transfer learning models are provided for reference.