This GitHub project contains a PyTorch implementation for reproducing the experiments and DNN models used in the paper "DcaseNet: An integrated pretrained deep neural network for detecting and classifying acoustic scenes and events", accepted for presentation at IEEE ICASSP 2021.
DcaseNet is a DNN that performs acoustic scene classification (ASC), audio tagging (TAG), and sound event detection (SED) simultaneously. It adopts a two-phase training scheme. In the first phase, the three tasks are trained jointly. In the second phase, the model is fine-tuned for each task.
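The two-phase scheme can be illustrated with a framework-free sketch of a training objective. It assumes, hypothetically, that the phase-1 loss is a weighted sum of a softmax cross-entropy for ASC and binary cross-entropies for TAG (clip-level) and SED (frame-level); the losses and weights actually used in the paper and in this repository may differ, and all function names here are illustrative.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for single-label ASC (one scene per clip)."""
    m = max(logits)  # shift by the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy for multi-label outputs (TAG clips, SED frames)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def joint_loss(
    asc_logits: Sequence[float], asc_target: int,
    tag_probs: Sequence[float], tag_targets: Sequence[int],
    sed_probs: Sequence[Sequence[float]], sed_targets: Sequence[Sequence[int]],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Hypothetical phase-1 objective: weighted sum of the ASC, TAG and SED losses."""
    asc = cross_entropy(asc_logits, asc_target)
    tag = binary_cross_entropy(tag_probs, tag_targets)
    # SED predicts labels per frame, so average the frame-level BCE over time
    sed = sum(binary_cross_entropy(p, t) for p, t in zip(sed_probs, sed_targets)) / len(sed_probs)
    return weights[0] * asc + weights[1] * tag + weights[2] * sed
```

Under this sketch, fine-tuning for a single task in the second phase corresponds to keeping only that task's term, e.g. `weights=(1.0, 0.0, 0.0)` for ASC.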
We conducted our experiments using NVIDIA GPU Cloud (NGC). Training was done on a single NVIDIA Titan RTX GPU. Our settings are available in launch_nvidia-gpu-cloud.sh.
- Download three datasets: DCASE 2020 challenge Task 1-a, DCASE 2019 challenge Task 2, and DCASE 2020 challenge Task 3, and configure their directories.
- (Optional) Enter the virtual environment using NGC.
- Set the parameters in train.sh.
- Run train.sh.
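Running the scripts from Python can be sketched as follows, for example when chaining training and evaluation. This is a minimal sketch that assumes `bash` is on the PATH; the helper name `run_script` is hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path
from typing import Optional


def run_script(script: str, cwd: Optional[str] = None) -> int:
    """Run a repository shell script with bash, raising on a non-zero exit status."""
    path = Path(script).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"script not found: {path}")
    # check=True raises subprocess.CalledProcessError if the script fails
    completed = subprocess.run(["bash", str(path)], cwd=cwd, check=True)
    return completed.returncode
```

For example, `run_script("train.sh")` followed by `run_script("evaluate_trained_models.sh")` would stop before evaluation if training exits with an error.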
If you prefer to use the pre-trained joint DcaseNet and only perform fine-tuning, remove the 'Joint' experiment from train.sh and copy the Joint weights into your 'save_dir'.
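Copying the weights can be scripted as below. This is a minimal sketch: the function name is hypothetical, and the optional `subdir` argument exists only because the exact layout that train.sh expects inside 'save_dir' is not documented here, so check train.sh before relying on any particular path.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_joint_weights(weights_file: str, save_dir: str, subdir: Optional[str] = None) -> Path:
    """Copy pre-trained joint DcaseNet weights into save_dir (optionally into a sub-folder)."""
    src = Path(weights_file)
    if not src.is_file():
        raise FileNotFoundError(src)
    # Hypothetical layout: either save_dir itself or save_dir/<subdir>
    dst_dir = Path(save_dir) / subdir if subdir else Path(save_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)  # copy2 also preserves file metadata such as timestamps
    return dst
```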
- Download three datasets: DCASE 2020 challenge Task 1-a, DCASE 2019 challenge Task 2, and DCASE 2020 challenge Task 3, and configure their directories.
- Set the parameters in evaluate_trained_models.sh.
- Run evaluate_trained_models.sh.
DCASENetShellScriptBuilder is a simple GUI program that generates a script that can be run on Windows. After you tick a few checkboxes and set the dataset directories, the generated script performs both training and evaluation. This program was contributed by yeongsoo and will not be maintained further.
The program has three rows:
- (i) The tasks on which to conduct joint training. If none is checked, the DcaseNet pre-trained jointly on all three tasks is used.
- (ii) The tasks on which to perform fine-tuning. Checking more than one task trains a separate DcaseNet for each fine-tuning task. Checking at least one task is recommended.
- (iii) The tasks on which to perform evaluation. These are recommended to match the row above.
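The logic of the three rows can be summarized by a small planning function. This is a hypothetical sketch of how the checkbox selections map to pipeline steps; it does not reproduce the builder's actual code or the contents of the generated Windows script, and the task order and step wording are assumptions.

```python
from typing import List, Sequence

TASKS = ("ASC", "TAG", "SED")  # canonical task order assumed by this sketch


def build_plan(joint: Sequence[str], finetune: Sequence[str], evaluate: Sequence[str]) -> List[str]:
    """Turn the three checkbox rows into an ordered list of pipeline steps."""
    for task in (*joint, *finetune, *evaluate):
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
    steps: List[str] = []
    # Row (i): nothing checked means reusing the DcaseNet pre-trained on all three tasks
    if joint:
        steps.append("joint training on " + "+".join(t for t in TASKS if t in joint))
    else:
        steps.append("load DcaseNet pre-trained jointly on " + "+".join(TASKS))
    # Row (ii): each checked task gets its own separately fine-tuned DcaseNet
    for task in (t for t in TASKS if t in finetune):
        steps.append(f"fine-tune a separate DcaseNet on {task}")
    # Row (iii): evaluation, ideally on the same tasks as row (ii)
    for task in (t for t in TASKS if t in evaluate):
        steps.append(f"evaluate {task}")
    return steps
```

For instance, leaving row (i) empty and checking ASC and SED in rows (ii) and (iii) yields one loading step, two independent fine-tuning runs, and two evaluations.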
Below these rows are text boxes for setting the directories of the downloaded datasets and the directory where trained models are saved. Note that for each dataset directory, the code in this repository expects the folder produced by unzipping the downloaded archive.
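A quick sanity check on the configured dataset paths can catch the common mistake of pointing at the archive instead of the extracted folder. This is a minimal sketch; the function name and the dictionary keys (e.g. "ASC", "TAG", "SED") are hypothetical, and it only checks that each path is a non-empty directory, not that its contents match what the code expects.

```python
from pathlib import Path
from typing import Dict, List


def check_dataset_dirs(dirs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every path looks like an unzipped folder."""
    problems: List[str] = []
    for name, path in dirs.items():
        p = Path(path)
        if p.suffix.lower() == ".zip":
            problems.append(f"{name}: {p} is an archive; use the folder extracted from it")
        elif not p.is_dir():
            problems.append(f"{name}: {p} is not an existing directory")
        elif not any(p.iterdir()):
            problems.append(f"{name}: {p} is empty")
    return problems
```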
Email jeewon.leo.jung@gmail.com for other details :-).
This repository provides the code for reproducing the paper below.
@inproceedings{jung2021dcasenet,
title={DCASENet: An integrated pretrained deep neural network for detecting and classifying acoustic scenes and events},
author={Jung, Jee-weon and Shim, Hye-jin and Kim, Ju-ho and Yu, Ha-Jin},
booktitle={Proc. ICASSP},
pages={621--625},
year={2021},
organization={IEEE}
}
- 2020.09.24. : Initial commit
- 2020.10.18. : Overall validation & refactoring (thanks to yeongsoo)
- 2020.11.04. : Added file trees & finished refactoring