- Overview
- Installation
- Supported Dataset
- Usage
- Contact
Federated Learning (FL) has gained considerable interest for enabling multiple clients holding sensitive data to collaboratively train machine learning models without centralizing that data. However, while a growing number of works apply FL to audio tasks, few realistic FL datasets exist as benchmarks for algorithmic research. Therefore, we present FedAudio, a benchmark library for evaluating federated learning methods on audio tasks.
Paper Link: https://arxiv.org/pdf/2210.15707.pdf
The FedAudio package contains:
- FL feature managers that inject noise on top of the raw signal to simulate realistic FL challenges.
- A Pre-processing Manager that supports not only conventional frequency-based features (e.g., Mel-frequency cepstral coefficients (MFCC), Mel spectrograms) and knowledge-based speech features (e.g., pitch) but also deep audio representations from pre-trained models.
- A Data Splitter that provides various types of data partitioning (see the partitioning sketch after this list).
- Baseline evaluation results and code with popular aggregation functions.
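As a minimal illustration of one partitioning strategy, the sketch below groups Google Speech Commands-style entries by the speaker hash in each key. `partition_by_speaker` is a hypothetical helper for illustration, not the FedAudio API:

```python
from collections import defaultdict

def partition_by_speaker(entries):
    """Group [key, wav_path, label_id] entries into per-speaker client lists.

    Assumes Google Speech Commands-style keys such as
    '004ae714_nohash_1_zero', where the prefix before the first
    underscore is a speaker hash usable as a client ID.
    """
    clients = defaultdict(list)
    for key, wav_path, label_id in entries:
        speaker = key.split("_")[0]  # e.g. '004ae714'
        clients[speaker].append([key, wav_path, label_id])
    return clients
```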
Our repo includes the FedML open-source library (https://github.com/FedML-AI/FedML) as the underlying federated learning framework for FedAudio. Users can clone our repo and run the code directly, without installing any other federated learning framework.
To begin, please clone this repo and create the conda environment:
git clone https://github.com/zhang-tuo-pdf/FedAudio.git
cd FedAudio
conda env create -f fedspeech.yml
conda activate fedml
To make sure data downloading and processing work correctly, please also install the following packages:
apt install sox
brew install git-lfs
Currently, FedAudio encompasses four audio datasets (Google Speech Commands, IEMOCAP, CREMA-D, Urban Sound) with client partitions, covering three common tasks and accompanied by baseline training code and results. We will continue enriching the package with datasets and baselines from other applications to support more research scenarios. Please leave us a message if you would like to contribute or have suggestions for the package's development.
For most of the datasets, users can download the data directly with the provided bash script. Take Google Speech Commands as an example:
cd data/speech_commands
bash download_audio.sh
For the IEMOCAP dataset, please follow its official website to request access:
https://sail.usc.edu/iemocap/iemocap_release.htm
In the FedAudio package, all dataset loading follows the procedure below:
raw data -> data split -> adding fl feature (optional) -> preprocess -> torch dataloader
The output of the data split phase looks like this: [key, wav_command, word_id].
For example: ['004ae714_nohash_1_zero', '../data/speech_commands/audio/zero/004ae714_nohash_1.wav', 34]
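For concreteness, here is a minimal sketch of a torch Dataset that consumes such entries (assuming torchaudio; this is illustrative, not the actual FedAudio loader):

```python
import torchaudio
from torch.utils.data import Dataset

class SplitEntryDataset(Dataset):
    """Minimal sketch: wrap data-split entries for a torch dataloader."""

    def __init__(self, entries):
        self.entries = entries  # list of [key, wav_path, label_id]

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, idx):
        key, wav_path, label_id = self.entries[idx]
        waveform, sample_rate = torchaudio.load(wav_path)
        return waveform, label_id
```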
In the preprocessing phase, FedAudio is compatible with the OpenSMILE toolkit, the SUPERB toolkit, and frequency-based feature extraction techniques such as MFCC and Mel spectrograms. With the modular design, users can freely select the desired feature extraction method in the loader file. For example, for the Google Speech Commands dataset, the preprocessing command is below:
cd data_loading/data_loader
taskset 100 python gcommand_loader.py
Inside gcommand_loader.py, users can change variables such as 'process_method' to match their customized needs. The loader file generates a bin file to store the processed data, so users do not need to run the loader file every time before running FL experiments.
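As a rough sketch of what such a loader with caching might look like (the real gcommand_loader.py differs; the function and cache path here are illustrative, assuming torchaudio):

```python
import torch
import torchaudio

def preprocess(entries, process_method="mel_spec", cache_path="features.bin"):
    """Extract features once and cache them to a bin file (sketch)."""
    if process_method == "mel_spec":
        transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=40)
    elif process_method == "mfcc":
        transform = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=40)
    else:
        raise ValueError(f"unknown process_method: {process_method}")

    features = {}
    for key, wav_path, label_id in entries:
        waveform, _ = torchaudio.load(wav_path)
        features[key] = (transform(waveform), label_id)

    # Later runs can torch.load(cache_path) instead of re-extracting.
    torch.save(features, cache_path)
    return features
```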
Currently, we support two kinds of noise: SNR-based audio noise and label noise.
To work with SNR noise, please follow the steps below (a sketch of the mixing math follows this list):
- In the loader.py file for each dataset (inside data_loading/data_loader), set "fl_feature" to true and set the desired noise level in "db_level".
- In the main file (experiment/distributed_main.py), set 'fl_feature' to true and 'db_level' to the same value used in the data loading phase.
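For intuition, SNR-based mixing scales the noise so that 10·log10(P_signal / P_noise) equals db_level. Below is a minimal sketch using Gaussian noise (FedAudio may mix recorded noise sources instead; the scaling math is the point here):

```python
import torch

def add_noise_at_snr(signal, db_level):
    """Mix Gaussian noise into `signal` at a target SNR of `db_level` dB (sketch)."""
    noise = torch.randn_like(signal)
    signal_power = signal.pow(2).mean()
    noise_power = noise.pow(2).mean()
    # Scale so that 10 * log10(signal_power / scaled_noise_power) == db_level.
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (db_level / 10)))
    return signal + scale * noise
```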
To work with label noise, please follow the step below (a sketch follows):
- In the main file (experiment/distributed_main.py), set 'label_nosiy' to true and 'label_nosiy_level' to the desired noise level. The processed data bin file does not need to be regenerated for label noise.
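Conceptually, label noise at level p replaces each training label with a random different class with probability p. Below is a minimal sketch (a hypothetical helper, not the package API; the 'label_nosiy' spelling mirrors the package's flag names):

```python
import random

def corrupt_labels(labels, num_classes, label_nosiy_level):
    """Flip each label to a random *different* class with the given probability."""
    noisy = []
    for y in labels:
        if random.random() < label_nosiy_level:
            noisy.append(random.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy
```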
As an end-to-end example on Google Speech Commands:
Step 0: download the dataset
cd data/speech_commands
bash download_audio.sh
Step 1: load the dataset
cd data_loading/data_loader
python gcommand_loader.py
Step 2: run the FL training
cd experiment
sh run_distributed.sh gcommand raw mel_spec 106 8 5000 1 16 0.1 0 8
# arguments in order: dataset name (gcommand), process_method (raw), feature type (mel_spec), sampled client number (106), GPU number for running (8), round number (5000), local epoch number (1), batch size (16), learning rate (0.1), start GPU index (0), total GPU number on the machine (8)
If you find our work useful, please cite our paper in your publications.
@inproceedings{Zhang2022FedAudioAF,
  title={FedAudio: A Federated Learning Benchmark for Audio Tasks},
  author={Tuo Zhang and Tiantian Feng and Samiul Alam and Sunwoo Lee and Mi Zhang and Shrikanth S. Narayanan and Salman Avestimehr},
  year={2022}
}
Feel free to contact us!
Tuo Zhang tuozhang@usc.edu
Tiantian Feng tiantiaf@usc.edu
Special Thanks to Samiul Alam!