SpeechPrompt-v2 Dataset Document

📚 Speech Command Recognition (SCR)

📚 Google Speech Commands v1

(Follow the download instructions from s3prl)

📚 Grabo

📚 Lithuanian Speech Commands (LT-SCR)

📚 Dysarthric Mandarin Speech Commands (DM-SCR)

📚 Arabic Speech Commands (AR-SCR)

📚 Intent Classification (IC)

(Follow the download instructions from s3prl)

📚 Language Identification (LID)

(Follow the download instructions from TensorFlow)

📚 Fake Speech Detection (FSD)

📚 Emotion Recognition (ER)

(Follow the download instructions from s3prl)

📚 Accent Classification (AcC)

📚 Speaker Identification (SID)

(Follow the download instructions from s3prl)

  • Download the dataset from VoxCeleb1 and unzip it.

    voxceleb1_root="/CORPORA_DIR/VoxCeleb1/"
    mkdir -p $voxceleb1_root/dev
    mkdir -p $voxceleb1_root/test
    
    # prepare dev
    cd $voxceleb1_root/dev/
    wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
    wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab
    wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac
    wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad
    # match only the split parts so a rerun does not feed the output zip back into cat
    cat vox1_dev_wav_part* > vox1_dev_wav.zip
    unzip vox1_dev_wav.zip
    
    # prepare test
    cd $voxceleb1_root/test/
    wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
    unzip vox1_test_wav.zip
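
The dev archive is split into four parts, and a missing or partially downloaded part only shows up later as a corrupt zip. A minimal sketch of a check-then-concatenate step follows. The helper name `join_vox1_dev_parts` is hypothetical (it is not part of s3prl or VoxCeleb), and it assumes the four part files sit directly in the directory passed as the first argument.

```shell
# Hypothetical helper: verify that all four dev parts exist and are non-empty,
# then concatenate them in order into vox1_dev_wav.zip.
join_vox1_dev_parts() {
    local dir="$1" suffix part

    # Check every part before writing anything.
    for suffix in aa ab ac ad; do
        part="$dir/vox1_dev_wav_part$suffix"
        if [ ! -s "$part" ]; then
            echo "missing or empty: $part" >&2
            return 1
        fi
    done

    # Concatenate in the fixed order aa, ab, ac, ad.
    : > "$dir/vox1_dev_wav.zip"
    for suffix in aa ab ac ad; do
        cat "$dir/vox1_dev_wav_part$suffix" >> "$dir/vox1_dev_wav.zip"
    done
}
```

For example, `join_vox1_dev_parts "$voxceleb1_root/dev"` could replace the `cat` line in the dev step, with `unzip` run only if it succeeds.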
    

📚 Gender Identification (GID)

📚 Audio Classification (AuC)

📚 Sarcasm Detection (SD)

📚 MUStARD Dataset

📚 MUStARD++ Dataset

📚 Voice Activity Detection (VAD)

(Follow the download instructions from NVIDIA NeMo)

  • Download NeMo's GitHub repo
  • Go to <NeMo_git_root>/scripts/freesound_download_resample/ and follow the steps below to download Freesound (these steps come from freesound_download.py)
    • Install required packages
       pip install -r freesound_requirements.txt
      
    • Create an API key for freesound.org at https://freesound.org/help/developers/
    • Create a Python file called freesound_private_apikey.py and add the lines
     api_key = <your Freesound api key> 
     client_id = <your Freesound client id>
    
    • Authorize by running python freesound_download.py --authorize, then visit the website and paste the response code
    • Feel free to change any arguments in download_resample_freesound.sh such as max_samples and max_filesize
    • Run download_resample_freesound.sh, for example:
      bash download_resample_freesound.sh 4000 ./freesound ./freesound_resampled_background
      
  • Download Google Speech Commands Dataset v2
  • Process Google SC v2 & Freesound dataset
    • Modify line 484 in <NeMo_git_root>/scripts/dataset_processing/process_vad_data.py to fixed_test, fixed_val, fixed_train = 60000, 20000, 160000 and run the following command:
      python <NeMo_git_root>/scripts/dataset_processing/process_vad_data.py --out_dir='./manifest/' --speech_data_root='./speech_commands_v0.02' --background_data_root='./freesound_resampled_background' --log --rebalance_method='fixed'
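
Editing by line number is fragile, because line 484 may move between NeMo versions. As a hedged alternative, the sketch below rewrites the assignment by matching its left-hand side. The helper name `set_fixed_split_sizes` is hypothetical (it is not part of NeMo), and it assumes the script contains a line beginning with `fixed_test, fixed_val, fixed_train = `; if it does not, the helper reports an error and changes nothing.

```shell
# Hypothetical helper: set the fixed split sizes in process_vad_data.py by
# matching the assignment instead of relying on a line number.
set_fixed_split_sizes() {
    local script="$1" n_test="$2" n_val="$3" n_train="$4"
    local lhs='fixed_test, fixed_val, fixed_train'

    # Fail early if the expected assignment is not present (assumed form).
    if ! grep -q "^[[:space:]]*${lhs} = " "$script"; then
        echo "assignment '${lhs} = ...' not found in $script" >&2
        return 1
    fi

    # Keep the original indentation (\1) and leave a .bak copy of the file.
    sed -i.bak -E \
        "s/^([[:space:]]*)${lhs} = .*/\1${lhs} = ${n_test}, ${n_val}, ${n_train}/" \
        "$script"
}
```

It would be called as `set_fixed_split_sizes <NeMo_git_root>/scripts/dataset_processing/process_vad_data.py 60000 20000 160000` before running the processing command. The untouched original stays next to the script as `process_vad_data.py.bak`.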