flckv/SpeechPrompt-v2

SpeechPrompt-v2

🐘 Pre-trained models and files

The pipeline relies on 4 files:

  1. HuBERT model: encoding speech
  2. K-means model: quantizing the speech representations into discrete units
  3. Dictionary file: defining the unit space for the unit language model
  4. unit Language Model (uLM): performing generative language modeling on the discrete units

These models and files are downloaded automatically when you run the preprocessing pipeline.
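To make the role of the K-means model concrete, here is a minimal sketch of nearest-centroid quantization: each frame-level feature vector is replaced by the index of its closest centroid, and that index is the discrete unit. The function names, the 2-dimensional toy features, and the three centroids are illustrative assumptions, not code or data from this repository; real HuBERT features have far more dimensions and the actual K-means model has many more clusters.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map every frame to the index of its nearest centroid (its 'unit')."""
    units = []
    for frame in frames:
        distances = [squared_distance(frame, c) for c in centroids]
        units.append(distances.index(min(distances)))
    return units


# Toy example (hypothetical values): 4 frames, 3 centroids.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
units = quantize(frames, centroids)
print(units)  # [0, 1, 1, 2]
```

The resulting list of integers is the kind of sequence the unit language model consumes in place of text tokens.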

🔧 Preprocessing

Concept

  • The data preprocessing (Speech2unit) pipeline has 4 steps. Its main tasks are converting speech to units and collating the task labels:

    1. generate manifest
    2. quantize
    3. reduce_quantized
    4. create_lm_dataset
  • Each step saves its intermediate data, so you can analyze the output of any stage, and inspecting these intermediate files is a good way to understand how the pipeline works.
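As an illustration of the reduce_quantized step, the sketch below assumes that "reducing" means collapsing runs of consecutive identical units into a single unit, which is the usual convention in GSLM-style pipelines. This README does not spell out the exact behavior, so treat the function name `reduce_units` and the returned durations as assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def reduce_units(units: List[int]) -> Tuple[List[int], List[int]]:
    """Collapse consecutive repeats and record how long each run was."""
    reduced, durations = [], []
    for unit, run in groupby(units):
        reduced.append(unit)
        durations.append(len(list(run)))
    return reduced, durations


# Hypothetical quantized sequence, one unit per frame.
reduced, durations = reduce_units([71, 71, 71, 12, 12, 63, 71])
print(reduced)    # [71, 12, 63, 71]
print(durations)  # [3, 2, 1, 1]
```

Note that only adjacent duplicates are merged: the final 71 is kept because it is separated from the first run by other units.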

Steps

  1. Download the dataset
  2. Modify the dataset config ([downstream]/config.yaml)
  3. Modify the global config (preprocess/config.yaml)
  4. Run preprocess/runner.py
    • option 1
    # Use --action all to run all 4 stages in one go:
    python runner.py --model GSLM --downstream SCR_google_speech_commands --action all
    • option 2
    # Or run the 4 stages one at a time with the following commands:
    python runner.py --model GSLM --downstream SCR_google_speech_commands --action generate_manifest
    python runner.py --model GSLM --downstream SCR_google_speech_commands --action quantize
    python runner.py --model GSLM --downstream SCR_google_speech_commands --action reduce_quantized
    python runner.py --model GSLM --downstream SCR_google_speech_commands --action create_lm_dataset
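If you prefer to drive option 2 from a script, for example to stop at the first failing stage, a small wrapper along these lines may help. It only assembles the same `runner.py` commands shown above. The helper names `build_commands` and `run_stages`, and the `cwd` argument, are hypothetical and not part of this repository.

```python
import subprocess
import sys
from typing import List

# Stage names taken from the commands listed above.
STAGES = ["generate_manifest", "quantize", "reduce_quantized", "create_lm_dataset"]


def build_commands(model: str, downstream: str) -> List[List[str]]:
    """Return one runner.py command (as an argv list) per stage."""
    return [
        [sys.executable, "runner.py",
         "--model", model,
         "--downstream", downstream,
         "--action", stage]
        for stage in STAGES
    ]


def run_stages(model: str, downstream: str, cwd: str = ".") -> None:
    """Run the stages in order; check=True aborts on the first failure."""
    for command in build_commands(model, downstream):
        subprocess.run(command, cwd=cwd, check=True)


commands = build_commands("GSLM", "SCR_google_speech_commands")
for command in commands:
    print(" ".join(command[1:]))
```

Calling `run_stages("GSLM", "SCR_google_speech_commands", cwd=...)` with the directory that contains `runner.py` would then execute the four stages sequentially.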

🔄 Verbalizer

Concept

  • The verbalizer runs in 2 steps and maps the task labels into the language model's vocabulary.

Steps

  • run verbalizer.py
  • example:
    python verbalizer.py --downstream SCR_google_speech_commands --action all --method freq
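The README does not describe what `--method freq` does internally. One plausible reading, sketched below purely as an assumption, is a frequency-based mapping: labels are ranked by how often they occur, units are ranked the same way, and the most frequent label is assigned to the most frequent unit, and so on. The function `frequency_verbalizer` and the toy data are hypothetical and are not taken from `verbalizer.py`.

```python
from collections import Counter
from typing import Dict, List


def frequency_verbalizer(labels: List[str], units: List[int]) -> Dict[str, int]:
    """Pair labels and units by frequency rank (assumed behavior, see lead-in)."""
    ranked_labels = [label for label, _ in Counter(labels).most_common()]
    ranked_units = [unit for unit, _ in Counter(units).most_common()]
    if len(ranked_units) < len(ranked_labels):
        raise ValueError("not enough distinct units to cover every label")
    return dict(zip(ranked_labels, ranked_units))


# Toy data (made up): 3 task labels and a short stream of units.
labels = ["yes", "yes", "yes", "no", "no", "up"]
units = [5, 5, 5, 5, 9, 9, 9, 2, 2, 7]
mapping = frequency_verbalizer(labels, units)
print(mapping)  # {'yes': 5, 'no': 9, 'up': 2}
```

Whatever the actual method, the result is the same kind of object: a lookup from each task label to a token in the unit language model's vocabulary.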

🐟 Fairseq Preprocess

Concept

This step converts the verbalized data to binary files that will be used for fairseq training.

Steps

  • run fairseq_preprocess.py
  • example:
    python fairseq_preprocess.py --downstream SCR_google_speech_commands --vb_method freq

🔥 Training

Concept

  • During training, 2 kinds of checkpoints are saved:
    • base_model
    • prompt

Steps

  • run train.py
  • example:
    python train.py \
        --downstream SCR_google_speech_commands \
        --vb_method freq \
        --exp_name SCR_google_speech_commands_plen.5 \
        --prompt_length 5 \
        --deep_prompt

✒️ Sampling

Concept

  • Load base_model and prompts to perform sampling

Steps

  • run sample.py
  • example:
    python sample.py \
        --exp_name SCR_google_speech_commands_plen.5 \
        --downstream SCR_google_speech_commands \
        --vb_method freq
  • The output is a JSON file containing the file_name, source units, ground truth (label), and model prediction.
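A quick way to evaluate such an output is to compare the label and prediction fields. The sketch below assumes the JSON is a list of records and that the keys are named `file_name`, `src_units`, `label`, and `predict`. These key names and the sample records are invented for illustration, so adjust them to match the file that `sample.py` actually writes.

```python
import json
from typing import Dict, List


def accuracy(records: List[Dict[str, str]],
             label_key: str = "label",
             pred_key: str = "predict") -> float:
    """Fraction of records whose prediction equals the ground-truth label."""
    if not records:
        return 0.0
    correct = sum(1 for r in records if r[label_key] == r[pred_key])
    return correct / len(records)


# Made-up records standing in for the real output file.
sample_json = """
[
  {"file_name": "a.wav", "src_units": "71 12 63", "label": "yes", "predict": "yes"},
  {"file_name": "b.wav", "src_units": "12 12 5",  "label": "no",  "predict": "no"},
  {"file_name": "c.wav", "src_units": "63 5 71",  "label": "up",  "predict": "no"},
  {"file_name": "d.wav", "src_units": "5 71 12",  "label": "yes", "predict": "yes"}
]
"""
records = json.loads(sample_json)
acc = accuracy(records)
print(f"accuracy = {acc:.2f}")  # accuracy = 0.75
```

For a real run, replace `json.loads(sample_json)` with `json.load(open(path))`, where `path` points to the generated file.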
