
egorsmkv/xeus-finetune


xeus-finetune

Warning

This work is in progress.

This repository contains code for fine-tuning the XEUS model for automatic speech recognition (ASR). It is a fork of https://github.com/pashanitw/xeus-finetune.

Required software

  • python3.11, python3.11-dev
  • build-essential, cmake
  • uv
  • git-lfs
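Before installing, it can help to confirm that the tools above are on your PATH. The helper below is a hypothetical sketch (`missing_tools` is not part of this repository). It assumes the tools are invoked as `python3.11`, `cmake`, `uv` and `git-lfs`, and it approximates build-essential by checking for `gcc` and `make`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the names of any
# commands that are not found on PATH and return non-zero if any are missing.
missing_tools() {
  local missing=()
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if ((${#missing[@]} > 0)); then
    echo "${missing[*]}"
    return 1
  fi
}

# Usage (build-essential is approximated by gcc and make):
# missing_tools python3.11 gcc make cmake uv git-lfs || echo "install the tools listed above"
```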

Note

Python 3.12 is not supported because one of ESPnet's dependencies relies on an outdated package.
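A guard like the one below can catch an unsupported interpreter before the ESPnet install fails partway through. It is a minimal sketch: `check_python_version` is a hypothetical helper that accepts only 3.11.x, matching the requirement above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for Python 3.11.x, since ESPnet's
# dependencies do not install on Python 3.12.
check_python_version() {
  local version="$1"   # full version string, e.g. "3.11.9"
  case "$version" in
    3.11 | 3.11.*) return 0 ;;
    *)
      echo "Unsupported Python ${version}; please use Python 3.11" >&2
      return 1
      ;;
  esac
}

# Usage, after `source .venv/bin/activate`:
# check_python_version "$(python -c 'import platform; print(platform.python_version())')"
```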

Install

uv venv --python 3.11

source .venv/bin/activate

# install espnet
git clone --branch ssl --depth 1 https://github.com/wanchichen/espnet espnet-code
cd espnet-code
git fetch --unshallow
uv pip install -e .
cd ..

# download XEUS checkpoint (stored with Git LFS)
git lfs install
git clone https://huggingface.co/espnet/XEUS

# install required packages
uv pip install -r requirements.txt

# for development, also install the dev dependencies
uv pip install -r requirements-dev.txt
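Because the XEUS checkpoint is stored with Git LFS, a clone made without Git LFS set up leaves small pointer files in place of the weights. The hypothetical helper below (`find_lfs_pointers`, not part of this repository) lists such stubs by checking whether small files start with the LFS specification line. If it prints anything, running `git -C XEUS lfs pull` from the same directory should fetch the real files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every file under a directory that is still a
# Git LFS pointer stub (the large file itself was never downloaded).
find_lfs_pointers() {
  local dir="$1"
  local f
  # Pointer files are tiny text files, so only files under 1024 bytes are
  # inspected; anything inside a .git directory is skipped.
  while IFS= read -r -d '' f; do
    # A pointer file's first line names the LFS specification.
    if head -n 1 "$f" | grep -qxF 'version https://git-lfs.github.com/spec/v1'; then
      echo "$f"
    fi
  done < <(find "$dir" -type f -not -path '*/.git/*' -size -1024c -print0)
}

# Usage, from the directory containing the XEUS clone:
# if [ -n "$(find_lfs_pointers XEUS)" ]; then git -C XEUS lfs pull; fi
```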

Fine-tuning

  1. Authenticate with Hugging Face
huggingface-cli login
  2. Copy a config file, then change the dataset sections and hyperparameters
cp configs/hi_hf.yaml configs/uk_hf.yaml
  3. Start fine-tuning
accelerate launch finetune.py --config configs/uk_hf.yaml

# if you want to use only one GPU
accelerate launch --num_processes 1 finetune.py --config configs/uk_hf.yaml
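To train on a specific subset of GPUs, a common pattern is to restrict the visible devices with `CUDA_VISIBLE_DEVICES` and pass a matching `--num_processes` to `accelerate launch`. The wrapper below is a hypothetical sketch (`launch_finetune` is not part of this repository) that derives the process count from a comma-separated list of GPU ids, one process per GPU.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: launch fine-tuning on the given GPUs, one process per GPU.
launch_finetune() {
  local config="$1"   # e.g. configs/uk_hf.yaml
  local gpus="$2"     # comma-separated GPU ids, e.g. "0,1"
  local ids
  IFS=',' read -ra ids <<<"$gpus"
  # Only the listed GPUs are visible to the launched processes.
  CUDA_VISIBLE_DEVICES="$gpus" accelerate launch --num_processes "${#ids[@]}" \
    finetune.py --config "$config"
}

# Usage:
# launch_finetune configs/uk_hf.yaml 0,1
```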

Inference

python inference.py --ckpt_path <checkpoint path> --audio audio.wav

# example
python inference.py --ckpt_path ./step_2000 --audio audio.wav
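To transcribe a whole folder, the same command can be run in a loop. This is a hypothetical sketch (`transcribe_dir` is not part of this repository) that assumes `inference.py` prints the transcript to standard output, and it writes one `.txt` file per `.wav` file. Each call starts a new process, so the model is reloaded for every file; this suits small batches rather than large evaluations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transcribe every .wav file in a directory, assuming
# inference.py prints the transcript to stdout.
transcribe_dir() {
  local ckpt="$1" audio_dir="$2" out_dir="$3"
  local wav
  mkdir -p "$out_dir"
  for wav in "$audio_dir"/*.wav; do
    [ -e "$wav" ] || continue   # no matches: the glob stays unexpanded
    python inference.py --ckpt_path "$ckpt" --audio "$wav" \
      >"$out_dir/$(basename "$wav" .wav).txt"
  done
}

# Usage:
# transcribe_dir ./step_2000 ./audio ./transcripts
```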

Evaluation

Run the following command to compute the word error rate (WER):

python eval.py --ckpt_path <checkpoint path> --dataset <dataset> --name <subset> --split <split>

# example
python eval.py --ckpt_path ./step_2000 --dataset mozilla-foundation/common_voice_17_0 --name uk --split test
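For reference, the conventional definition of word error rate is given below. This README does not state how eval.py computes it, so treat the formula as the standard metric rather than a description of the script. With S substitutions, D deletions and I insertions in the minimum-edit alignment of the hypothesis against a reference of N words:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

For example, with the reference "the cat sat on the mat" (N = 6) and the hypothesis "the cat sit on mat", there is one substitution (sat → sit) and one deletion (the), so WER = (1 + 1 + 0) / 6 ≈ 0.33. Because insertions are counted, WER can exceed 1.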

Development

Check and format the code:

ruff check
ruff format
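In CI, where files should not be rewritten, both tools can be run in check-only mode: `ruff format --check` reports files that would be reformatted and exits non-zero instead of modifying them. The function name `lint_ci` below is hypothetical; it runs both checks even if the first one fails, so all problems are reported at once.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: fail if any file needs reformatting or has lint
# errors, without modifying the working tree.
lint_ci() {
  local status=0
  ruff format --check . || status=1
  ruff check . || status=1
  return "$status"
}

# Usage:
# lint_ci && echo "lint OK"
```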

TODO

  • Enable FlashAttention for training
  • Set a cache_dir for load_dataset
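Until a cache_dir is passed to load_dataset, one possible workaround is the `HF_DATASETS_CACHE` environment variable, which the Hugging Face `datasets` library reads to decide where downloaded datasets are cached. The wrapper below is a hypothetical sketch (`with_dataset_cache` is not part of this repository), and the cache path in the usage line is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with the Hugging Face datasets cache
# redirected to the given directory (created if it does not exist).
with_dataset_cache() {
  local cache_dir="$1"
  shift
  mkdir -p "$cache_dir"
  HF_DATASETS_CACHE="$cache_dir" "$@"
}

# Usage (example path):
# with_dataset_cache /data/hf_datasets accelerate launch finetune.py --config configs/uk_hf.yaml
```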