
UETASR


An Automatic Speech Recognition toolkit in TensorFlow 2

Suggestions are always welcome!


Key features

UETASR provides various useful tools to speed up and facilitate research on speech technologies:

  • A YAML-based hyperparameter specification language that describes all types of hyperparameters, from individual numbers to complete objects (see the sketch after this list).

  • Single- and multi-GPU training and inference with TensorFlow 2 data-parallel or distributed data-parallel strategies.

  • A transparent and fully customizable data input and output pipeline.

  • Logging and visualization with WandB and TensorBoard.

  • Error analysis tools to help users debug their models.
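
As a rough illustration of the YAML-based hyperparameter style, here is a minimal sketch of loading nested hyperparameters with PyYAML. The keys below are hypothetical and do not reflect UETASR's actual config schema; see the config files in this repo for the real layout.

import yaml  # PyYAML

# Hypothetical hyperparameter file: these keys are illustrative only.
CONFIG = """
model:
  encoder: conformer
  num_layers: 16
optimizer:
  name: adam
  learning_rate: 0.001
"""

hparams = yaml.safe_load(CONFIG)
print(hparams["model"]["encoder"])            # conformer
print(hparams["optimizer"]["learning_rate"])  # 0.001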

Supported Models

Feature extraction and augmentation

UETASR provides efficient, GPU-friendly, on-the-fly speech augmentation pipelines and acoustic feature extraction (a minimal feature-extraction sketch follows the list below):

  • Augmentation:
    • Adaptive SpecAugment (paper)
    • SpliceOut (paper)
    • Gain, Time Stretch, Pitch Shift, etc. (paper)
  • Featurization:
    • MFCC, Fbank, Spectrogram, etc.
    • Subword tokenization (BPE, Unigram, etc.)
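
As a rough sketch of on-the-fly featurization in TensorFlow, the snippet below computes log-Mel filterbank features with plain tf.signal ops. This is a generic illustration, not UETASR's own featurizer API; the function name and frame settings are assumptions.

import tensorflow as tf

def log_mel_fbank(waveform, sample_rate=16000, num_mel_bins=80):
    """Compute log-Mel filterbank features from a 1-D float waveform."""
    # 25 ms windows with a 10 ms hop at 16 kHz.
    stft = tf.signal.stft(waveform, frame_length=400, frame_step=160,
                          fft_length=512)
    magnitude = tf.abs(stft)
    # Project the linear-frequency spectrogram onto the Mel scale.
    mel_weights = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=num_mel_bins,
        num_spectrogram_bins=magnitude.shape[-1],
        sample_rate=sample_rate)
    mel = tf.matmul(magnitude, mel_weights)
    return tf.math.log(mel + 1e-6)

features = log_mel_fbank(tf.random.normal([16000]))  # 1 s of dummy audio
print(features.shape)  # (frames, 80)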

Installation

For training and testing, you can use git clone to install optional packages from other authors (e.g. ctc_loss, rnnt_loss).

Prerequisites

  • TensorFlow >= 2.9.0

  • CuDNN >= 8.1.0

  • CUDA >= 11.2

  • Nvidia driver >= 470
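
To check that TensorFlow meets the version floor and can see your GPUs, a quick sanity check from Python:

import tensorflow as tf

print(tf.__version__)                          # expect >= 2.9.0
print(tf.config.list_physical_devices('GPU'))  # expect one entry per visible GPU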

Install with GitHub

Once you have created your Python environment (Python 3.6+), you can simply type:

git clone https://github.com/thanhtvt/uetasr.git
cd uetasr
pip install -e .

Then you can access uetasr with:

import uetasr

Install with Conda

git clone https://github.com/thanhtvt/uetasr.git

conda create --name uetasr python=3.8
conda activate uetasr
conda install cudnn=8.1.0

cd uetasr

pip install -e .

Install with Docker

Build docker from Dockerfile:

docker build -t uetasr:v1.0.0 .

Run container from uetasr image:

docker run -it --name uetasr --gpus all -v <workspace_dir>:/workspace uetasr:v1.0.0 bash
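
Here <workspace_dir> is a placeholder for a directory on your host machine; the -v flag mounts it at /workspace inside the container so data and checkpoints persist after the container exits, and --gpus all exposes the host GPUs to the container.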

Getting Started

  1. Define a config YAML file; see the config.yaml file in this folder for reference.
  2. Download your corpus and create a script to generate the .tsv file (see this file for reference). Check whether our provided tools meet your needs; a manifest-writing sketch also follows this list.
  3. Create transcript.txt and cmvn.tsv files for your corpus. We provide this script to generate those files from the .tsv file created in step 2.
  4. For training, check train.py in the egs folder to see the options.
  5. For testing, check test.py in the egs folder to see the options.
  6. For evaluating and error analysis, check asr_evaluation.py in the tools folder to see the options.
  7. [Optional] To publish your model on 🤗, check this space for reference.
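
For step 2, the sketch below shows one way to write a tab-separated manifest in Python. The column layout and the soundfile dependency are assumptions for illustration; match the actual layout to the reference .tsv file linked above.

import csv
import soundfile as sf  # assumed here just to read durations

# Hypothetical manifest writer: the columns are an assumption; follow the
# reference .tsv file for the layout UETASR actually expects.
def write_manifest(pairs, out_path):
    """pairs: iterable of (wav_path, transcript) tuples."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["path", "duration", "transcript"])  # assumed header
        for wav_path, text in pairs:
            duration = sf.info(wav_path).duration  # length in seconds
            writer.writerow([wav_path, f"{duration:.2f}", text])

write_manifest([("audio/utt1.wav", "hello world")], "train.tsv")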

References & Credits

  1. namnv1906 (for the guidance and the initial version of this toolkit)
  2. TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2
  3. ESPNet: End-to-End Speech Processing Toolkit
  4. SpeechBrain: A PyTorch-based Speech Toolkit
  5. Python module for evaluating ASR hypotheses
  6. Accumulated Gradients for TensorFlow 2
