This repository contains a script that records speech from your microphone and recognizes it using wav2vec2 models.
Clone this repository:
git clone https://github.com/egorsmkv/test-wav2vec2-by-microphone
cd test-wav2vec2-by-microphone
Install Python requirements:
# The author has successfully tested the project with wave==0.0.2, torch==1.11.0, torchaudio==0.11.0, sox==1.4.1, pyaudio==0.2.11, pyctcdecode==0.3.0, and transformers==4.19.2
pip install https://github.com/kpu/kenlm/archive/master.zip
pip install wave torch torchaudio pyaudio sox pyctcdecode transformers
On macOS, install the system dependencies with Homebrew first:
brew install portaudio sox
pip install https://github.com/kpu/kenlm/archive/master.zip
pip install wave pyctcdecode transformers
pip install --global-option='build_ext' --global-option='-I/usr/local/include' --global-option='-L/usr/local/lib' pyaudio
To install torch and torchaudio on macOS, first install conda or miniconda (I recommend miniconda), then install the torch libraries:
For Intel:
conda install pytorch torchaudio -c pytorch
For M1:
pip3 install torch torchaudio
If you have problems installing pyaudio, check out this link. For me, the command below works:
pip3 install --global-option='build_ext' --global-option='-I/opt/homebrew/Cellar/portaudio/19.7.0/include/' --global-option='-L/opt/homebrew/Cellar/portaudio/19.7.0/lib/' pyaudio
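To compare what ended up installed with the versions listed in the requirements comment above, a small check along these lines may help. It is only a sketch: the `TESTED` table and the `compare_versions` helper are illustrative names, not part of this repository, and a different version is not necessarily a broken one.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirements comment in this README.
TESTED = {
    "wave": "0.0.2",
    "torch": "1.11.0",
    "torchaudio": "0.11.0",
    "sox": "1.4.1",
    "PyAudio": "0.2.11",
    "pyctcdecode": "0.3.0",
    "transformers": "4.19.2",
}


def compare_versions(tested=TESTED):
    """Return {package: (tested_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            # Drop local suffixes such as "+cu113" that some torch builds carry.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        shown = installed or "not installed"
        print(f"{name}: tested {wanted}, installed {shown} ({status})")
```

A "differs" line is only a hint about where to look first if the script fails to start.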
# Run the loop (the script records speech and then recognizes it)
# Use Ctrl-C to stop the script
python run.py --model_id Yehor/wav2vec2-xls-r-300m-uk-with-small-lm --record_seconds 15
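For orientation, a record-then-recognize loop of this kind can be sketched roughly as follows. This is not the repository's `run.py`. The helper names (`chunks_needed`, `save_wav`, `record`, `listen_forever`), the chunk size, and the `recording.wav` path are assumptions made for illustration, and the sketch assumes the model ships with a language model so that `Wav2Vec2ProcessorWithLM` can load it.

```python
import math
import wave

SAMPLE_RATE = 16_000  # pretrained wav2vec2 models expect 16 kHz mono audio
CHUNK = 1024          # frames read per microphone call (assumed value)


def chunks_needed(seconds, rate=SAMPLE_RATE, chunk=CHUNK):
    """Number of microphone reads needed to cover `seconds` of audio."""
    return math.ceil(rate * seconds / chunk)


def save_wav(path, frames, rate=SAMPLE_RATE, sample_width=2):
    """Write raw 16-bit mono PCM chunks to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def record(path, seconds):
    """Record `seconds` of audio from the default microphone into `path`."""
    import pyaudio  # imported lazily so the helpers above work without it

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        frames = [stream.read(CHUNK) for _ in range(chunks_needed(seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
    save_wav(path, frames)


def listen_forever(model_id, record_seconds, path="recording.wav"):
    """Alternate between recording and recognition until Ctrl-C."""
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

    # Load the model once, outside the loop.
    processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    try:
        while True:
            record(path, record_seconds)
            waveform, rate = torchaudio.load(path)
            inputs = processor(waveform[0].numpy(), sampling_rate=rate,
                               return_tensors="pt")
            with torch.no_grad():
                logits = model(**inputs).logits
            # Decode with the bundled language model.
            print(processor.batch_decode(logits.numpy()).text[0])
    except KeyboardInterrupt:
        print("Stopped.")
```

Under these assumptions, calling `listen_forever("Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", 15)` would mirror the command above: each pass records 15 seconds, prints the transcript, and starts recording again.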
- If you run into any problems, open an issue in the repository.
- Currently tested on Linux and macOS; on Windows you need to change the script slightly.