This repository has been archived by the owner on Aug 28, 2024. It is now read-only.

torchaudio based wav2vec2 with no model input length limit #141

Merged
17 commits merged into pytorch:master on Jun 16, 2021

Conversation

@jeffxtang (Contributor) commented on May 28, 2021

Speech Recognition on Android with Wav2Vec2

Introduction

Facebook AI's wav2vec 2.0 is one of the leading models in speech recognition. It is also available in the Hugging Face Transformers library, which is used in another PyTorch Android demo app for Question Answering.

In this demo app, we'll show how to quantize, trace, and optimize the wav2vec2 model, powered by the newly released torchaudio 0.9.0, and how to use the converted model in an Android demo app to perform speech recognition.
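At a high level, the conversion combines dynamic quantization, TorchScript, and PyTorch's mobile optimizer. Below is a minimal sketch of such a pipeline, assuming the facebook/wav2vec2-base-960h checkpoint and TorchScript scripting; the actual create_wav2vec2.py used in Step 2 may differ in these details:

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torchaudio.models.wav2vec2.utils import import_huggingface_model
from transformers import Wav2Vec2ForCTC

# Load the pretrained Hugging Face checkpoint (assumed here) and convert it
# to torchaudio's Wav2Vec2 model
original = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model = import_huggingface_model(original).eval()

# Dynamically quantize the Linear layers to shrink the model for mobile
model = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)

# Convert to TorchScript, optimize for mobile, and save the model file
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)
optimized.save("wav2vec2.pt")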

Prerequisites

  • PyTorch 1.9.0 and torchaudio 0.9.0 (Optional)
  • Python 3.8 (Optional)
  • Android PyTorch library 1.9.0
  • Android Studio 4.0.1 or later

Quick Start

1. Get the Repo

Simply run the commands below:

git clone https://github.com/pytorch/android-demo-app
cd android-demo-app/SpeechRecognition

If you don't have PyTorch 1.9.0 and torchaudio 0.9.0 installed, or just want to try the demo app quickly, you can download the quantized, scripted wav2vec2 model file here, drag and drop it into the app/src/main/assets folder inside android-demo-app/SpeechRecognition, and continue to Step 3.

2. Prepare the Model

To install PyTorch 1.9.0, torchaudio 0.9.0, and the Hugging Face Transformers library, run something like this:

conda create -n wav2vec2 python=3.8.5
conda activate wav2vec2
pip install torch torchaudio
pip install transformers
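To confirm that the installed versions match PyTorch 1.9.0 and torchaudio 0.9.0 before continuing, a quick check (not part of the original steps) is:

python -c "import torch, torchaudio; print(torch.__version__, torchaudio.__version__)"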

Now, with PyTorch 1.9.0 and torchaudio 0.9.0 installed, run the following command in a terminal:

python create_wav2vec2.py

This will create the model file wav2vec2.pt. Copy it to the Android app:


mkdir -p app/src/main/assets
cp wav2vec2.pt app/src/main/assets
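Before building the app, you can optionally sanity-check the exported model in Python. Here is a minimal sketch, assuming the export emits per-frame logits as in the sketch above, a 16 kHz mono clip named test.wav, and greedy CTC decoding with the matching Hugging Face processor (all of these are assumptions, not part of the original steps):

import torch
import torchaudio
from transformers import Wav2Vec2Processor

# Load the exported TorchScript model and a short mono test clip
model = torch.jit.load("wav2vec2.pt")
waveform, sample_rate = torchaudio.load("test.wav")  # hypothetical test recording
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)

# torchaudio's Wav2Vec2 model returns per-frame logits (and optional lengths)
emissions, _ = model(waveform)

# Greedy CTC decode with the tokenizer matching the assumed checkpoint
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
ids = torch.argmax(emissions, dim=-1)
print(processor.batch_decode(ids))

If this prints a sensible transcription, the wav2vec2.pt file in app/src/main/assets is ready for the app to load.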

3. Build and run with Android Studio

Start Android Studio, open the project located in android-demo-app/SpeechRecognition, then build and run the app on an Android device. After the app starts, tap the Start button and begin speaking; after 12 seconds (you can change private final static int AUDIO_LEN_IN_SECOND = 12; in MainActivity.java for a shorter or longer recording length), the model runs inference to recognize your speech. Some example recognition results are:



@jeffxtang jeffxtang marked this pull request as ready for review June 15, 2021 19:40
@IvanKobzarev IvanKobzarev merged commit 367d2d9 into pytorch:master Jun 16, 2021