torchaudio based wav2vec2 with no model input length limit #141
Merged
This reverts commit 5a65775.
…rding length limit; android code update
IvanKobzarev approved these changes on Jun 16, 2021.
Speech Recognition on Android with Wav2Vec2
Introduction
Facebook AI's wav2vec 2.0 is one of the leading models in speech recognition. It is also available in the Hugging Face Transformers library, which is used in another PyTorch Android demo app for Question Answering.
In this demo app, we'll show how to quantize, trace, and optimize the wav2vec2 model, powered by the newly released torchaudio 0.9.0, and how to use the converted model in an Android app to perform speech recognition.
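At a high level, the model preparation looks roughly like the sketch below. This is a minimal illustration rather than the repo's actual script: it assumes the facebook/wav2vec2-base-960h checkpoint and torchaudio 0.9.0's Hugging Face import utility, and it omits any decoding logic the demo app may wrap around the raw acoustic model.

```python
# Minimal sketch of the quantize / trace / optimize flow (assumptions noted above).
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torchaudio.models.wav2vec2.utils import import_huggingface_model
from transformers import Wav2Vec2ForCTC

# Load the pretrained Hugging Face model and convert it to torchaudio's Wav2Vec2Model,
# which (per this PR) imposes no fixed input-length limit.
hf_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model = import_huggingface_model(hf_model).eval()

# Dynamically quantize the linear layers, then script and optimize for mobile.
quantized = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8
)
scripted = torch.jit.script(quantized)
optimized = optimize_for_mobile(scripted)
optimized.save("wav2vec2.pt")
```

Dynamic quantization stores the linear-layer weights as int8, which substantially shrinks the exported wav2vec2.pt file for on-device use.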
Prerequisites
- PyTorch 1.9.0 and torchaudio 0.9.0 (only needed if you prepare the model yourself)
- The Hugging Face transformers package (only needed if you prepare the model yourself)
- Android Studio, to build and run the app
Quick Start
1. Get the Repo
Simply run the commands below:
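A typical way to do this (the repository URL is an assumption based on the android-demo-app folder name used throughout this README):

```
git clone https://github.com/pytorch/android-demo-app
cd android-demo-app/SpeechRecognition
```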
If you don't have PyTorch 1.9.0 and torchaudio 0.9.0 installed, or just want a quick try of the demo app, you can download the quantized scripted wav2vec2 model file here, then drag and drop it into the app/src/main/assets folder inside android-demo-app/SpeechRecognition and continue to Step 3.
2. Prepare the Model
To install PyTorch 1.9.0, torchaudio 0.9.0, and the Hugging Face transformers package, you can do something like this:
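For example, with pip (pinned to the versions mentioned above; these commands are illustrative rather than copied from the repo, and a conda environment works just as well):

```
pip install torch==1.9.0 torchaudio==0.9.0
pip install transformers
```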
Now with PyTorch 1.9.0 and torchaudio 0.9.0 installed, run the following commands in a terminal:
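Assuming the model-preparation script in android-demo-app/SpeechRecognition is named create_wav2vec2.py (the script name is an assumption; check the repo for the exact file), this is simply:

```
python create_wav2vec2.py
```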
This will create the model file wav2vec2.pt. Copy it to the Android app:
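For example, assuming you are still in the android-demo-app/SpeechRecognition folder where the script was run:

```
mkdir -p app/src/main/assets
cp wav2vec2.pt app/src/main/assets
```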
3. Build and run with Android Studio
Start Android Studio, open the project located in android-demo-app/SpeechRecognition, then build and run the app on an Android device. After the app runs, tap the Start button and start saying something; after 12 seconds (you can change private final static int AUDIO_LEN_IN_SECOND = 12; in MainActivity.java for a shorter or longer recording length), the model runs inference to recognize your speech. Some example recognition results are: