Add TensorFlow Whisper model for audio classification #21777

Open
sanchit-gandhi opened this issue Feb 24, 2023 · 5 comments
Labels
Feature request Request for a new feature Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want! TensorFlow Anything TensorFlow

Comments

@sanchit-gandhi
Contributor

Feature request

The PR #21754 adds the PyTorch version of WhisperForAudioClassification. It would be great to add the TensorFlow equivalent.

Motivation

Whisper is an encoder-decoder model for speech recognition. However, we can repurpose the model for other speech tasks, such as audio classification.

Audio classification is the task of mapping from an input speech sequence to a single class prediction. For more details, refer to the task page on the Hub: https://huggingface.co/tasks/audio-classification

For audio classification, we only require a single model output. Thus, we do not need the auto-regressive generation capacities of the Whisper decoder (which is used to generate a sequence of text tokens during speech recognition). Instead, we can just use the Whisper encoder to get hidden states, and add a classification head on top to make class label predictions.

This is analogous to using a Wav2Vec2 model for audio classification: the Wav2Vec2 encoder is used to get hidden states, and a classification head is added on top to make class label predictions.

The PR #21754 adds the PyTorch version of WhisperForAudioClassification. It required adding a projection layer and classification layer on top of the WhisperEncoder. For more details, refer directly to the pull request.
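
As a rough illustration of the "projection layer plus classification layer" idea, here is a framework-agnostic sketch in plain Python. It is not the code from the PR: the mean pooling over time, the function names, and the toy weights are all assumptions made for illustration. In a TensorFlow port, the two `linear` steps would presumably become `tf.keras.layers.Dense` layers applied to the encoder's hidden states.

```python
def linear(x, weight, bias):
    """Apply y = W x + b to one feature vector (weight has shape out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def mean_pool(frames):
    """Average a list of equal-length feature vectors over the time axis."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def classify(hidden_states, proj_w, proj_b, cls_w, cls_b):
    """Hypothetical head: project each frame, pool over time, then classify.

    hidden_states stands in for the encoder output of one utterance,
    given as a list of per-frame feature vectors.
    """
    projected = [linear(frame, proj_w, proj_b) for frame in hidden_states]
    pooled = mean_pool(projected)  # assumption: mean pooling over frames
    return linear(pooled, cls_w, cls_b)


# Toy example: 2 frames, hidden size 2, projection size 2, 3 class labels.
hidden_states = [[1.0, 3.0], [3.0, 5.0]]
proj_w = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability
proj_b = [0.0, 0.0]
cls_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cls_b = [0.0, 0.0, 0.5]

logits = classify(hidden_states, proj_w, proj_b, cls_w, cls_b)
predicted_label = max(range(len(logits)), key=logits.__getitem__)
print(logits, predicted_label)
```

The point of the sketch is that the head produces one vector of logits per utterance, with no decoder and no token-by-token generation involved.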

It would be great to add the TensorFlow equivalent of this model for cross-framework support.

The most difficult part of this PR will be getting the model tester to work. You can see from the PyTorch PR that we require a standalone tester for the audio classification model. This is because the original Whisper model is an encoder-decoder model, but the audio classification model is an encoder-only model. Thus, we require different testing logic.

Your contribution

Opening this one up to the community! If you're interested in tackling this, feel free to drop a comment in this thread and open a PR when you're ready. More than happy to answer any questions about this integration!

@sanchit-gandhi sanchit-gandhi added Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want! TensorFlow Anything TensorFlow labels Feb 24, 2023
@sanchit-gandhi sanchit-gandhi added the Feature request Request for a new feature label Feb 24, 2023
@OllieBroadhurst
Contributor

Hey @sanchit-gandhi, if we're just using the encoder do you think a CTC head could also work, i.e. WhisperForCTC?

@sanchit-gandhi
Contributor Author

sanchit-gandhi commented Mar 3, 2023

Hey @OllieBroadhurst! I don't think an encoder-only Whisper model for speech recognition would be super practical, since we'd then need an external language model to correct the phonetic errors made by the CTC model. IMO we're better off using the internal language model provided by the decoder in the original encoder-decoder architecture. The encoder-decoder model is trained end-to-end on all of the Whisper pre-training data, so it's likely to be better than any combination of CTC + LM we train ourselves.

@adit299
Contributor

adit299 commented Mar 6, 2023

Hello @OllieBroadhurst, are you currently working on this? I would love to help out if you need it. Otherwise, I would like to take a look at this issue.

@OllieBroadhurst
Contributor

Hi @adit299! I'm not, so you can take it away!

@adit299
Contributor

adit299 commented Mar 7, 2023

Great, will do!

3 participants