Add TensorFlow Whisper model for audio classification #21777
Labels
Feature request: Request for a new feature
Good Second Issue: Issues that are more difficult to do than "Good First" issues - give it a try if you want!
TensorFlow: Anything TensorFlow
Feature request
The PR #21754 adds the PyTorch version of WhisperForAudioClassification. It would be great to add the TensorFlow equivalent.
Motivation
Whisper is an encoder-decoder model for speech recognition. However, we can repurpose the model for other speech tasks, such as audio classification.
Audio classification is the task of mapping from an input speech sequence to a single class prediction. For more details, refer to the task page on the Hub: https://huggingface.co/tasks/audio-classification
For audio classification, we only require a single model output. Thus, we do not need the auto-regressive generation capacities of the Whisper decoder (which is used to generate a sequence of text tokens during speech recognition). Instead, we can just use the Whisper encoder to get hidden states, and add a classification head on top to make class label predictions.
This is analogous to using a Wav2Vec2 model for audio classification: the Wav2Vec2 encoder is used to get hidden states, and a classification head is added on top to make class label predictions.
The PR #21754 adds the PyTorch version of WhisperForAudioClassification. It required adding a projection layer and classification layer on top of the WhisperEncoder. For more details, refer directly to the pull request.
It would be great to add the TensorFlow equivalent of this model for cross-framework support.
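To make the shape of the head concrete, here is a minimal framework-agnostic sketch in NumPy of a projection layer followed by a classification layer applied to encoder hidden states. It is not the code from the PyTorch PR: the mean pooling over the time axis is an assumption borrowed from the Wav2Vec2-style classification head, and the function name, weight names, and sizes are all hypothetical. In a TensorFlow port the two matrix multiplications would presumably become dense layers.

```python
import numpy as np


def classification_head(hidden_states, w_proj, b_proj, w_cls, b_cls):
    """Map encoder hidden states to class logits.

    hidden_states: (batch, frames, d_model) output of the encoder.
    Returns logits of shape (batch, num_labels).
    """
    # Projection layer, applied independently to every frame.
    projected = hidden_states @ w_proj + b_proj  # (batch, frames, proj_size)
    # ASSUMPTION: pool over time with a plain mean to get one vector per clip.
    pooled = projected.mean(axis=1)  # (batch, proj_size)
    # Classification layer: one logit per class.
    return pooled @ w_cls + b_cls  # (batch, num_labels)


# Illustrative sizes only; random values stand in for real encoder outputs.
rng = np.random.default_rng(0)
batch, frames, d_model, proj_size, num_labels = 2, 50, 16, 8, 4

hidden_states = rng.standard_normal((batch, frames, d_model))
w_proj = rng.standard_normal((d_model, proj_size))
b_proj = np.zeros(proj_size)
w_cls = rng.standard_normal((proj_size, num_labels))
b_cls = np.zeros(num_labels)

logits = classification_head(hidden_states, w_proj, b_proj, w_cls, b_cls)
predicted = logits.argmax(axis=-1)  # one class index per input clip

print(logits.shape)  # (2, 4)
```

The point of the sketch is that the output has one row of logits per input clip and no time axis, which is why no decoder or auto-regressive generation is involved.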
The most difficult part of this PR will be getting the model tester to work. You can see from the PyTorch PR that we require a standalone tester for the audio classification model. This is because the original Whisper model is an encoder-decoder model, but the audio classification model is an encoder-only model. Thus, we require different testing logic.
Your contribution
Opening this one up to the community! If you're interested in tackling this, feel free to drop a comment in this thread and open a PR when you're ready. More than happy to answer any questions about this integration!