⚡️ Whisper JAX - up to 70x faster than OpenAI Whisper #1277
- Do you have any CPU benchmarks?
- What is the difference between OpenAI and Hugging Face Transformers that causes such a big gap in inference time?
- @sanchit-gandhi, this is brilliant work. The Kaggle notebook is portable to Google Colab too, right?
- Hi, thank you very much. I just want to know: when I choose translate, I don't see the language I want to translate to. For example…
- Would this work on mobile devices?
- I guess the demo is not working anymore. When I try to use it, after I record something and click submit, the area where the results should appear just shows Error.
- I want to embed this module in my application. Is there an image on GCP that can be used directly?
-
Whisper JAX ⚡️ is a highly optimised Whisper implementation for both GPU and TPU. Try the demo here and transcribe 1 hour of audio in under 15 seconds: https://huggingface.co/spaces/sanchit-gandhi/whisper-jax
The 70x speed gain we see comes in three stages:
Let's find out more below 👇
1. Batching over un-batched
🤗 Transformers implements a batching algorithm where a single audio sample is chunked into 30s segments, and the chunks are then transcribed in batches. This batching algorithm gives up to a 7x gain over OpenAI (which transcribes chunks sequentially) with nearly no degradation to the WER.
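A minimal sketch of this chunked, batched inference using the 🤗 Transformers pipeline API (the checkpoint and audio file name here are illustrative):

```python
from transformers import pipeline

# Chunk long audio into 30s segments and transcribe them in batches,
# rather than sequentially as in the original OpenAI implementation.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,  # split the audio into 30s chunks
    batch_size=16,      # transcribe 16 chunks per forward pass
)
text = asr("audio.mp3")["text"]
```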
2. JAX over PyTorch
JAX is an automatic differentiation library for high-performance machine learning research. By Just-In-Time (JIT) compiling Whisper, we get a 2x speed-up over the 🤗 Transformers PyTorch implementation on GPU.
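To illustrate what JIT compilation buys (a toy function, not the actual Whisper forward pass): jax.jit traces the function once and compiles it to fused XLA kernels, so subsequent calls skip the Python interpreter overhead entirely.

```python
import jax
import jax.numpy as jnp

@jax.jit  # traced once, compiled to fused XLA kernels
def feed_forward(w1, w2, x):
    # A toy transformer-style feed-forward block.
    return w2 @ jax.nn.gelu(w1 @ x)

key = jax.random.PRNGKey(0)
w1 = jax.random.normal(key, (4096, 1024))
w2 = jax.random.normal(key, (1024, 4096))
x = jax.random.normal(key, (1024, 16))

y = feed_forward(w1, w2, x)  # first call: traces + compiles
y = feed_forward(w1, w2, x)  # later calls run the compiled kernel
```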
3. TPUs over GPUs
Tensor Processing Units (TPUs) are ML accelerators designed by Google. TPUs are purpose-built for matrix multiplications, giving them a significant advantage over more general-purpose GPUs. The result? Running Whisper JAX on a TPU v4-8 is 5x faster than on an NVIDIA A100.
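One way JAX exploits the multiple cores of a TPU v4-8 is data parallelism, e.g. via jax.pmap. A hedged sketch of the mechanism (the real pipeline parallelises the Whisper forward pass; this only shows the pattern):

```python
import jax
import jax.numpy as jnp

# Replicate a function across all local TPU/GPU devices; each device
# processes one slice of the leading batch dimension in parallel.
@jax.pmap
def square_batch(x):
    return x ** 2

n = jax.local_device_count()               # e.g. 4 chips on a TPU v4-8
batch = jnp.arange(n * 8.0).reshape(n, 8)  # leading axis = device count
out = square_batch(batch)                  # runs on all devices at once
```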
Adding it all up:
- 7x from batching
- 2x from JAX
- 5x from TPUs
The gains compound multiplicatively: 7 × 2 × 5 => 70x speed-gain overall
Table 1: Average inference time in seconds for audio files of increasing length. GPU device is a single A100 40GB GPU; TPU device is a single TPU v4-8.
Check out the repository to use the model yourself: https://github.com/sanchit-gandhi/whisper-jax
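A minimal usage sketch along the lines of the repository README (the FlaxWhisperPipline class and its arguments are taken from there; the audio path is illustrative):

```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# Instantiate the pipeline in half precision with batched inference.
pipeline = FlaxWhisperPipline(
    "openai/whisper-large-v2",
    dtype=jnp.bfloat16,  # half-precision weights for faster inference
    batch_size=16,       # chunks transcribed per forward pass
)

# The first call JIT-compiles the model; subsequent calls are fast.
text = pipeline("audio.mp3")
```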
All pre-trained OpenAI checkpoints are compatible! For fine-tuned checkpoints, we include instructions for converting PyTorch weights to Flax: https://github.com/sanchit-gandhi/whisper-jax#available-models-and-languages
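For reference, a hedged sketch of one way to load fine-tuned PyTorch weights into Flax via 🤗 Transformers' from_pt conversion (the checkpoint id is a placeholder; see the link above for the repository's own instructions):

```python
from transformers import FlaxWhisperForConditionalGeneration

# Load a fine-tuned PyTorch checkpoint, converting it to Flax on load.
model = FlaxWhisperForConditionalGeneration.from_pretrained(
    "your-username/whisper-finetuned",  # placeholder checkpoint id
    from_pt=True,                       # convert PyTorch -> Flax weights
)
# Save the converted Flax weights so future loads skip the conversion.
model.save_pretrained("./whisper-finetuned-flax")
```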