Should running on a huge NVIDIA GPU increase transcription speed? #1640
-
I have about a PB of voice data to transcribe. Naturally, based on Whisper's relative model speed documentation, I opted to pay for a huge NVIDIA A100 40GB VRAM instance in GCP to massively reduce the time it will take me to transcribe all the data. I have had zero issues getting it set up.
The thing is: when I go to transcribe, it's slow as all hecc! It does transcribe, but the GPU appears to make no difference. Is this how it is supposed to work? It runs at the same speed as when I'm transcribing on my Mac! I'm not doing anything complicated:
The file I'm passing is about ... So yeah, I'm lost. Here are some pics of the VM specs below:
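One common cause worth ruling out (an assumption on my part, not confirmed by the specs above): the PyTorch build in the VM wasn't compiled with CUDA support, in which case whisper silently falls back to the much slower CPU. A minimal sketch of the check, with hypothetical `device_for`/`detect_device` helpers:

```python
def device_for(cuda_available: bool) -> str:
    # whisper runs on the GPU only when PyTorch reports CUDA as available;
    # otherwise it quietly falls back to the CPU.
    return "cuda" if cuda_available else "cpu"

def detect_device() -> str:
    # Lazy import so this sketch also runs where PyTorch isn't installed.
    try:
        import torch
        return device_for(torch.cuda.is_available())
    except ImportError:
        return "cpu"

print(detect_device())
```

If this prints `cpu` on the A100 box, the GPU isn't being used at all and the fix is the environment (CUDA-enabled torch), not the hardware.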
Replies: 2 comments 9 replies
-
Try
whisper foo.wav --model large-v2 --language English --device cuda
-
@jsteinberg-rbi Have you tried running several processes (e.g., several whisper transcribe commands) in parallel at the same time? I'm looking for a server with this capability, but I don't know if it is feasible before buying one. I tested with an RTX 2060, but it crashes when running more than one process. Thanks in advance.
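One point worth noting here: each whisper process loads its own copy of the model into GPU memory, so running several in parallel multiplies VRAM use, which would explain an RTX 2060 (6GB) crashing with more than one process. A hedged sketch (file names and worker count are made up) that fans transcriptions out over a few worker threads, each spawning an independent whisper CLI process:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_cmd(path):
    # One independent whisper CLI invocation per file; every process
    # loads its own copy of the model into GPU memory.
    return ["whisper", path, "--model", "large-v2",
            "--language", "English", "--device", "cuda"]

def transcribe_all(files, workers=2):
    # Cap workers at however many model copies fit in VRAM at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: subprocess.run(build_cmd(p)).returncode,
                             files))

# Example (hypothetical file names):
# transcribe_all(["a.wav", "b.wav", "c.wav"], workers=2)
```

On a 40GB A100 several large-model processes should fit; on a 6GB card you would likely need a smaller model or a single worker.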