How to make a training dataset to fine-tune Whisper? #2237

Atefeh197 · 2024-06-21T18:37:59Z

I want to fine-tune Whisper on my data, but I do not know the best format for the references.

The speaker says "Oh I am a study um student. My phone number is five twenty.... It is five two zero...".
What is the best transcript for that speech?

Must I remove "study"? The speaker corrects this word.
What are the best transcripts for "twenty" and "two zero"? I need digits. Can I write "My phone number is 520" for both of them?
What about punctuation and filler words? Can I remove them?

Whisper generates a formal transcript, and I do not want to change that behavior when I fine-tune it on my data.

itaipee · 2024-06-24T14:00:44Z

In general, the text should represent the speech in the audio, not what was meant to be said.
So definitely include the word "study" in the text.

For punctuation and filler words, it is your call, but try to be consistent (either all your transcriptions have punctuation, or none of them do).
Whisper output contains punctuation, which makes it much more readable to humans.
If most of the training data does not include punctuation, you will most likely see much less punctuation in your fine-tuned model's transcriptions.
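
One way to check that kind of consistency before training is a quick pass over the reference transcripts. The snippet below is only a rough sketch in plain Python: the helper names `has_punctuation` and `punctuation_report`, the set of punctuation marks, and the two sample transcripts are all made up for illustration and are not part of Whisper or any fine-tuning library.

```python
# Hypothetical helpers for illustration; not part of Whisper or any library.

# Marks treated as "punctuation" here. This set is an assumption;
# apostrophes are left out on purpose so that "don't" does not count.
PUNCTUATION_MARKS = ".,?!;:"


def has_punctuation(text: str) -> bool:
    """Return True if the transcript contains any of the marks above."""
    return any(ch in PUNCTUATION_MARKS for ch in text)


def punctuation_report(transcripts: list[str]) -> dict[str, int]:
    """Count how many transcripts do and do not contain punctuation."""
    with_punct = sum(has_punctuation(t) for t in transcripts)
    return {
        "with_punctuation": with_punct,
        "without_punctuation": len(transcripts) - with_punct,
    }


# Made-up sample references: one punctuated, one not.
transcripts = [
    "Oh, I am a study, um, student.",
    "my phone number is five twenty",
]

report = punctuation_report(transcripts)
print(report)
```

If both counts are non-zero, the references mix the two styles, and it is probably worth picking one style and converting the rest before fine-tuning.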
