Replies: 1 comment
-
In general, the text should represent the speech in the audio, not what was meant to be in the audio. For punctuation and filler words, it is your call, but try to be consistent (either all your transcriptions have punctuation, or none of them do).
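One way to check the consistency point before fine-tuning is a small script like the one below. It is only a sketch: the function names (`has_punctuation`, `punctuation_style`, `strip_punctuation`) and the punctuation set are my own assumptions for illustration, not part of Whisper or any of its tooling. Apostrophes are left alone on purpose so that words like "don't" stay intact.

```python
import re

# Assumed set of sentence punctuation to look for; adjust for your data.
PUNCTUATION = ".,?!;:"


def has_punctuation(text: str) -> bool:
    """Return True if the transcript contains any character from PUNCTUATION."""
    return any(ch in PUNCTUATION for ch in text)


def punctuation_style(transcripts: list[str]) -> str:
    """Classify a set of transcripts as 'all', 'none', or 'mixed'."""
    flags = [has_punctuation(t) for t in transcripts]
    if all(flags):
        return "all"
    if not any(flags):
        return "none"
    return "mixed"


def strip_punctuation(text: str) -> str:
    """Remove PUNCTUATION characters and collapse repeated whitespace."""
    cleaned = re.sub("[" + re.escape(PUNCTUATION) + "]", "", text)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    refs = [
        "Oh I am a study um student.",
        "my phone number is five two zero",
    ]
    # A 'mixed' result means the references should be unified
    # (punctuate all of them, or strip punctuation from all of them).
    print(punctuation_style(refs))
    print([strip_punctuation(r) for r in refs])
```

If the result is "mixed", pick one style and apply it to every reference. Stripping is the easy direction; adding punctuation back needs manual work or a separate model.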
-
I want to fine-tune Whisper on my data, but I do not know the best format for the references.
For example, the speaker says "Oh I am a study um student. My phone number is five twenty.... It is five two zero...".
What is the best transcript for that speech:
Whisper generates a formal transcript, and I do not want to change that behavior when I fine-tune it on my data.