Whisper wasn't trained to do that task. From its training on the

If you give Whisper the wrong language, such as giving it an English audio file but then telling Whisper that the

If you want different behaviour, Whisper would need to be trained for that. It would need a training dataset where your audio file containing the utterance "This is a small TTS test sentenced to show general quality." is labelled as a French audio file, not an English one, and the dataset would also need the corresponding "French" transcript so that Whisper can learn to produce what you want.

I'll assume that you probably don't want to retrain Whisper for your task, since training models is expensive, but that also means you'd have to limit yourself to using Whisper within the capabilities it has from its current training.

P.S. I'm not sure what a French transcript of an English audio file should look like. I can imagine what a Japanese transcript of an English audio file might look like, maybe with the English words spelled out in Japanese characters like katakana. But French already contains all the letters of the English alphabet, so I'd expect that if you are in France and you ask for a transcription of this "English" audio utterance, you actually want the English transcription, and you should therefore instruct Whisper accordingly, by giving the
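As a rough illustration of the advice above, here is a minimal Python sketch that asks Whisper to transcribe in the language actually spoken in the audio. It assumes the `openai-whisper` package is installed and uses its `load_model` and `transcribe` calls; the helper names `decode_options` and `transcribe_as_spoken`, and the choice of the "base" model, are my own assumptions for the example, not something from the original post.

```python
def decode_options(spoken_language):
    """Options for a plain transcription in the language that is actually spoken.

    Hypothetical helper for this example: it only bundles the two settings
    discussed in this thread (language and task).
    """
    return {"language": spoken_language, "task": "transcribe"}


def transcribe_as_spoken(audio_path, spoken_language="en", model_name="base"):
    """Transcribe audio_path, telling Whisper which language the speaker uses.

    The import is done lazily so that this file can be loaded even when
    openai-whisper is not installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # "base" is an arbitrary choice here
    result = model.transcribe(audio_path, **decode_options(spoken_language))
    return result["text"]
```

Calling `transcribe_as_spoken("X.wav", "en")` would then request an English transcription of the English utterance, which is what Whisper was trained to produce for English audio.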
-
I have a WAV file containing the spoken sentence "This is a small TTS test sentenced to show general quality." When running
whisper X.wav
the result is OK:
Detected language: English
[00:00.000 --> 00:03.600] This is a small TTS test sentenced to show general quality.
However, if I want recognition/transcription (NOT translation) in a different language, I get:
whisper --language fr --task transcribe X.wav
[00:00.000 --> 00:03.600] C'est un test de TTS pour montrer la qualité générale.
But I don't want a French translation of the English transcription; I want a French transcription of the English utterance (even if that does not yield a meaningful sequence of French words).
Same with German:
whisper --language de --task transcribe X.wav
[00:00.000 --> 00:03.500] Das ist ein kleiner TTS-Test, die die Generalqualität zeigt.
How can Whisper be forced to transcribe in the language set by --language?
(I'm using openai-whisper-20230314 - maybe this has been corrected in newer versions?)
Many thanks for any help!