I am transcribing my lessons, and the phrase "Sottotitoli creati dalla comunità Amara.org" ("Subtitles created by the Amara.org community") appears in my transcript.
This is an example of regurgitation/hallucination that our data preprocessing unfortunately didn't catch. Retraining all 5 models with a new filter would be the most surefire way to fix it (and there's no separate "Italian model"). Until that happens, you could replace any phrase that contains amara.org with a blank string; it seems to happen more frequently when the segment is silent or near the beginning or end of the audio.

If you have a long sequence without speech (like an intro/outro animation) at the beginning or end of your video, Whisper may behave better if you supply trimmed audio without those parts. Alternatively, you could try combining Whisper with VAD (#29, comment) or speaker diarization (#264) tools.
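A minimal sketch of the "replace with a blank string" workaround, assuming you work with segment dicts that have a `"text"` key, shaped like the `"segments"` list that openai-whisper's `model.transcribe(...)` returns. The pattern and the helper names `drop_hallucinated_segments` and `join_text` are hypothetical, not part of Whisper; extend the pattern if other credit lines show up.

```python
import re

# Hypothetical filter: any segment mentioning amara.org (any casing) is treated
# as a hallucinated credit line. Extend this pattern if you see other phrases.
HALLUCINATION_PATTERN = re.compile(r"amara\.org", re.IGNORECASE)


def drop_hallucinated_segments(segments):
    """Return copies of `segments` with matching segment texts blanked out.

    Assumes each segment is a dict with a "text" key, like the entries in
    result["segments"] from openai-whisper's transcribe().
    """
    cleaned = []
    for seg in segments:
        seg = dict(seg)  # shallow copy so the caller's data is not mutated
        if HALLUCINATION_PATTERN.search(seg["text"]):
            seg["text"] = ""
        cleaned.append(seg)
    return cleaned


def join_text(segments):
    """Rebuild a transcript string from the non-empty segment texts."""
    return " ".join(s["text"].strip() for s in segments if s["text"].strip())


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 2.0, "text": " Sottotitoli creati dalla comunità Amara.org"},
        {"start": 2.0, "end": 5.0, "text": " Oggi parliamo di algebra lineare."},
    ]
    print(join_text(drop_hallucinated_segments(segments)))
    # prints: Oggi parliamo di algebra lineare.
```

Filtering per segment rather than running a substring replace on the final `"text"` keeps the timestamps of the remaining segments intact, which matters if you also export subtitles.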