Replies: 7 comments
-
I see that the "Hello, welcome to my lecture." initial prompt you're using is from their example. Perhaps it is confusing the model because it's not related to the content? Maybe try a more general initial prompt like, "This is a transcript of a video, and it may cover a variety of topics." I'm totally guessing though. Also I wonder if the extra space at the beginning of your prompt string is affecting it.
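On the leading-space point: it's easy to rule out by normalizing the string before passing it in. I'm not sure whether the CLI strips whitespace itself, so this is just a quick sanity check (the prompt string is the one from the command in this thread):

```python
# The prompt as passed on the command line, with its leading space.
prompt = " Hello, welcome to my lecture."

# Normalize before handing it to --initial_prompt / transcribe(initial_prompt=...)
# so a stray space can't change how the prompt is tokenized.
cleaned = prompt.strip()

print(repr(cleaned))  # 'Hello, welcome to my lecture.'
```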
-
Same issue.
-
This keeps happening to me too. Whole chunks of audio, 10–15 seconds from the first window, go missing. If I eliminate the initial_prompt, then the whole audio is transcribed properly :(
-
If you want any answers, you should share an audio sample that exhibits the issue and the command to reproduce it.
-
I also get this issue: when I use initial_prompt, some sentences go missing. If I do not use initial_prompt, the audio is transcribed properly and no sentences are missed.
-
My command is:
whisper 0.flac --model small.en --word_timestamps True --initial_prompt " Hello, welcome to my lecture."
The result is:
python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
python3.10/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:24.740 --> 00:28.700] The Jerogan Experience.
[01:00.180 --> 01:05.420] Maybe he's annoyed with them. Maybe the maybe the kayaks are fucking up there fishing. Look at that.
[01:05.420 --> 01:11.940] Bro. That shit just broke your back. California Beach. Oh, yeah, easily could snap your legs in half.
[01:12.580 --> 01:17.700] Easily could snap your neck. But they don't eat meat, right? No. So yeah. But I mean just the power alone.
[01:17.860 --> 01:22.400] How does it know what it's doing? It's not gentle. I mean, hopefully you can you still got hoping to spit you up.
[01:23.220 --> 01:25.260] Hoping. Yeah, hoping.
...
The first transcribed sentence begins at 24 s, and that timestamp is not correct either.
But if I remove the 'initial_prompt' param, the output is correct, like this. It begins from 0 s:
python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
python3.10/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:01.600 --> 00:02.320] The Jerogan Experience.
[00:02.780 --> 00:04.860] Wasn't it you telling me about the whales that learn how to-
[00:04.860 --> 00:07.060] Orcas. Yeah, they've learned how to fuck people's boats up.
[00:07.320 --> 00:09.020] That's so funny to me.
[00:09.160 --> 00:10.780] It's crazy. It's kind of hilarious.
[00:11.380 --> 00:14.020] Because for all these years, we've mistreated them and finally they're like,
[00:14.420 --> 00:14.760] Enough.
[00:15.440 --> 00:15.440] Yeah.
...
I want to know how the prompt works in this case. And is there any way to avoid missing sentences while still using a prompt?
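My rough understanding, which may be wrong: initial_prompt is prepended to the decoder's context for the first 30-second window, as if it were text that had already been transcribed, so an unrelated prompt can make the model suppress or skip early segments. While debugging, a small hypothetical helper like this can flag when a prompt is eating leading audio; the segment dicts mimic the shape of whisper's output, and the two start times are taken from the outputs posted above:

```python
def leading_gap(segments, tolerance=5.0):
    """Return the start time of the first segment if it looks suspiciously
    late (more than `tolerance` seconds into the audio), else None."""
    if not segments:
        return None
    first_start = segments[0]["start"]
    return first_start if first_start > tolerance else None

# Start times from the two runs in this thread.
with_prompt = [{"start": 24.74, "end": 28.70, "text": "The Jerogan Experience."}]
without_prompt = [{"start": 1.60, "end": 2.32, "text": "The Jerogan Experience."}]

print(leading_gap(with_prompt))     # 24.74 -> roughly 25 s of audio went missing
print(leading_gap(without_prompt))  # None  -> transcript starts near 0 s
```

Running the same file with and without the prompt and comparing like this at least tells you quickly whether a given prompt is triggering the bug on your audio.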