Fine-tuning Whisper with timestamp tokens #620
-
Hi, I've successfully fine-tuned Whisper without timestamp tokens, but I'm hoping to fine-tune it with timestamp tokens inserted in the decoder inputs. When I feed a decoder input containing timestamp tokens into the model, it fails. When I remove the timestamp tokens from the decoder input, everything works fine. Can someone point me in the right direction?
-
Can you share your code? I'm really interested in fine-tuning with timestamps but don't know where to start.
-
Hi, the tokenizer should have exactly 1501 timestamp tokens, starting with `tokenizer.timestamp_begin`, which corresponds to `<|0.0|>`, through `tokenizer.timestamp_begin + 1500`, which corresponds to `<|30.0|>`, at an interval of 0.02 seconds. (Please note that these `<|...|>` forms are just a notation for convenience used by `decode_with_timestamps()` and are not included in the tokenizer as special tokens.) It appears that your timestamp tokens are well above this range; fine-tuning should work if you adjust them to stay under the 30.0-second mark!
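To make those ranges concrete, here is a minimal sketch of mapping times to token ids with the openai/whisper tokenizer; the helper name `timestamp_token` is mine, not something from this thread:

```python
# Minimal sketch, assuming the openai/whisper package is installed.
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)

def timestamp_token(seconds: float) -> int:
    """Map a time in seconds to its timestamp token id (0.02 s steps)."""
    if not 0.0 <= seconds <= 30.0:
        raise ValueError("timestamps must fall within the 30-second window")
    return tokenizer.timestamp_begin + round(seconds / 0.02)

# Bracket a segment's text tokens with begin/end timestamp tokens.
segment = [timestamp_token(0.0), *tokenizer.encode(" Hello world"), timestamp_token(5.44)]
print(tokenizer.decode_with_timestamps(segment))  # <|0.00|> Hello world<|5.44|>
```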
-
Hi, I'm interested in fine-tuning with timestamps too. When I try to encode `<|0.0|>`, the tokenizer gives me `[27, 91, 15, 13, 15, 91, 29]` instead of a single special token. Thanks!
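Those ids are the string split character by character, which suggests the timestamp strings are not entries in the tokenizer's vocabulary (as was the case for the Hugging Face `WhisperTokenizer` at the time). Here is a minimal sketch, assuming that tokenizer, of computing the ids arithmetically instead; the offset from `<|notimestamps|>` is my assumption about the vocabulary layout:

```python
# Minimal sketch, assuming the Hugging Face transformers WhisperTokenizer,
# where "<|0.0|>" is not a single vocabulary entry.
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")

# Assumption: timestamp ids begin right after the <|notimestamps|> token.
timestamp_begin = tokenizer.convert_tokens_to_ids("<|notimestamps|>") + 1

def timestamp_token(seconds: float) -> int:
    """Map a time in seconds (0.0-30.0) to its timestamp token id."""
    return timestamp_begin + round(seconds / 0.02)

print(timestamp_token(0.0))   # id for <|0.0|>
print(timestamp_token(30.0))  # id for <|30.0|>, i.e. timestamp_begin + 1500
```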
-
Sorry for bothering you, but I have one more question. Should we insert end-time tokens like in the image on https://github.com/openai/whisper? Can I put timestamp tokens as in my first example below, or should I do it as in my second? Thanks in advance.
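For what it's worth, the multitask format pictured in the openai/whisper README brackets every segment with both a begin and an end timestamp, so the end time of one segment is repeated as the begin time of the next. A sketch of that layout in the `<|t|>` notation; the texts and times here are invented:

```python
# Sketch of a timestamped target in the decode_with_timestamps() notation.
# Every segment carries a begin AND an end timestamp token, and the end
# time of one segment reappears as the begin time of the next.
target = (
    "<|startoftranscript|><|en|><|transcribe|>"
    "<|0.00|> Hello there.<|2.40|>"
    "<|2.40|> How are you?<|5.00|>"
    "<|endoftext|>"
)
```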