Replies: 3 comments 8 replies
-
@g3grau |
-
This should be possible, but it requires advanced modifications to the source code (file |
-
Hi @EtienneAb3d, hi @phineas-pta, good to know. I guess it could be a useful extension, though. I'll first try to get Etienne's code running and see how far I get. Thanks again :) |
-
I just created subtitles for a bunch of children's videos to make learning easier. It's a really wonderful tool and produces very good results (tested with German) that need only a handful of fixes, mostly for the spelling of names or fantasy words, but sometimes for obvious words that were replaced by unexpected choices (I guess the large model would already fix that).
For a grown-up those fixes may not be necessary at all, but for learning purposes I want the text to be correct. Without word timestamps, fixing those errors by editing the vtt file is fairly fast and straightforward.
However, after enabling word timestamps the required fixes multiply, and I'm wondering if/how this workflow could be improved. Is it possible to keep the "most likely" words along with their approximate timestamps and pass them to an editor that lets me select a candidate or add a correction before the actual text file with word timestamps is written? Maybe a JSON file with time data and recognition candidates along with their likelihoods would be a good intermediate output for postprocessing. It could also help to build a map that merges alternating spellings into a consistent one, e.g. Schmit vs. Schmidt, Tomas vs. Thomas, etc.
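To make the idea concrete, here is a minimal sketch of what such an intermediate file and a postprocessing step could look like. Everything in it is an assumption for illustration: the JSON layout (`start`, `end`, `candidates`, `p`), the helper names `resolve_words` and `needs_review`, and the sample values are hypothetical and are not something Whisper emits in this form.

```python
import json

# Hypothetical intermediate format (an assumption, not Whisper output):
# one entry per word, with a time range and ranked recognition candidates.
SAMPLE = json.loads("""
[
  {"start": 0.00, "end": 0.42, "candidates": [{"text": "Herr", "p": 0.97}]},
  {"start": 0.42, "end": 0.90, "candidates": [{"text": "Schmit", "p": 0.55},
                                              {"text": "Schmidt", "p": 0.40}]},
  {"start": 0.90, "end": 1.30, "candidates": [{"text": "und", "p": 0.99}]},
  {"start": 1.30, "end": 1.80, "candidates": [{"text": "Tomas", "p": 0.61},
                                              {"text": "Thomas", "p": 0.35}]}
]
""")

# Map of alternating spellings to the one consistent spelling to keep.
SPELLING_MAP = {"Schmit": "Schmidt", "Tomas": "Thomas"}


def needs_review(entries, threshold=0.8):
    """Return the indices of words whose best candidate is below the threshold."""
    return [
        index
        for index, entry in enumerate(entries)
        if max(c["p"] for c in entry["candidates"]) < threshold
    ]


def resolve_words(entries, spelling_map, overrides=None):
    """Pick one text per word: manual override first, else the most likely
    candidate, normalized through the spelling map. Timestamps are kept."""
    overrides = overrides or {}
    resolved = []
    for index, entry in enumerate(entries):
        if index in overrides:
            text = overrides[index]
        else:
            best = max(entry["candidates"], key=lambda c: c["p"])
            text = spelling_map.get(best["text"], best["text"])
        resolved.append({"start": entry["start"], "end": entry["end"], "text": text})
    return resolved


if __name__ == "__main__":
    print("review:", needs_review(SAMPLE))
    print(" ".join(word["text"] for word in resolve_words(SAMPLE, SPELLING_MAP)))
```

An editor would only need to show the words returned by `needs_review` and pass the chosen corrections back as `overrides` (index to text); the resolved list still carries the start and end times, so the file with word timestamps can be written from it afterwards.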
P.S. A side note on running whisper locally (since the error messages are not that obvious); I hope it helps someone:
nvitop helped me figure out that all the open windows on my laptop occupied almost 750 MB of VRAM.
I can run the medium model on my 6 GB GPU after closing the browser and many other windows, and after setting