word timing tweaks #1559
Hi, can you explain this commit?
whisper/timing.py
@@ -215,6 +215,8 @@ def find_alignment(

    words, word_tokens = tokenizer.split_to_word_tokens(text_tokens + [tokenizer.eot])
    word_boundaries = np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
    if len(word_boundaries) <= 1:
This fixes crashes that occur when there are no words to align: in that case word_boundaries contains only the leading 0 added by np.pad, so its length is 1.
>>> import numpy as np
>>> word_tokens = [[5, 1], [3, 2, 1], [1]]
>>> np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
array([0, 2, 5])
>>> word_tokens = []
>>> np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
array([0.])
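The guard can be sketched as a standalone function. This is only an illustration of the check discussed here: the helper name `word_boundaries_or_none` and the choice to return `None` are assumptions, not what `whisper/timing.py` does after the check.

```python
import numpy as np


def word_boundaries_or_none(word_tokens):
    """Return cumulative token offsets per word, or None when there are no words.

    Hypothetical helper for illustration only; not part of whisper/timing.py.
    """
    # Skip the last entry (the trailing EOT group in the diff above),
    # then prepend a 0 so each word has a start offset.
    boundaries = np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
    # With no words, only the padded 0 remains, so there is nothing to align.
    if len(boundaries) <= 1:
        return None
    return boundaries


print(word_boundaries_or_none([[5, 1], [3, 2, 1], [1]]))
print(word_boundaries_or_none([]))
```

Note that an input holding only the trailing group, such as `[[1]]`, also hits the guard, because `word_tokens[:-1]` is then empty.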
@@ -297,8 +299,6 @@ def add_word_timestamps(
    # hack: truncate long words at sentence boundaries.
    # a better segmentation algorithm based on VAD should be able to replace this.
    if len(word_durations) > 0:
        median_duration = np.median(word_durations)
I added comments to make it clearer.
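For readers unfamiliar with the hunk above, here is a rough sketch of a median-based cap on word durations. It is not the code from this PR: the function name `cap_long_durations` and the `factor` of 2 are assumptions, and the sketch caps every long word rather than only those at sentence boundaries.

```python
import numpy as np


def cap_long_durations(word_durations, factor=2.0):
    """Clip durations that exceed factor * median. Hypothetical illustration only."""
    durations = np.asarray(word_durations, dtype=float)
    # Same guard as in the diff: np.median of an empty array is nan.
    if len(durations) == 0:
        return durations
    median_duration = np.median(durations)
    max_duration = factor * median_duration  # assumed threshold, not from the PR
    return np.minimum(durations, max_duration)


print(cap_long_durations([0.2, 0.3, 0.25, 3.0]))
```

The median is used instead of the mean so that a few very long words do not inflate the threshold they are compared against.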
@jongwook could you take a look at this? thanks!
ping
thanks!
* word timing tweaks
* comment on eot
* clearer comments