word timing tweaks #1559

taylorchu · 2023-07-29T23:28:25Z

No description provided.

hoonlight · 2023-07-30T04:16:27Z

Hi, can you explain this commit?

taylorchu · 2023-07-30T04:55:45Z

whisper/timing.py

@@ -215,6 +215,8 @@ def find_alignment(

    words, word_tokens = tokenizer.split_to_word_tokens(text_tokens + [tokenizer.eot])
    word_boundaries = np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
+    if len(word_boundaries) <= 1:


This fixes crashes when there are no words to align: word_boundaries then contains only the padded 0.

>>> word_tokens = [[5, 1], [3, 2, 1], [1]]
>>> np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
array([0, 2, 5])
>>> word_tokens = []
>>> np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
array([0.])
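For readers who want to try the guard outside of whisper, here is a minimal sketch. The helper name `word_boundaries_or_none` is hypothetical and not part of whisper/timing.py; it only reproduces the `np.pad(np.cumsum(...))` expression and the `len(word_boundaries) <= 1` check from the diff. What the real code does once the check fires is not shown in the hunk, so returning `None` here is an assumption.

```python
import numpy as np


def word_boundaries_or_none(word_tokens):
    """Hypothetical helper (not in whisper): compute word boundaries,
    or return None when there is nothing to align."""
    # Same expression as in the diff: cumulative token counts of every
    # word except the last one, with a leading 0 padded in front.
    boundaries = np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
    # With zero or one word, only the padded 0 remains. In that case the
    # array is also float-typed, because np.cumsum([]) yields float64.
    if len(boundaries) <= 1:
        return None  # assumption: the caller treats None as "no words"
    return boundaries


# Three words: boundaries fall at token offsets 0, 2 and 5.
print(word_boundaries_or_none([[5, 1], [3, 2, 1], [1]]))
# No words, or a single trailing word only: the guard fires.
print(word_boundaries_or_none([]))
print(word_boundaries_or_none([[1]]))
```

The float dtype in the degenerate case is visible in the session above (`array([0.])` rather than `array([0])`), which is one reason to check the length before using the result.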

taylorchu · 2023-07-30T05:02:02Z

whisper/timing.py

@@ -297,8 +299,6 @@ def add_word_timestamps(
    # hack: truncate long words at sentence boundaries.
    # a better segmentation algorithm based on VAD should be able to replace this.
    if len(word_durations) > 0:
-        median_duration = np.median(word_durations)


f572f21#r120932778

taylorchu · 2023-07-30T05:10:07Z

> Hi, can you explain this commit?

I added comments to make it clearer.

taylorchu · 2023-08-01T22:28:22Z

@jongwook could you take a look at this? thanks!

taylorchu · 2023-08-07T21:23:28Z

ping

taylorchu · 2023-08-07T22:07:19Z

thanks!

* word timing tweaks
* comment on eot
* clearer comments

word timing tweaks

95826d4

comment on eot

9fccc7d

clearer comments

b6077ba

jongwook merged commit e8622f9 into openai:main Aug 7, 2023
7 checks passed

abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023

word timing tweaks (openai#1559)

02db96b

* word timing tweaks
* comment on eot
* clearer comments

trungkienbkhn mentioned this pull request Dec 12, 2023

Word timing tweaks SYSTRAN/faster-whisper#616

Merged
