Fix truncated words list when the replacement character is decoded #1089

guillaumekln · 2023-03-14T09:07:22Z

The current version of the split_tokens_on_unicode method keeps aggregating tokens for as long as the decoded sequence contains the replacement character � (U+FFFD).

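However, the model can sometimes produce invalid sequences that are themselves decoded to the replacement character (for example, during temperature fallback). In that case the replacement character is a real part of the output and should be included in the list of words; otherwise the remaining tokens are never emitted and the words list is truncated.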

In this PR I propose to save the replacement character if it appears at the same position in the fully decoded sequence.
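To make the proposed check concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: toy_decode and the bytes-based tokens are hypothetical stand-ins for the real tokenizer, and only the comparison against the fully decoded sequence follows the description above.

```python
from typing import Callable, List, Tuple

REPLACEMENT_CHAR = "\ufffd"


def toy_decode(tokens: List[bytes]) -> str:
    # Hypothetical stand-in for the tokenizer's decode: each token is a raw
    # byte string, and invalid UTF-8 is decoded to U+FFFD.
    return b"".join(tokens).decode("utf-8", errors="replace")


def split_tokens_on_unicode(
    tokens: List[bytes],
    decode: Callable[[List[bytes]], str] = toy_decode,
) -> Tuple[List[str], List[List[bytes]]]:
    decoded_full = decode(tokens)

    words: List[str] = []
    word_tokens: List[List[bytes]] = []
    current_tokens: List[bytes] = []
    unicode_offset = 0  # position of the current group in decoded_full

    for token in tokens:
        current_tokens.append(token)
        decoded = decode(current_tokens)

        # Emit the group if it decodes cleanly, or if its replacement
        # character is also present at the same position in the full
        # decoding (meaning the model really produced an invalid sequence).
        if (
            REPLACEMENT_CHAR not in decoded
            or decoded_full[unicode_offset + decoded.index(REPLACEMENT_CHAR)]
            == REPLACEMENT_CHAR
        ):
            words.append(decoded)
            word_tokens.append(current_tokens)
            current_tokens = []
            unicode_offset += len(decoded)

    return words, word_tokens


# "é" is split across two tokens; b"\xff" is never valid UTF-8.
tokens = [b"caf", b"\xc3", b"\xa9", b" ", b"\xff", b"ok"]
words, word_tokens = split_tokens_on_unicode(tokens)
print(words)
```

In this toy input the two bytes of "é" are still merged into one word, while the invalid byte b"\xff" is emitted as its own word. If the condition only tested for the absence of the replacement character, everything from b"\xff" onward would stay in current_tokens and never reach the words list.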

jongwook · 2023-03-14T16:32:38Z

Thanks! Great catches

…penai#1089)

Fix truncated words list when the replacement character is decoded

e564a27

jongwook merged commit 5f9ac65 into openai:main Mar 14, 2023

guillaumekln deleted the fix-truncated-word-list branch March 14, 2023 16:35

zackees pushed a commit to zackees/whisper that referenced this pull request May 5, 2023

Fix truncated words list when the replacement character is decoded (openai#1089)

188726e

ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request May 16, 2023

Fix truncated words list when the replacement character is decoded (openai#1089)

bf1873f

abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023

Fix truncated words list when the replacement character is decoded (openai#1089)

99239f4

kyakuno mentioned this pull request Dec 28, 2023

Update whisper decoding algorithm axinc-ai/ailia-models#1355

Closed
