Possible to keep repeated words #304

section33 · 2022-10-12T14:10:03Z


I'm trying to generate a highly accurate transcription of exactly what's being said, and noticed that if the speaker repeats a word, it only shows up once. Is it possible to bypass whatever filtering is happening there?

Answered by jianfch


jianfch · 2022-10-12T15:08:31Z


The decoding heuristics do not appear to suppress repeated words, so the "filtering" is most likely performed by the model itself. Transcripts produced by humans typically omit repeated words, so the model has likely learned this behavior from its training data. One way to "bypass this filtering" is to fine-tune the model on data that does not omit repeated words.
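
If you go the fine-tuning route, it may help to first check that your candidate training transcripts really do keep repeated words. Below is a minimal sketch in plain Python for that check. The helper names `repeated_words` and `repetition_rate` are hypothetical (they are not part of Whisper or any library), and the tokenization is a rough assumption that only suits English-like text.

```python
import re


def repeated_words(text):
    """Return (word, run_length) for each run of a word repeated back to back.

    Hypothetical helper: lowercases the text and ignores punctuation,
    so "I, I think" counts as a repeat of "i".
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    runs = []
    i = 0
    while i < len(words):
        j = i
        # Extend the run while the next word matches the current one.
        while j + 1 < len(words) and words[j + 1] == words[i]:
            j += 1
        if j > i:
            runs.append((words[i], j - i + 1))
        i = j + 1
    return runs


def repetition_rate(transcripts):
    """Fraction of transcripts containing at least one back-to-back repeat."""
    if not transcripts:
        return 0.0
    with_repeats = sum(1 for t in transcripts if repeated_words(t))
    return with_repeats / len(transcripts)
```

A rate near zero on a large set of transcripts would suggest the repeats were already cleaned out by whoever transcribed the audio, in which case fine-tuning on that data is unlikely to change the model's behavior.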
