Make whisper transcribe numbers in the actual spoken words #1041

Answered by jongwook
Thresher12 asked this question in Q&A

It's not an explicit conversion; the model predicts the most likely textual output end-to-end, which is why spoken numbers often come out as digits. You can try the following, which blocks all purely numeric tokens and encourages the model to transcribe numbers as the words that were actually spoken.

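from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=False)  # use multilingual=True if using multilingual model

# Collect every ordinary (non-special) token ID whose decoded text,
# ignoring one leading space, consists only of ASCII digits.
number_tokens = [
    i
    for i in range(tokenizer.eot)
    if (text := tokenizer.decode([i]).removeprefix(" "))  # skip tokens that decode to "" or " "
    and all(c in "0123456789" for c in text)
]

...

# -1 keeps Whisper's default set of suppressed symbols; number_tokens are blocked in addition.
model.transcribe("audio.mp3", suppress_tokens=[-1] + number_tokens, ...)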
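
To see which tokens the filter above would pick without loading Whisper, here is a minimal sketch of the same test run against a toy vocabulary. The names `is_numeric_token`, `find_number_token_ids`, and `TOY_VOCAB` are made up for illustration and are not part of Whisper; the toy strings only stand in for what `tokenizer.decode([i])` might return.

```python
from typing import Callable, List

DIGITS = "0123456789"


def is_numeric_token(text: str) -> bool:
    """True if the text, minus one leading space, is non-empty and all ASCII digits."""
    body = text.removeprefix(" ")  # requires Python 3.9+
    return bool(body) and all(c in DIGITS for c in body)


def find_number_token_ids(decode: Callable[[int], str], vocab_size: int) -> List[int]:
    """Return the IDs in range(vocab_size) whose decoded text is purely numeric."""
    return [i for i in range(vocab_size) if is_numeric_token(decode(i))]


# Hypothetical stand-in for a tokenizer vocabulary (index = token ID).
TOY_VOCAB = ["the", " 7", "42", " ", "", " 1st", " twenty", "2023", "٣"]

number_ids = find_number_token_ids(lambda i: TOY_VOCAB[i], len(TOY_VOCAB))
print(number_ids)  # [1, 2, 7]
```

Note that mixed tokens such as " 1st" and non-ASCII digits such as "٣" are not matched by this test, so they would not be suppressed. With a real tokenizer, the same helper could presumably be called as `find_number_token_ids(lambda i: tokenizer.decode([i]), tokenizer.eot)`.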
