
Running the model many times on a 30-second music segment (amongst others) gives vastly different outputs #188

Answered by ANonEntity
hugoredinho asked this question in Q&A

This is expected behavior. From the paper:

> We start with temperature 0, i.e. always selecting the tokens with the highest probability, and increase the temperature by 0.2 up to 1.0 when [...] the average log probability over the generated tokens is lower than −1
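A minimal sketch of that fallback rule in plain Python. It assumes a hypothetical `decode(temperature)` callback standing in for one decoding pass that returns the text plus per-token log probabilities, and it models only the log-probability condition quoted above. It is an illustration, not Whisper's actual implementation:

```python
from typing import Callable, List, Tuple

# Temperature schedule from the quoted rule: 0.0, 0.2, 0.4, 0.6, 0.8, 1.0
TEMPERATURES = [round(0.2 * i, 1) for i in range(6)]
LOGPROB_THRESHOLD = -1.0


def avg_logprob(token_logprobs: List[float]) -> float:
    """Average log probability over the generated tokens."""
    return sum(token_logprobs) / max(len(token_logprobs), 1)


def decode_with_fallback(
    decode: Callable[[float], Tuple[str, List[float]]],
) -> Tuple[str, float]:
    """Retry `decode` at increasing temperatures until the average
    log probability is no longer below the threshold.

    `decode` is a hypothetical stand-in for one decoding pass.
    Returns the accepted text and the temperature that produced it;
    if every attempt is low-confidence, the last (t=1.0) result is kept.
    """
    text, t = "", TEMPERATURES[0]
    for t in TEMPERATURES:
        text, logprobs = decode(t)
        if avg_logprob(logprobs) >= LOGPROB_THRESHOLD:
            break  # confident enough: stop raising the temperature
    return text, t
```

With a noisy segment, the first few passes can be rejected, so the text you finally get comes from a sampled (random) pass rather than the greedy one.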

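Here's a quick rundown on temperature: at 0 the decoder always picks the most likely token, and higher values flatten the distribution so less likely tokens get picked more often. I believe the fallback is meant to prevent Whisper from getting stuck in a loop.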
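A generic sketch of temperature-scaled token selection in plain Python (not Whisper's code; `pick_token` is a hypothetical helper). Temperature 0 is treated as greedy argmax, and anything above 0 samples from `softmax(logits / temperature)`, which is why repeated runs can diverge:

```python
import math
import random
from typing import List, Optional


def pick_token(logits: List[float], temperature: float,
               rng: Optional[random.Random] = None) -> int:
    """Choose the next token index from raw logits.

    temperature == 0 -> greedy: always the highest-scoring token.
    temperature > 0  -> sample from softmax(logits / temperature);
                        higher temperatures flatten the distribution.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.3]
print({pick_token(logits, 0.0) for _ in range(100)})  # always {0}
print({pick_token(logits, 1.0) for _ in range(100)})  # usually several indices
```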

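Background noise like music makes Whisper less confident (its average log probability drops), so it's more likely to raise the temperature, and any temperature above 0 samples tokens randomly, so repeated runs can give different text. If a deterministic result is important to you, you could try forcing the temperature to 0.0.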
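A sketch of forcing temperature 0 with the `openai-whisper` Python package, assuming it (and ffmpeg) is installed. `transcribe()` normally takes a tuple of temperatures, (0.0, 0.2, 0.4, 0.6, 0.8, 1.0) by default, and falls back along it; passing a single float leaves nothing to fall back to. The helper name `transcribe_deterministic` and the `"base"` model choice are just for illustration:

```python
def transcribe_deterministic(audio_path: str, model_name: str = "base") -> str:
    """Transcribe with one fixed temperature of 0.0 (greedy decoding).

    Hypothetical helper: a single float instead of a tuple means
    transcribe() never retries at a higher temperature.
    """
    # openai-whisper; imported here so defining the helper needs no install
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, temperature=0.0)
    return result["text"]
```

Usage: `transcribe_deterministic("clip.mp3")` (a placeholder path). The trade-off is that you lose the fallback's protection against repetition loops, so on hard segments you may get consistent but looping output.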

Answer selected by hugoredinho