Confidence scores for each word? #284
-
Does anyone know if it is possible to get a confidence score for each word? Thank you!
Replies: 7 comments 20 replies
-
You can add something like this to the `update()` method of the decoder:

```python
token_logits = torch.stack([logits[k, next_tokens[k]] for k in range(next_tokens.shape[0])], dim=0)
# or use logprobs, the log softmax of the logits
# return it along with tokens and completed
```

You can use the above for the GreedyDecoder, but you will probably need to do a bit more for the BeamSearchDecoder.
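As a rough, self-contained sketch of that idea (the function name, shapes, and toy values below are illustrative, not Whisper's actual `update()` signature): given the raw logits for one decoding step and the tokens that were chosen, the per-token confidence is the softmax probability assigned to each chosen token.

```python
import torch
import torch.nn.functional as F

def token_confidences(logits: torch.Tensor, next_tokens: torch.Tensor) -> torch.Tensor:
    """Probability (0-1) that the model assigned to each chosen token.

    logits: (batch, vocab) raw scores for the current decoding step
    next_tokens: (batch,) ids of the tokens actually selected
    """
    logprobs = F.log_softmax(logits.float(), dim=-1)
    token_logprobs = torch.stack(
        [logprobs[k, next_tokens[k]] for k in range(next_tokens.shape[0])], dim=0
    )
    return token_logprobs.exp()

# toy example: batch of 1, vocabulary of 3
logits = torch.tensor([[2.0, 1.0, 0.1]])
print(token_confidences(logits, torch.tensor([0])))
```

Returning `token_logprobs` (rather than the exponentiated probabilities) keeps the values in the same units as Whisper's `sum_logprobs`, so either form can be threaded back out of the decoder loop.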
-
It would be great to get a report along with the output listing all the sections, perhaps color-coding in red or yellow all the words or phrases that were generated with the least confidence.
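The report idea above can be sketched in a few lines of terminal output using ANSI color codes. The words and their confidence values here are hypothetical inputs; red marks the least confident words and yellow the borderline ones.

```python
def colorize(words):
    """words: list of (text, confidence) pairs with confidence in [0, 1]."""
    RED, YELLOW, RESET = "\033[91m", "\033[93m", "\033[0m"
    out = []
    for text, conf in words:
        if conf < 0.5:
            out.append(f"{RED}{text}{RESET}")      # least confident: red
        elif conf < 0.8:
            out.append(f"{YELLOW}{text}{RESET}")   # uncertain: yellow
        else:
            out.append(text)                       # confident: plain
    return " ".join(out)

print(colorize([("hello", 0.95), ("wrold", 0.42), ("there", 0.7)]))
```

The 0.5 and 0.8 thresholds are arbitrary; in practice you would tune them against transcripts you have already proofed.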
-
Color coding each segment with a confidence color would also be a very useful output. Such an output option would be very helpful when proofing the transcriptions.
On Wed, Oct 19, 2022 at 5:03 PM jian wrote:
> Unfortunately, no. The closest thing to this is the avg_logprob in the result of every segment, which is the average of the log softmax of the logits.
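Since avg_logprob is a log probability, exponentiating it gives a rough 0-1 confidence per segment, which is enough for a simple proofing report. A minimal sketch, assuming `result` is the dict returned by Whisper's `transcribe()`; the segment values below are made up for illustration.

```python
import math

# made-up stand-in for the dict returned by model.transcribe(...)
result = {"segments": [
    {"text": " Hello world.", "avg_logprob": -0.12},
    {"text": " mumbled bit",  "avg_logprob": -1.9},
]}

for seg in result["segments"]:
    conf = math.exp(seg["avg_logprob"])  # average per-token probability
    tag = "LOW " if conf < 0.5 else "ok  "
    print(f"{tag}{conf:.2f}{seg['text']}")
```

Note this is per segment, not per word: it flags whole stretches of audio worth re-listening to, which is coarser than the per-word report asked about above.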
-
You must use softmax when you want values in the range 0 to 1, so the logits in decoding.py (in the GreedyDecoder class, or in DecodingTask at _main_loop) must be passed through softmax:

```python
import torch.nn.functional as F

probs = F.softmax(logits, dim=-1)
# if you want to know the probability of a token
prob_token = probs[token]
print(prob_token)
```

The result of softmax is a probability distribution over the predictions, so to find the probability of a token you index into it with the token id. If you want the confidence of a word, you must combine the probabilities of its tokens: a word in languages other than English can be composed of more than one token, so you should take the average of those tokens' probabilities. I hope this answer helps.
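A minimal sketch of the word-level averaging described above; the per-token probabilities here are hypothetical, standing in for the softmax values of the tokens that make up one word.

```python
import torch

def word_confidence(token_probs: torch.Tensor) -> float:
    """Average the per-token probabilities that make up one word.

    A word may span several tokens (common outside English), so its
    confidence is taken as the mean of its tokens' probabilities.
    """
    return token_probs.mean().item()

# hypothetical: a word split into three tokens
print(word_confidence(torch.tensor([0.9, 0.8, 0.7])))
```

The mean is one reasonable choice; multiplying the probabilities (equivalently, summing log probabilities) would instead penalize a word more heavily for any single uncertain token.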
-
Hi, there is an average score here, calculated as F.log_softmax(logits.float(), dim=-1). I created a modification of jianfch's stable-ts with confidence scores for words and sequences.
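To sketch how such an average score comes together (the shapes and token ids below are made up): take the log-softmax of the logits, pick out each sampled token's score, and average.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 100)             # (seq_len, vocab_size), made up
tokens = torch.randint(0, 100, (5,))     # hypothetical sampled token ids

logprobs = F.log_softmax(logits.float(), dim=-1)
token_scores = logprobs[torch.arange(len(tokens)), tokens]
avg_logprob = token_scores.mean().item()  # log probabilities are always <= 0
print(avg_logprob)
```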
-
Hi, I recently did a PR on this topic and implemented a feature similar to the one in the whisper.cpp repo from @ggerganov, all in Python. Until the PR gets reviewed, you can pip install my repo and use the result.token_probs list.
-
When I use a prompt, the confidence scores all drop to zero. What am I doing wrong?