Confidence scores for each word? #284
-
Does anyone know if it is possible to get a confidence score for each word? Thank you!
Replies: 7 comments 20 replies
-
You can add something like this to the `update()` method of the decoder:

```python
token_logits = torch.stack([logits[k, next_tokens[k]] for k in range(next_tokens.shape[0])], dim=0)
# or use logprobs, the log softmax of the logits
# return it along with tokens and completed
```

You can use the above for the GreedyDecoder, but you will probably need to do a bit more for the BeamSearchDecoder.
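As a rough, self-contained sketch of that idea (the function name, shapes, and toy values below are illustrative, not Whisper's actual `update()` signature): given the raw logits for one decoding step and the tokens that were chosen, the per-token confidence is the softmax probability assigned to each chosen token.

```python
import torch
import torch.nn.functional as F

def token_confidences(logits: torch.Tensor, next_tokens: torch.Tensor) -> torch.Tensor:
    """Probability (0-1) that the model assigned to each chosen token.

    logits: (batch, vocab) raw scores for the current decoding step
    next_tokens: (batch,) ids of the tokens actually selected
    """
    logprobs = F.log_softmax(logits.float(), dim=-1)
    token_logprobs = torch.stack(
        [logprobs[k, next_tokens[k]] for k in range(next_tokens.shape[0])], dim=0
    )
    return token_logprobs.exp()

# toy example: batch of 1, vocabulary of 3
logits = torch.tensor([[2.0, 1.0, 0.1]])
print(token_confidences(logits, torch.tensor([0])))
```

Returning `token_logprobs` (rather than the exponentiated probabilities) keeps the values in the same units as Whisper's `sum_logprobs`, so either form can be threaded back out of the decoder loop.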
-
It would be great to get a report along with the output listing all the sections, perhaps color-coding in red or yellow all the words or phrases that were generated with the least confidence.
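The report idea above can be sketched in a few lines of terminal output using ANSI color codes. The words and their confidence values here are hypothetical inputs; red marks the least confident words and yellow the borderline ones.

```python
def colorize(words):
    """words: list of (text, confidence) pairs with confidence in [0, 1]."""
    RED, YELLOW, RESET = "\033[91m", "\033[93m", "\033[0m"
    out = []
    for text, conf in words:
        if conf < 0.5:
            out.append(f"{RED}{text}{RESET}")      # least confident: red
        elif conf < 0.8:
            out.append(f"{YELLOW}{text}{RESET}")   # uncertain: yellow
        else:
            out.append(text)                       # confident: plain
    return " ".join(out)

print(colorize([("hello", 0.95), ("wrold", 0.42), ("there", 0.7)]))
```

The 0.5 and 0.8 thresholds are arbitrary; in practice you would tune them against transcripts you have already proofed.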
-
Color coding each segment with a confidence color would also be a very useful output. Such an output option would be very helpful when proofing the transcriptions.
On Wed, Oct 19, 2022 at 5:03 PM jian wrote:
> Unfortunately, no. The closest thing to this is the avg_logprob in the result of every segment, which is the average of the log softmax of the logits.
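Since avg_logprob is a log probability, exponentiating it gives a rough 0-1 confidence per segment, which is enough for a simple proofing report. A minimal sketch, assuming `result` is the dict returned by Whisper's `transcribe()`; the segment values below are made up for illustration.

```python
import math

# made-up stand-in for the dict returned by model.transcribe(...)
result = {"segments": [
    {"text": " Hello world.", "avg_logprob": -0.12},
    {"text": " mumbled bit",  "avg_logprob": -1.9},
]}

for seg in result["segments"]:
    conf = math.exp(seg["avg_logprob"])  # average per-token probability
    tag = "LOW " if conf < 0.5 else "ok  "
    print(f"{tag}{conf:.2f}{seg['text']}")
```

Note this is per segment, not per word: it flags whole stretches of audio worth re-listening to, which is coarser than the per-word report asked about above.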
-
You must use softmax when you want values in the range 0 to 1, so the logits in decoding.py (in the GreedyDecoder class, or in DecodingTask at _main_loop) must be passed through softmax:

```python
import torch.nn.functional as F

probs = F.softmax(logits, dim=-1)
# if you want to know the probability of a token
prob_token = probs[token]
print(prob_token)
```

The result of softmax is a probability distribution over the predictions, so to find the probability of a token you index into it with the token id. If you want the confidence of a word, you must combine the probabilities of its tokens: a word in languages other than English can be composed of more than one token, so you should take the average of those tokens' probabilities. I hope this answer helps.
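A minimal sketch of the word-level averaging described above; the per-token probabilities here are hypothetical, standing in for the softmax values of the tokens that make up one word.

```python
import torch

def word_confidence(token_probs: torch.Tensor) -> float:
    """Average the per-token probabilities that make up one word.

    A word may span several tokens (common outside English), so its
    confidence is taken as the mean of its tokens' probabilities.
    """
    return token_probs.mean().item()

# hypothetical: a word split into three tokens
print(word_confidence(torch.tensor([0.9, 0.8, 0.7])))
```

The mean is one reasonable choice; multiplying the probabilities (equivalently, summing log probabilities) would instead penalize a word more heavily for any single uncertain token.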
-
Hi, there is an average score here, calculated as F.log_softmax(logits.float(), dim=-1). I created a modification of jianfch's stable-ts with confidence scores for words and sequences.
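To sketch how such an average score comes together (the shapes and token ids below are made up): take the log-softmax of the logits, pick out each sampled token's score, and average.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 100)             # (seq_len, vocab_size), made up
tokens = torch.randint(0, 100, (5,))     # hypothetical sampled token ids

logprobs = F.log_softmax(logits.float(), dim=-1)
token_scores = logprobs[torch.arange(len(tokens)), tokens]
avg_logprob = token_scores.mean().item()  # log probabilities are always <= 0
print(avg_logprob)
```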
-
Hi, I recently did a PR on this topic and implemented a feature similar to the one in the whisper.cpp repo from @ggerganov, all in Python. Until the PR gets reviewed, you can pip install my repo and use the result.token_probs list.
-
When I use a prompt, the confidence scores all drop to zero. What am I doing wrong?