I am a bit confused about how exactly inference works when I choose nucleus sampling, in particular why there is a beam size parameter. My understanding was that nucleus sampling does not do beam search; instead, it is a form of random sampling (a token is chosen at random on each decoding step). Does the implementation of nucleus sampling in ParlAI somehow combine this with beam search?
When using sampling, you can think of the `beam_size` parameter as more of a `best_of_n` parameter; if you specify e.g. `--beam-size 5` with nucleus sampling, ParlAI will sample 5 generations in parallel and output the one with the highest score at the end.
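To make the "best of n" idea concrete, here is a minimal sketch in plain Python. It is not ParlAI's code: `toy_next_token_probs` is a hypothetical stand-in for a model, the candidates are drawn in a loop rather than in parallel, and the score is assumed to be the summed log-probability of the sampled tokens, which may differ from how ParlAI actually scores its candidates.

```python
import math
import random


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, renormalized."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break
    return [(token, p / mass) for token, p in kept]


def toy_next_token_probs(prefix, vocab_size=5):
    """Hypothetical model: a fixed distribution that depends only on the prefix length."""
    weights = [((len(prefix) + i) % vocab_size) + 1 for i in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_one(rng, top_p, max_len, eos=0):
    """Draw one sequence with nucleus sampling; return (tokens, summed log-prob)."""
    tokens, logprob = [], 0.0
    for _ in range(max_len):
        nucleus = nucleus_filter(toy_next_token_probs(tokens), top_p)
        ids = [t for t, _ in nucleus]
        ps = [p for _, p in nucleus]
        choice = rng.choices(range(len(ids)), weights=ps, k=1)[0]
        tokens.append(ids[choice])
        logprob += math.log(ps[choice])
        if ids[choice] == eos:
            break
    return tokens, logprob


def best_of_n(n, top_p=0.9, max_len=8, seed=0):
    """Sample n independent candidates and return the highest-scoring one plus all candidates."""
    rng = random.Random(seed)
    candidates = [sample_one(rng, top_p, max_len) for _ in range(n)]
    best = max(candidates, key=lambda c: c[1])  # assumed score: summed log-prob
    return best, candidates


if __name__ == "__main__":
    best, candidates = best_of_n(n=5)
    for tokens, score in candidates:
        print(tokens, round(score, 3))
    print("chosen:", best[0])
```

Note that the candidates never interact during decoding, unlike beam search, where hypotheses compete and are pruned at every step. Here the only selection happens once, at the end.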