The attention mechanism in Recurrent Neural Networks (RNNs) was first introduced in a machine translation paper by Bahdanau et al. (2014).
Attention is a component placed between the RNN encoder and the RNN decoder.
The encoder produces a sequence of hidden states $h_1, \dots, h_T$, one per input token.
The attention mechanism allows the decoder's hidden states to condition on a dynamically chosen (and trained) selection of the encoder's hidden states. Before attention, all of the encoder's hidden states had to be condensed into a single fixed-size vector.
With the attention mechanism, the decoder's new hidden state $s_i = f(s_{i-1}, y_{i-1}, c_i)$ is computed from:

- the previous state $s_{i-1}$,
- the previous output $y_{i-1}$,
- and the context vector $c_i$.
The context vector is the new thing. It is computed as a weighted average of the encoder's hidden states:

$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$

where the weights $\alpha_{ij}$ are a softmax over alignment scores $e_{ij} = a(s_{i-1}, h_j)$, and the alignment model $a$ is a small feed-forward network trained jointly with the rest of the system:

$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}$
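To make this concrete, here is a minimal NumPy sketch of one attention step using the additive alignment model from the paper. The parameter names (`W_s`, `W_h`, `v`), the `softmax` helper, and the toy dimensions are my own illustrative choices, not the paper's actual implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def bahdanau_context(s_prev, H, W_s, W_h, v):
    """Compute the context vector c_i for a single decoder step.

    s_prev : (d_s,)   previous decoder state s_{i-1}
    H      : (T, d_h) encoder hidden states h_1..h_T
    W_s, W_h, v : learned parameters of the alignment model a(.)
                  (hypothetical names for the paper's W_a, U_a, v_a)
    """
    # Alignment scores e_ij = v^T tanh(W_s s_{i-1} + W_h h_j), one per encoder state.
    scores = np.tanh(s_prev @ W_s + H @ W_h) @ v   # (T,)
    # Attention weights alpha_ij: softmax over encoder positions.
    alpha = softmax(scores)                        # (T,)
    # Context vector: weighted average of the encoder hidden states.
    return alpha @ H                               # (d_h,)

# Toy setup: 5 source tokens, 8-dim encoder states, 6-dim decoder state.
rng = np.random.default_rng(0)
T, d_h, d_s, d_a = 5, 8, 6, 4
H = rng.normal(size=(T, d_h))
s_prev = rng.normal(size=d_s)
W_s = rng.normal(size=(d_s, d_a))
W_h = rng.normal(size=(d_h, d_a))
v = rng.normal(size=d_a)

c = bahdanau_context(s_prev, H, W_s, W_h, v)
print(c.shape)  # (8,) -- same size as one encoder hidden state
```

In a full decoder this would run once per output step: compute $c_i$ from $s_{i-1}$, then feed $s_{i-1}$, $y_{i-1}$, and $c_i$ into the recurrence $f$ to get $s_i$.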