Convolutional Sequence to Sequence Learning
aka Fairseq
https://arxiv.org/pdf/1705.03122.pdf
3. A Convolutional Architecture
3.1. Position Embeddings
p for the absolute position embedding vector
e for the word embedding
The model feeds their sum, e + p, to the encoder and decoder (see the sketch below).
See also
"Positional Encoding" in "Attention Is All You Need", which uses fixed sinusoidal encodings instead of learned position embeddings
3.2. Convolutional Block Structure
(image from https://norman3.github.io/papers/docs/fairseq.html)
In the image above, the kernel width is 3 and the convolutional block stack size is 1.
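A rough sketch of a single convolutional block with a gated linear unit (GLU) and a residual connection, matching the kernel width of 3 above (PyTorch; the padding choice and the decoder-side causal masking are simplified here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, kernel_width = 256, 3  # kernel width 3, stack size 1, as in the image

# The convolution maps dim -> 2*dim channels so GLU can split them
# into a value half A and a gate half B
conv = nn.Conv1d(dim, 2 * dim, kernel_width, padding=kernel_width // 2)

x = torch.randn(1, dim, 7)  # (batch, channels, seq_len)

# GLU: v([A; B]) = A * sigmoid(B); the gate controls what passes through
h = F.glu(conv(x), dim=1)

# Residual connection around the block
out = h + x
```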
3.3. Multi-step Attention
d_i is the decoder state summary, combining the current decoder state with a residual connection from the previous target embedding g_i
Attention weights come from the dot product of the encoder outputs z and the decoder state summary d_i, followed by a softmax (see the sketch below)
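A sketch of one attention step under these definitions (PyTorch, single layer, batch dimension omitted); the projection W_d and the shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, src_len, tgt_len = 256, 9, 7  # hypothetical sizes

h = torch.randn(tgt_len, dim)  # current decoder layer states
g = torch.randn(tgt_len, dim)  # embeddings of the previous target elements
z = torch.randn(src_len, dim)  # final encoder outputs
e = torch.randn(src_len, dim)  # encoder input embeddings (word + position)

W_d = nn.Linear(dim, dim)

# Decoder state summary: d_i = W_d h_i + b_d + g_i (residual from g_i)
d = W_d(h) + g

# Attention weights: softmax over the dot products d_i . z_j
a = F.softmax(d @ z.t(), dim=-1)  # (tgt_len, src_len)

# Context: weighted sum over z_j + e_j, so attention also sees the inputs
c = a @ (z + e)
```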
3.4. Normalization Strategy
This keeps the variance of activations roughly constant throughout the network, which helps stabilize learning (see the sketch below).
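For example, the paper multiplies the sum of a residual block's input and output by √0.5, which halves the variance of the sum (assuming the two summands have equal variance); a minimal sketch:

```python
import math
import torch

def scaled_residual(x: torch.Tensor, block_out: torch.Tensor) -> torch.Tensor:
    # sqrt(0.5) halves the variance of the sum, assuming both summands
    # have the same variance and are independent
    return (x + block_out) * math.sqrt(0.5)
```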