Neural Machine Translation in Linear Time #11

flrngel opened this issue Mar 6, 2018 · 0 comments

flrngel commented Mar 6, 2018

https://arxiv.org/abs/1610.10099
aka ByteNet
paper from DeepMind

Notations

  • s: source
  • t: target

Abstract

Features

  • (model feature) the decoder is stacked on top of the encoder
  • (training feature) the decoder uses a dynamic unfolding mechanism
  • (result feature) runs in time linear in the sequence length and sidesteps excessive memorization

1. Introduction

  • ByteNet is resolution preserving
    • this sidesteps memorization and allows maximal bandwidth between encoder and decoder

2. Neural Translation Model

(figure from the paper, not reproduced here)

2.1. Desiderata

(Desiderata is the Latin plural of desideratum; here it refers to the model's design goals.)

  • run in parallel, with running time linear in the sequence length
  • resolution preserving: the source representation should grow with the source length rather than be compressed into a fixed-size vector
  • the path between input and output tokens has to be short

3. ByteNet

3.1. Encoder-Decoder Stacking

  • the decoder is stacked on top of the encoder to maximize the representational bandwidth between them (see the sketch below)
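
A minimal PyTorch sketch of the idea (not the paper's exact architecture): the decoder consumes the full-resolution encoder output, one vector per source position, concatenated channel-wise with the target embeddings, instead of a single fixed-size summary vector. The module name, layer choices, and tensor shapes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Toy sketch: one convolution each for encoder and decoder, acting on
# (batch, channels, length) tensors. Assumes the source and target streams
# already have equal length (handled by dynamic unfolding in the paper),
# and omits masking for brevity.
class StackedEncoderDecoder(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.encoder = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.decoder = nn.Conv1d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, src_emb, tgt_emb):
        enc = self.encoder(src_emb)                # one vector per source position
        dec_in = torch.cat([enc, tgt_emb], dim=1)  # decoder stacked on encoder output
        return self.decoder(dec_in)
```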

3.2. Dynamic Unfolding

  • the target length is estimated from the source length with the linear relation |t̂| = a|s| + b (a = 1.2, b = 0 in this paper); the decoder then unfolds step by step over this estimate (see the sketch below)
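
A small Python sketch of dynamic unfolding under that length estimate. The decoder_step callable and its signature are hypothetical, used only to show the control flow: generate until an end-of-sequence symbol, conditioning on the encoder representation while it lasts.

```python
import math

# Length estimate |t_hat| = a * |s| + b, with a = 1.2, b = 0 as in the paper.
def estimate_target_length(source_len, a=1.2, b=0.0):
    return int(math.ceil(a * source_len + b))

# Hypothetical unfolding loop; encoder_states is assumed to already have
# length estimate_target_length(|s|). decoder_step(t, enc, prev) -> token.
def dynamically_unfold(decoder_step, encoder_states, eos_token, max_extra=50):
    t_hat = len(encoder_states)
    tokens = []
    for t in range(t_hat + max_extra):
        enc = encoder_states[t] if t < t_hat else None  # past t_hat: no encoder input
        tok = decoder_step(t, enc, tokens)
        tokens.append(tok)
        if tok == eos_token:            # stop when the decoder emits EOS
            break
    return tokens
```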

3.4. Masked One-dimensional Convolutions

  • use masking so that future target tokens cannot affect the prediction of the current token (see the sketch below)
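
A minimal PyTorch sketch of a masked (causal) one-dimensional convolution, implemented here by left-padding rather than an explicit mask; the class name and defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv1d(nn.Module):
    """Output at position i depends only on inputs at positions <= i."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, length); pad on the left only, so the
        # convolution never sees future positions.
        return self.conv(F.pad(x, (self.left_pad, 0)))

x = torch.randn(1, 8, 10)
print(MaskedConv1d(8)(x).shape)  # torch.Size([1, 8, 10]) -- length preserved
```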

3.5. Dilation

  • dilation makes the receptive field grow exponentially with the number of layers (see the sketch below)
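
A quick sanity check in plain Python of how the receptive field grows when the dilation rate doubles at every layer; kernel size 3 is assumed for the example.

```python
# Each causal conv layer with kernel size k and dilation d widens the
# receptive field by (k - 1) * d positions.
def receptive_field(num_layers, kernel_size=3):
    field = 1
    for layer in range(num_layers):
        field += (kernel_size - 1) * (2 ** layer)  # dilations 1, 2, 4, ...
    return field

print([receptive_field(n) for n in range(1, 6)])  # [3, 7, 15, 31, 63]
```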

4. Model Comparison

(comparison tables from the paper, not reproduced here)

Todo

  • read up on dilated convolutions