Neural Machine Translation in Linear Time #11

flrngel opened this issue Mar 6, 2018 · 0 comments

flrngel commented Mar 6, 2018

https://arxiv.org/abs/1610.10099
aka ByteNet
paper from DeepMind

Notations

  • s: source
  • t: target

Abstract

Features

  • (model feature) the decoder is stacked on top of the encoder
  • (training feature) the decoder uses a dynamic unfolding mechanism
  • (result feature) runs in time linear in the sequence length and sidesteps excessive memorization

1. Introduction

  • ByteNet is resolution preserving
    • this sidesteps memorization and allows maximal bandwidth between encoder and decoder

2. Neural Translation Model

(figure from the paper, not reproduced here)

2.1. Desiderata

(Desiderata is the Latin plural of desideratum; here it refers to the model's design goals.)

  • run in parallel, with running time linear in the sequence length
  • resolution preserving: the source representation should grow with the source length rather than be compressed into a fixed-size vector
  • the path between input and output tokens has to be short

3. ByteNet

3.1. Encoder-Decoder Stacking

  • the decoder is stacked on top of the encoder to maximize the representational bandwidth between them (see the sketch below)
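
A minimal PyTorch sketch of the idea (not the paper's exact architecture): the decoder consumes the full-resolution encoder output, one vector per source position, concatenated channel-wise with the target embeddings, instead of a single fixed-size summary vector. The module name, layer choices, and tensor shapes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Toy sketch: one convolution each for encoder and decoder, acting on
# (batch, channels, length) tensors. Assumes the source and target streams
# already have equal length (handled by dynamic unfolding in the paper),
# and omits masking for brevity.
class StackedEncoderDecoder(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.encoder = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.decoder = nn.Conv1d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, src_emb, tgt_emb):
        enc = self.encoder(src_emb)                # one vector per source position
        dec_in = torch.cat([enc, tgt_emb], dim=1)  # decoder stacked on encoder output
        return self.decoder(dec_in)
```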

3.2. Dynamic Unfolding

  • the target length is estimated from the source length with the linear relation |t̂| = a|s| + b (a = 1.2, b = 0 in this paper); the decoder then unfolds step by step over this estimate (see the sketch below)
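
A small Python sketch of dynamic unfolding under that length estimate. The decoder_step callable and its signature are hypothetical, used only to show the control flow: generate until an end-of-sequence symbol, conditioning on the encoder representation while it lasts.

```python
import math

# Length estimate |t_hat| = a * |s| + b, with a = 1.2, b = 0 as in the paper.
def estimate_target_length(source_len, a=1.2, b=0.0):
    return int(math.ceil(a * source_len + b))

# Hypothetical unfolding loop; encoder_states is assumed to already have
# length estimate_target_length(|s|). decoder_step(t, enc, prev) -> token.
def dynamically_unfold(decoder_step, encoder_states, eos_token, max_extra=50):
    t_hat = len(encoder_states)
    tokens = []
    for t in range(t_hat + max_extra):
        enc = encoder_states[t] if t < t_hat else None  # past t_hat: no encoder input
        tok = decoder_step(t, enc, tokens)
        tokens.append(tok)
        if tok == eos_token:            # stop when the decoder emits EOS
            break
    return tokens
```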

3.4. Masked One-dimensional Convolutions

  • use masking so that future target tokens cannot affect the prediction of the current token (see the sketch below)
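
A minimal PyTorch sketch of a masked (causal) one-dimensional convolution, implemented here by left-padding rather than an explicit mask; the class name and defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv1d(nn.Module):
    """Output at position i depends only on inputs at positions <= i."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, length); pad on the left only, so the
        # convolution never sees future positions.
        return self.conv(F.pad(x, (self.left_pad, 0)))

x = torch.randn(1, 8, 10)
print(MaskedConv1d(8)(x).shape)  # torch.Size([1, 8, 10]) -- length preserved
```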

3.5. Dilation

  • dilation makes the receptive field grow exponentially with the number of layers (see the sketch below)
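
A quick sanity check in plain Python of how the receptive field grows when the dilation rate doubles at every layer; kernel size 3 is assumed for the example.

```python
# Each causal conv layer with kernel size k and dilation d widens the
# receptive field by (k - 1) * d positions.
def receptive_field(num_layers, kernel_size=3):
    field = 1
    for layer in range(num_layers):
        field += (kernel_size - 1) * (2 ** layer)  # dilations 1, 2, 4, ...
    return field

print([receptive_field(n) for n in range(1, 6)])  # [3, 7, 15, 31, 63]
```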

4. Model Comparison

(comparison tables from the paper, not reproduced here)

Todo

  • read up on dilated convolutions