Transformer cost curves #700

guoshengCS · 2018-03-09T04:06:30Z

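Fluid and PyTorch both use the model and training parameters below (dropout_op currently has a bug, so dropout is disabled here for now), with the same initialization method. With these settings, the Transformer trained on the WMT'16 dataset gives the cost curves compared in the figure.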

    # number of sequences contained in a mini-batch.
    batch_size = 64
    # the hyper params for Adam optimizer.
    learning_rate = 0.001
    beta1 = 0.9
    beta2 = 0.98
    eps = 1e-9
    # the params for learning rate scheduling
    warmup_steps = 4000

    src_vocab_size = 2909
    trg_vocab_size = 3149
    # the dimension for word embeddings, which is also the last dimension of
    # the input and output of multi-head attention, position-wise feed-forward
    # networks, encoder and decoder.
    d_model = 512
    # size of the hidden layer in position-wise feed-forward networks.
    d_inner_hid = 1024
    # the dimension that keys are projected to for dot-product attention.
    d_key = 64
    # the dimension that values are projected to for dot-product attention.
    d_value = 64
    # number of heads used in multi-head attention.
    n_head = 8
    # number of sub-layers to be stacked in the encoder and decoder.
    n_layer = 6
    # dropout rate used by all dropout layers.
    dropout = 0.
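
For readers unfamiliar with `warmup_steps`, here is a minimal sketch of the warmup schedule from the original Transformer paper ("Attention Is All You Need"). This is an assumption: the issue does not say which schedule Fluid or PyTorch actually used, nor how `learning_rate = 0.001` is combined with it. The function name `noam_learning_rate` and the `scale` parameter are hypothetical and introduced only for illustration.

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000, scale=1.0):
    """Learning rate at a 1-based training step.

    Assumed formula (from the Transformer paper, not confirmed by this issue):
        lr = scale * d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    The rate grows linearly for the first `warmup_steps` steps, then decays
    proportionally to the inverse square root of the step number.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few steps around the warmup boundary.
    for step in (1, 1000, 4000, 16000):
        print(step, noam_learning_rate(step))
```

With `d_model = 512` and `warmup_steps = 4000` as configured above, this schedule would peak at step 4000 and fall off afterwards, so differences between the two frameworks in the first few thousand steps may partly reflect how each one implements the warmup.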

shanyi15 · 2018-08-15T09:55:47Z

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure and thank you for your support of PaddlePaddle!

peterzhang2029 mentioned this issue Mar 13, 2018

Operators profiling in Transformer model. #697

Closed

shanyi15 closed this as completed Aug 15, 2018
