
Transformer Attention Probabilities #504

Closed
wants to merge 14 commits

Conversation

tomsbergmanis

This change adds attention probabilities for the transformer decoder and completes a TODO by @fhieber.
Specifically, we compute transformer attention probabilities as the average of the attention probabilities over all attention heads in all layers.
To do so, we create MultiHeadAttentionWithProbs, a subclass of MultiHeadAttention that overrides _attend to return attention probabilities. A minimal sketch of the averaging step follows below.
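A minimal NumPy sketch of the averaging step described above (not Sockeye's actual MXNet code; shapes and names are illustrative assumptions):

```python
import numpy as np

def average_attention(per_layer_probs):
    """Average per-head attention probabilities over all layers and heads.

    per_layer_probs: list (one entry per layer) of arrays with shape
                     (num_heads, target_len, source_len).
    Returns a single (target_len, source_len) attention matrix.
    """
    # Stack to (num_layers, num_heads, target_len, source_len) and
    # average over both the layer and head axes.
    stacked = np.stack(per_layer_probs, axis=0)
    return stacked.mean(axis=(0, 1))
```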

To evaluate the resulting attention probability matrices, we used them as the basis for discrete word alignments (see the sketch after this list). The resulting alignments were then compared against:

  • alignments obtained from LSTM attention matrices,
  • alignments by FastAlign.

In a human evaluation, we found that the resulting word alignments are, on average, judged as acceptable as word alignments from LSTM attention matrices and strictly better than alignments produced by FastAlign.
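For illustration only: one common heuristic for turning an attention matrix into discrete word alignments is to align each target position to its highest-probability source position. The PR does not specify its exact procedure, so the function below is an assumption, not the method used here.

```python
import numpy as np

def attention_to_alignments(attn):
    """attn: (target_len, source_len) attention probabilities.

    Returns a list of (target_index, source_index) alignment pairs,
    aligning each target position to its argmax source position.
    """
    return [(t, int(attn[t].argmax())) for t in range(attn.shape[0])]
```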

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tdomhan
Contributor

tdomhan commented Aug 9, 2018

This introduces a lot of code repetition, as dot_attention_with_probs essentially does the same thing as dot_attention, just additionally returning the probs.

I'm actually changing dot_attention to return probs as part of PR #470. So maybe we can hold off on this change until we've merged the custom encoder/decoder.
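For context, a hedged NumPy sketch of dot-product attention that returns the probabilities alongside the context vectors, in the spirit of the change discussed above. The function and argument names are assumptions, not Sockeye's actual dot_attention signature.

```python
import numpy as np

def dot_attention_with_probs(queries, keys, values):
    """queries: (tgt_len, d); keys, values: (src_len, d).

    Returns (context, probs), where probs has shape (tgt_len, src_len).
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # scaled dot product
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)               # softmax over source positions
    context = probs @ values
    return context, probs
```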

@tomsbergmanis
Author

@tdomhan thanks for your answer. Unless you are happy for me to change dot_attention now and have another look then, this can, of course, wait until you are done with the custom encoder/decoder.
May I ask: do you have a rough estimate of when you might be done with PR #470?

@fhieber
Contributor

fhieber commented Oct 15, 2020

Closing this PR as it would need its target branch changed to sockeye_1.

@fhieber fhieber closed this Oct 15, 2020