Are there potential errors in the implementation of SA? #17

Open
jialeli1 opened this issue Apr 16, 2021 · 4 comments
Comments

@jialeli1

Hi.

As here, the attention matrix should be transposed before the matrix product, if I understand it correctly.

Here is my draft of the calculation about the dimension of the matrix product.
[Image: sa_20210416112006]

@MenghaoGuo
Owner

Hi,
Good question.
I do not think it is wrong. Please pay attention to the dimension of the normalization, which is different from that of the original self-attention.

@JunweiZheng93

I think @jialeli1 is right. If you don't transpose the attention matrix before the matrix product, the matrix product makes no sense (pay attention to the meaning of each dimension). And I guess that because the author didn't transpose the attention matrix, he needed to do the normalization proposed in the paper. However, if you transpose the attention matrix and do the normalization used in the original attention paper, you will find that the proposed normalization is not necessary. I have re-implemented the segmentation code using PyTorch and got quite a good result.
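
To make the dimension argument concrete, here is a minimal NumPy sketch of the two variants being discussed. It is not the repository's code: the channel-first layout `(C, N)`, the axes chosen for the softmax and the extra L1 normalization, and all variable names are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, C = 5, 3                      # assumed: N points, C channels
q = rng.normal(size=(N, C))      # queries, one row per point
k = rng.normal(size=(C, N))      # keys, channel-first
v = rng.normal(size=(C, N))      # values, channel-first

energy = q @ k                   # (N, N); energy[i, j] = <q_i, k_j>

# Variant A: softmax over the key axis, then transpose before the product.
attn_a = softmax(energy, axis=-1)        # each row sums to 1
out_a = v @ attn_a.T                     # out_a[:, i] = sum_j attn_a[i, j] * v[:, j]

# Variant B (assumed form of the alternative): no transpose, but an extra
# L1 normalization along axis 0 so that each column sums to about 1.
attn_b = softmax(energy, axis=-1)
attn_b = attn_b / (1e-9 + attn_b.sum(axis=0, keepdims=True))
out_b = v @ attn_b                       # out_b[:, j] = sum_i attn_b[i, j] * v[:, i]
```

Under these assumptions both products are shape-valid and give a `(C, N)` output, and in both cases the weights that are mixed into one output point sum to one. What differs is which index of the attention matrix plays the role of the output point, which is the question the comments above disagree on.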

@MaiRajborirug

@JunweiZheng93 Could you share your implementation code? Thank you so much

@Stronger-Huang

@JunweiZheng93 Could you share your implementation code? Thank you so much


5 participants