Are there potential errors in the implementation of SA? #17
Comments
Hi,
I think @jialeli1 is right. If you don't transpose the attention matrix before the matrix product, the product doesn't make sense (pay attention to what each dimension represents). I suspect that because the author didn't transpose the attention matrix, the extra normalization proposed in the paper became necessary. However, if you transpose the attention matrix and apply the standard softmax normalization from the original attention paper, you will find that the proposed normalization is unnecessary. I have re-implemented the segmentation code in PyTorch and got quite good results.
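For illustration only, here is a minimal sketch of what the transposed version could look like. It is not the repository's code or the commenter's implementation; the layer name, shapes (B, C, N), and channel reduction are assumptions.

```python
import torch
import torch.nn as nn


class SALayerSketch(nn.Module):
    """Hypothetical self-attention sketch for point features of shape (B, C, N)."""

    def __init__(self, channels):
        super().__init__()
        self.q_conv = nn.Conv1d(channels, channels // 4, 1, bias=False)
        self.k_conv = nn.Conv1d(channels, channels // 4, 1, bias=False)
        self.v_conv = nn.Conv1d(channels, channels, 1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        # x: (B, C, N)
        q = self.q_conv(x).permute(0, 2, 1)   # (B, N, C')
        k = self.k_conv(x)                     # (B, C', N)
        v = self.v_conv(x)                     # (B, C, N)
        energy = torch.bmm(q, k)               # (B, N, N): row i = query i against all keys
        attention = self.softmax(energy)       # softmax over the key dimension, each row sums to 1
        # Transpose the attention matrix so that the matmul sums over the key
        # dimension, i.e. each output point is a weighted sum of the value vectors.
        out = torch.bmm(v, attention.transpose(1, 2))  # (B, C, N)
        return out
```

With the rows of `attention` already summing to 1 after the softmax, no additional L1 normalization is needed in this form.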
@JunweiZheng93 Could you share your implementation code? Thank you so much |
Hi.
As here, the attention matrix should be transposed before the matrix product, if I understand it correctly.
Here is my draft calculation of the dimensions in the matrix product.
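The original draft is not reproduced here; the following is a rough dimension check under assumed shapes (B batches, N points, channels C, reduced channels C'):

```latex
% Assumed shapes: Q \in \mathbb{R}^{B \times N \times C'},
%                 K \in \mathbb{R}^{B \times C' \times N},
%                 V \in \mathbb{R}^{B \times C \times N}.
% A = \mathrm{softmax}(QK) \in \mathbb{R}^{B \times N \times N},
%     with the softmax taken over the key index j, so each row of A sums to 1.
% Without the transpose, V A sums over the query index, mixing the wrong dimension.
% With the transpose:
%   V A^{\top} \in \mathbb{R}^{B \times C \times N}, \quad
%   o_i = \sum_{j} A_{ij}\, v_j ,
% i.e. each output point is a proper weighted sum of the value vectors over the keys.
```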