
No benefit for Deit-S #4

Open
panda1949 opened this issue Apr 23, 2021 · 1 comment
Comments


panda1949 commented Apr 23, 2021

When I applied re-attention to DeiT-S (https://github.com/facebookresearch/deit), no accuracy gain was observed. Could you give some advice?

zhoudaquan (Owner) commented

> When I applied re-attention to DeiT-S (https://github.com/facebookresearch/deit), no accuracy gain was observed. Could you give some advice?

Hi,

Thanks for trying it out! Based on our observations, the benefit of re-attention is proportional to the number of "similar blocks" as defined in the paper. This number is typically small when the model is shallow, as shown in Fig. 1 of the paper. However, you can try the cosine-similarity regularization described in the updated paper. Also, since the model is shallow, it is not necessary to apply re-attention to all blocks; you could refer to Fig. 9 in the appendix.
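For reference, a minimal NumPy sketch of the two ideas discussed above. Re-attention mixes the per-head attention maps with a learnable head-mixing matrix Θ before applying them to the values; here a simple row re-normalization stands in for the Norm layer in the paper, and the cosine-similarity function is an illustrative way to measure how "similar" the attention maps of two blocks are (the quantity the regularizer penalizes). All names and shapes are illustrative, not the repo's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def re_attention(q, k, v, theta):
    """Re-attention sketch.

    q, k, v: (H, N, d) per-head queries/keys/values.
    theta:   (H, H) learnable head-mixing matrix (identity = plain attention).
    """
    H, N, d = q.shape
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))   # (H, N, N)
    # Mix attention maps across heads: mixed[h] = sum_g theta[h, g] * attn[g]
    mixed = np.einsum('hg,gnm->hnm', theta, attn)
    # Re-normalize each row to a distribution (stand-in for the paper's Norm)
    mixed = mixed / mixed.sum(-1, keepdims=True)
    return mixed @ v                                         # (H, N, d)

def attn_cosine_similarity(a1, a2):
    """Per-head cosine similarity between two blocks' attention maps (H, N, N)."""
    f1 = a1.reshape(a1.shape[0], -1)
    f2 = a2.reshape(a2.shape[0], -1)
    num = (f1 * f2).sum(-1)
    den = np.linalg.norm(f1, axis=-1) * np.linalg.norm(f2, axis=-1)
    return num / den
```

With `theta` set to the identity matrix the module reduces to standard multi-head attention, which is a convenient sanity check; the regularized variant would add the average cross-block cosine similarity to the training loss.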
