When I applied re-attention to DeiT-S (https://github.com/facebookresearch/deit), I observed no accuracy gain. Could you give some advice?

Thanks for trying it out! In our observation, re-attention's benefit is proportional to the number of "similar blocks" as defined in the paper, and that number is typically small when the model is shallow, as shown in Fig. 1 of the paper. You could instead try using cosine similarity as a regularizer, as described in the updated paper. Also, since the model is shallow, it is not necessary to apply re-attention to all blocks; you could refer to Fig. 9 in the appendix.
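If it helps to check this on DeiT-S before tuning further, below is a minimal PyTorch sketch that estimates how many similar blocks a model has by computing pairwise cosine similarity between the attention maps of different blocks. This is a simplified version of the cross-layer similarity measure in the paper, not code from the official repo; the function name and the assumption that you can hook out per-block attention maps are mine.

```python
import torch
import torch.nn.functional as F

def cross_layer_attention_similarity(attn_maps):
    """Mean pairwise cosine similarity between per-block attention maps.

    attn_maps: list of L tensors, one per transformer block, each of
               shape (B, H, N, N) as captured by an attention hook.
    Returns an (L, L) matrix; large off-diagonal entries indicate
    "similar blocks" in the sense of the paper.
    """
    flat = [a.flatten(start_dim=1) for a in attn_maps]  # (B, H*N*N) each
    num_blocks = len(flat)
    sim = torch.zeros(num_blocks, num_blocks)
    for p in range(num_blocks):
        for q in range(num_blocks):
            # Cosine similarity per sample, averaged over the batch.
            sim[p, q] = F.cosine_similarity(flat[p], flat[q], dim=1).mean()
    return sim
```

If most off-diagonal entries stay low across DeiT-S's 12 blocks, that would be consistent with the observation above: few similar blocks, hence little to gain from re-attention at this depth.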