Hi,
Thanks for this contribution! In the implementation of `attn_mlp`, the first linear layer increases the dimension. Is this standard practice? I could not find any details about it in the paper. The paper also does not describe the use of `mask`; is this again some standard practice for attention layers?
Thanks!!
I think the mask is used in cases similar to the Transformer in NLP, where you need to block attention to certain positions (e.g. padding), if you need it.
If you don't have any special purpose, just set the mask to all ones.
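For anyone landing here later, here is a minimal sketch of both points, assuming a PyTorch-style implementation. The `AttnMLP` class, the expansion factor of 4, and `masked_attention` are illustrative names and choices, not the repo's actual code: the first linear layer widens the channels and the second projects back (the standard Transformer feed-forward pattern), and an all-ones mask masks nothing, so it reduces to plain attention.

```python
# Illustrative sketch only; names and the expansion factor of 4 are assumptions.
import torch
import torch.nn as nn

class AttnMLP(nn.Module):
    """Feed-forward block: expand the channel dimension, then project back."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * expansion),  # first layer increases the dimension
            nn.ReLU(),
            nn.Linear(dim * expansion, dim),  # second layer restores it
        )

    def forward(self, x):
        return self.net(x)

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; positions where mask == 0 are ignored."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    scores = scores.masked_fill(mask == 0, float('-inf'))
    return scores.softmax(dim=-1) @ v

# With an all-ones mask, no position is filtered out, so the output
# equals plain (unmasked) attention.
q = k = v = torch.randn(1, 8, 64)
mask = torch.ones(1, 8, 8)  # all ones -> nothing masked
out = masked_attention(q, k, v, mask)
```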