I have some basic questions as a student.
I have implemented Transformers multiple times but am still learning new things about them. These are my questions:
As seen in the attention maps, only a few values contribute most to the final output.
Isn't this like a long-tailed distribution, where only a few values are very high and the rest are very low?
If that is the case, can we remove some parts of the input query randomly (say 25%) and still achieve the same results?
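To make the question concrete, here is a minimal NumPy sketch of how one might test it. It is only an illustration under stated assumptions: it uses random toy tensors rather than a trained model, it reads "parts of the input" as key/value positions, and the helper names (`top_k_mass`, `drop_random_keys`) are hypothetical. It measures how much attention mass the largest weights hold, then compares the attention output before and after randomly dropping 25% of the positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention; returns (output, weights)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def top_k_mass(weights, k):
    """Mean fraction of each row's attention mass held by its k largest weights."""
    sorted_desc = np.sort(weights, axis=-1)[:, ::-1]
    return sorted_desc[:, :k].sum(axis=-1).mean()

def drop_random_keys(k, v, frac, rng):
    """Randomly remove a fraction of key/value positions (hypothetical helper)."""
    n = k.shape[0]
    keep = np.sort(rng.choice(n, size=n - int(frac * n), replace=False))
    return k[keep], v[keep], keep

# Toy data: an assumption for illustration, not weights from a trained model.
rng = np.random.default_rng(0)
n, d = 64, 32
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))

out, w = attention(q, k, v)
print("attention mass in top 5 keys:", top_k_mass(w, 5))

# Drop 25% of the key/value positions at random and recompute.
k2, v2, keep = drop_random_keys(k, v, 0.25, rng)
out2, _ = attention(q, k2, v2)
rel_err = np.linalg.norm(out - out2) / np.linalg.norm(out)
print("relative change in output:", rel_err)
```

Because the drop is random, it does not know which positions carry the large weights, so it can remove exactly the few that matter. Running the same comparison on attention weights taken from a trained model would show whether the long tail really makes the output robust to this.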
Thank you!