Embed layer #2032
Conversation
(force-pushed from d6eb72a to 9271abf)
Rebased and ready for review. (Previously depended on gradient accumulation PR #1663.)
This layer works as a lookup table and could be renamed to LookupTable.
(force-pushed from 287d2c1 to 69b0e8c)
Atomic add for doubles (double impl from NVIDIA dev docs; float impl included in CUDA as "atomicAdd")
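For reference, the double-precision atomic add mentioned above is built on atomicCAS, following the workaround in the NVIDIA CUDA programming guide (hardware-native double atomicAdd only arrived with compute capability 6.0, well after this PR). A minimal sketch of that approach; the function name here is illustrative, not necessarily the helper this PR defines:

```cuda
// Sketch of a double-precision atomic add via atomicCAS, as in the NVIDIA
// CUDA C Programming Guide. Function name is illustrative only.
__device__ double atomic_add_double(double* address, double val) {
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull;
  unsigned long long int assumed;
  do {
    assumed = old;
    // Reinterpret the bits, add, and try to swap in the new value;
    // retry if another thread changed *address in the meantime.
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
```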
(force-pushed from 69b0e8c to ac9e29f)
Embed layer for lookup table of one-hot encodings
Here is an example of a typical …
Understood, of course the padding is to fix the input sequence length.
(Replaces #1872)
Based on #1977 (parameter gradient accumulation). This adds EmbedLayer (which should probably be renamed EmbeddingLayer for consistency with PoolingLayer etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and the like. Its computation is equivalent to an InnerProductLayer with "one-hot" vector inputs, but instead of explicitly representing the one-hot vectors (which wastes a lot of memory), it assumes each input value is the index of the "hot" entry of the corresponding one-hot vector (like the label inputs to the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming it's faster -- this is a more lightweight change that continues the unfortunate trend of casting floats to ints as labels.
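To make the lookup-table semantics concrete, here is a minimal, self-contained CUDA sketch of the computation described above (these are not the PR's actual kernels; the names, shapes, and float-only backward are illustrative assumptions). The forward pass gathers rows of an N x K weight table according to the integer-valued inputs; the weight gradient is a scatter-add, which is where the atomic add from the earlier commit comes in, since several input positions can select the same row.

```cuda
#include <cstdio>
#include <vector>

// Forward: output row m is a copy of weight row bottom[m] (a gather).
// bottom: M indices stored as floats, weight: N x K table, top: M x K output.
__global__ void embed_forward_sketch(int M, int K, const float* bottom,
                                     const float* weight, float* top) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < M * K;
       i += blockDim.x * gridDim.x) {
    int m = i / K;                            // which input position
    int k = i % K;                            // which embedding dimension
    int index = static_cast<int>(bottom[m]);  // "hot" row of the table
    top[i] = weight[index * K + k];
  }
}

// Backward w.r.t. the weights: a scatter-add. Different input positions may
// reference the same row, so accumulation must be atomic. For double, this
// would rely on an atomicCAS-based helper like the one sketched earlier.
__global__ void embed_backward_sketch(int M, int K, const float* bottom,
                                      const float* top_diff,
                                      float* weight_diff) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < M * K;
       i += blockDim.x * gridDim.x) {
    int m = i / K;
    int k = i % K;
    int index = static_cast<int>(bottom[m]);
    atomicAdd(&weight_diff[index * K + k], top_diff[i]);
  }
}

int main() {
  const int M = 4, N = 5, K = 3;  // 4 inputs, 5-row table, 3-dim embeddings
  std::vector<float> h_bottom = {0.f, 2.f, 2.f, 4.f};  // indices as floats
  std::vector<float> h_weight(N * K);
  for (int i = 0; i < N * K; ++i) h_weight[i] = 0.1f * i;

  float *d_bottom, *d_weight, *d_top;
  cudaMalloc(&d_bottom, M * sizeof(float));
  cudaMalloc(&d_weight, N * K * sizeof(float));
  cudaMalloc(&d_top, M * K * sizeof(float));
  cudaMemcpy(d_bottom, h_bottom.data(), M * sizeof(float),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_weight, h_weight.data(), N * K * sizeof(float),
             cudaMemcpyHostToDevice);

  embed_forward_sketch<<<1, 128>>>(M, K, d_bottom, d_weight, d_top);

  std::vector<float> h_top(M * K);
  cudaMemcpy(h_top.data(), d_top, M * K * sizeof(float),
             cudaMemcpyDeviceToHost);
  for (int m = 0; m < M; ++m) {
    printf("input %d -> row %d:", m, static_cast<int>(h_bottom[m]));
    for (int k = 0; k < K; ++k) printf(" %.1f", h_top[m * K + k]);
    printf("\n");
  }
  cudaFree(d_bottom);
  cudaFree(d_weight);
  cudaFree(d_top);
  return 0;
}
```

Built with nvcc, this prints each output row as a copy of the weight row selected by the corresponding input index; note that inputs 1 and 2 both select row 2, which is exactly why the backward weight accumulation has to be atomic (or serialized) on the GPU.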