Embed layer #2032

Merged: 4 commits merged into BVLC:master on Aug 25, 2015

Conversation

jeffdonahue
Contributor

(Replaces #1872)

Based on #1977 (parameter gradient accumulation). This adds EmbedLayer (the name should probably change to EmbeddingLayer for consistency with PoolingLayer etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and the like. Its computation is equivalent to an InnerProductLayer with "one-hot" vector inputs, but instead of explicitly representing the one-hot vectors (which wastes a lot of memory), it assumes the input itself holds the indices of the "hot" entries of those one-hot vectors (like the label inputs for the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster; this is a more lightweight change that continues the unfortunate trend of casting floats to ints as labels.
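
For intuition, a minimal NumPy sketch of that equivalence (the vocabulary size, embedding dimension, and index values below are made-up illustration, not Caffe code):

import numpy as np

V, D = 5, 3                        # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W = rng.standard_normal((V, D))    # the lookup table (same weights as the InnerProduct view)

indices = np.array([2, 0, 4, 2])   # integer inputs: the "hot" index per element

embedded = W[indices]              # EmbedLayer-style forward: index into the table, shape (4, D)

one_hot = np.eye(V)[indices]       # explicit one-hot inputs, shape (4, V), mostly zeros
product = one_hot @ W              # InnerProduct-style forward, shape (4, D)

assert np.allclose(embedded, product)

The lookup touches only the selected rows of W, which is why the one-hot matrix never needs to be materialized.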

@jeffdonahue
Contributor Author

Rebased and ready for review. (Previously depended on gradient accumulation PR #1663.)

@futurely

This layer works as a lookup table and could be renamed to LookupTable.
https://github.com/torch/nn/blob/master/doc/convolution.md#nn.LookupTable
https://github.com/torch/nn/blob/master/LookupTable.lua

shelhamer added a commit that referenced this pull request Aug 25, 2015
Embed layer for lookup table of one hot encodings
@shelhamer merged commit 80579b8 into BVLC:master Aug 25, 2015
@shelhamer mentioned this pull request Aug 25, 2015
@jeffdonahue deleted the embed-layer branch August 26, 2015 00:55
@beniz

beniz commented Dec 8, 2015

I am very confused by this Embed layer. My hunch is that no one uses it outside of the RNN/LSTM branch, so I doubt I'll get any answer, but let's try just in case.

I've tried to use it in a simple MLP. Here is an example of a typical prototxt section:

layer {
  name: "embed"
  type: "Embed"
  bottom: "data"
  top: "embed_data"
  embed_param {
    input_dim: 5454   # vocabulary size: input indices must lie in [0, input_dim)
    num_output: 200   # dimension of each learned embedding vector
    weight_filler {
      type: "uniform"
      min: -0.08
      max: 0.08
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip0"
  type: "InnerProduct"
  bottom: "embed_data"
  top: "ip0"
  inner_product_param {
    num_output: 200
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  } 
}

Filling the data elements with the vocabulary indices of the words in a sentence, I naturally get an error from the data_transformer, since the datum channels now have varying sizes. So I tried padding the remaining elements with 0, as I understand is done in https://github.com/BVLC/caffe/pull/1873/files

But in this case there is no memory advantage over one-hot vectors, since the input dimension is the same. Thus I am confused :)

Needless to say, any help is highly appreciated at this point!

Understood, of course: the padding is there to fix the input sequence length.
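
To make the memory point concrete: with sentences padded to a fixed length, the Embed input carries one integer index per token, while explicit one-hot inputs to an InnerProductLayer would carry a vocabulary-sized vector per token. A rough NumPy sketch (the padded length T is a made-up example value; V and D match the prototxt above):

import numpy as np

V = 5454   # vocabulary size (input_dim above)
T = 50     # padded sentence length (illustrative)
D = 200    # embedding dimension (num_output above)

indices = np.zeros(T, dtype=np.int32)          # padded index input: T integers
one_hot = np.zeros((T, V), dtype=np.float32)   # explicit one-hot input: T * V floats

print(indices.nbytes)   # 200 bytes per sentence
print(one_hot.nbytes)   # 1,090,800 bytes per sentence

Both routes end up with a (T, D) activation after the Embed / InnerProduct step; the saving is on the input side and is not lost by padding to a fixed length.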
