This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemented in pyTorch. While the algorithm is fully functional, it is not very efficient.
As provided, the implementation assumes that its input data follows a Zipfian distribution, making it particularly suitable for training language models or word embeddings. In case the built-in (Zipfian) sampler is used to obtain the distractor items, indices representing the data classes have to be sorted in the order of descending frequency, i.e. index 0 should correspond to the most frequent word in the input data.
The provided code closely follows the TensorFlow NCE-loss implementation. As such, this project should be seen as an attempt to adopt the TF code for use within pyTorch.
This re-implementation was completed with personal use in mind and is, as such, not actively maintained. You are, however, very welcome to extend or adjust it according to your own needs, should you find it useful. Happy coding :) .