
SME

The architecture of this package has been designed by Xavier Glorot (https://github.com/glorotxa), with some contributions from Antoine Bordes (https://www.hds.utc.fr/~bordesan).

Update (Nov 13): the code for Translating Embeddings (see https://everest.hds.utc.fr/doku.php?id=en:transe) has been included along with a new version for Freebase (FB15k).

  1. Overview

This package provides Theano scripts to train and evaluate the following models on several datasets:

  • Structured Embeddings (SE) defined in (Bordes et al., AAAI 2011);
  • Semantic Matching Energy (SME_lin & SME_bil) defined in (Bordes et al., MLJ 2013);
  • NEW Translating Embeddings (TransE) defined in (Bordes et al., NIPS 2013).
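To give a feel for what these models score, here is a minimal NumPy sketch of the TransE energy as described in (Bordes et al., NIPS 2013): a triple (head, relation, tail) is plausible when the head embedding translated by the relation vector lands close to the tail embedding. This is an illustration only, not the package's Theano implementation; the function name `transe_energy` and the toy vectors are assumptions made for the example.

```python
import numpy as np


def transe_energy(head, relation, tail, norm=1):
    """Dissimilarity between (head + relation) and tail.

    Lower values mean the triple is considered more plausible.
    `norm` selects the L1 (1) or L2 (2) distance.
    """
    return np.linalg.norm(head + relation - tail, ord=norm, axis=-1)


# Toy 3-dimensional embeddings (made-up values, for illustration only).
h = np.array([1.0, 0.0, 2.0])
r = np.array([0.5, 1.0, -1.0])
t_good = np.array([1.5, 1.0, 1.0])  # exactly h + r
t_bad = np.array([0.0, 0.0, 0.0])

energy_good = transe_energy(h, r, t_good)
energy_bad = transe_energy(h, r, t_bad)
print(energy_good, energy_bad)
```

The matching tail gets a lower energy than the unrelated one, which is the property training tries to enforce over the whole dataset.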

Please refer to the following pages for more details and references:

Content of the package:

  • model.py : contains the classes and functions to create the different models and Theano functions (training, evaluation...).
  • {dataset}_exp.py : contains an experiment function to train all the different models on a given dataset.
  • The data/ folder contains the data files for the learning scripts.
  • In the {dataset}/ folders:
    • {dataset}_parse.py : parses and creates data files for the training script of a given dataset.
    • {dataset}_evaluation.py : contains evaluation functions for a given dataset.
    • {dataset}_{model_name}.py : runs the best hyperparameters experiment for a given dataset and a given model.
    • {dataset}_{model_name}.out : output we obtained on our machines for a given dataset and a given model using the script above.
    • {dataset}_test.py : performs quick runs of small models of all types to test the scripts.

The datasets currently available are:

  2. 3rd Party Libraries

You need to install Theano to use these scripts. They also require Python >= 2.4, NumPy >= 1.5.0 and SciPy >= 0.8. The experiment scripts are compatible with Jobman, but this library is not mandatory.

  3. Installation

Put the script folder in your PYTHONPATH.
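If you prefer not to change your shell configuration, the same effect can be obtained for a single Python process by extending `sys.path` before importing the scripts. The sketch below assumes the script folder is named `SME` and sits in the current directory; both the folder location and the helper name `add_to_path` are placeholders to adapt.

```python
import os
import sys


def add_to_path(folder):
    """Prepend `folder` to sys.path (once) and return its absolute path."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# "SME" is a placeholder: point this at wherever the script folder lives.
sme_dir = add_to_path("SME")
print(sme_dir)
```

Unlike setting the PYTHONPATH environment variable, this only affects the running interpreter.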

  4. Data Files Creation

Put the absolute path of the downloaded dataset (from https://everest.hds.utc.fr/doku.php?id=en:smemlj12 or https://everest.hds.utc.fr/doku.php?id=en:transe) at the beginning of the {dataset}_parse.py script and run it (the SME folder has to be your current directory). Note: running Tensor_parse.py generates data for Kinships, UMLS and Nations.

  5. Training and Evaluating a Model

Simply run the corresponding {dataset}_{model_name}.py file (the SME/{dataset}/ folder has to be your current directory) to launch training. When it is over, running {dataset}_evaluation.py with the path to the best_valid_model.pkl of the learned model runs the evaluation on the test set.
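If you want to inspect a saved model outside the evaluation script, a generic pickle reader is a reasonable starting point. The sketch below is an assumption-laden illustration: this README does not describe the layout of best_valid_model.pkl, so the helper (`load_all_pickled`, a hypothetical name) simply reads every object pickled into a file, one after another. The demo writes and reads a stand-in file with made-up contents; loading a real model file presumably also requires Theano and model.py to be importable. Only unpickle files you trust.

```python
import os
import pickle
import tempfile


def load_all_pickled(path):
    """Return every object stored in `path` by successive pickle.dump calls."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:
                # End of file reached: no more pickled objects.
                break
    return objects


# Stand-in file with made-up contents, NOT the real model format.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "best_valid_model.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({"name": "embeddings"}, f)
        pickle.dump([1, 2, 3], f)
    loaded = load_all_pickled(demo_path)

print(len(loaded))
```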

  6. Citing

If you use this code, you could provide a link to the GitHub page: https://github.com/glorotxa/SME. Also, depending on the model used, you should cite the paper on Structured Embeddings (Bordes et al., AAAI 2011), on Semantic Matching Energy (Bordes et al., MLJ 2013) or on Translating Embeddings (Bordes et al., NIPS 2013).

  7. References

  • (Bordes et al., NIPS 2013) Translating Embeddings for Modeling Multi-relational Data (2013). Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston and Oksana Yakhnenko. In Proceedings of Neural Information Processing Systems (NIPS 26), Lake Tahoe, NV, USA. Dec. 2013.
  • (Bordes et al., MLJ 2013) A Semantic Matching Energy Function for Learning with Multi-relational Data (2013). Antoine Bordes, Xavier Glorot, Jason Weston and Yoshua Bengio. In Machine Learning. Springer, DOI: 10.1007/s10994-013-5363-6, May 2013.
  • (Bordes et al., AAAI 2011) Learning Structured Embeddings of Knowledge Bases (2011). Antoine Bordes, Jason Weston, Ronan Collobert and Yoshua Bengio. In Proceedings of the 25th Conference on Artificial Intelligence (AAAI), AAAI Press.
