KGBench is a toolbox for knowledge representation learning that features several automated machine learning (AutoML) methods, e.g. AutoBLM (TPAMI 2022), KGTuner (ACL 2022), and the HPO toolbox Ax. These AutoML techniques enable model and hyperparameter search to improve performance on link prediction, the representative KG learning task.
This repo is built on LibKGE, which is highly configurable, easy to use, and extensible. Compared to the previous code, we have added AutoBLM, which uses bilevel optimization to search for bilinear scoring functions, and KGTuner, which provides a two-stage hyperparameter search algorithm. In addition, Relation Prediction can be added as an auxiliary training objective, and Node Piece can be used as a special embedder.
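To illustrate the kind of bilinear scoring functions AutoBLM searches over, the sketch below scores a triple with a block-structured bilinear form. It is a simplified illustration, not the code used in this toolbox: the function name `block_bilinear_score` and the signed-integer encoding of the structure are assumptions made for this example. Embeddings are split into equal chunks, and entry (i, j) of the structure says which chunk of the relation embedding, and with which sign, couples chunk i of the head with chunk j of the tail.

```python
def split_chunks(vec, num_chunks):
    """Split a flat embedding into num_chunks equally sized pieces."""
    size = len(vec) // num_chunks
    return [vec[k * size:(k + 1) * size] for k in range(num_chunks)]


def block_bilinear_score(h, r, t, structure):
    """Score (h, r, t) with a block-structured bilinear form.

    structure[i][j] == 0 means block (i, j) is unused; a value of +k or -k
    means the k-th chunk of r (1-based) is used with that sign.
    This encoding is an assumption of this sketch.
    """
    n = len(structure)
    h_chunks = split_chunks(h, n)
    r_chunks = split_chunks(r, n)
    t_chunks = split_chunks(t, n)
    score = 0.0
    for i in range(n):
        for j in range(n):
            entry = structure[i][j]
            if entry == 0:
                continue
            sign = 1.0 if entry > 0 else -1.0
            r_chunk = r_chunks[abs(entry) - 1]
            # Trilinear product <h_i, r_k, t_j> for this block.
            score += sign * sum(
                a * b * c for a, b, c in zip(h_chunks[i], r_chunk, t_chunks[j])
            )
    return score


# Two well-known models expressed as 2-chunk structures.
DISTMULT_2 = [[1, 0], [0, 2]]
COMPLEX_2 = [[1, 2], [-2, 1]]
```

With two chunks, `DISTMULT_2` reproduces DistMult and `COMPLEX_2` reproduces ComplEx (real parts in the first chunk, imaginary parts in the second), which is why a search over such structures can recover existing bilinear models as special cases.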
KGBench works on the commonly used KG datasets WN18RR and FB15k-237 as well as the large-scale OGB datasets ogbl-biokg and ogbl-wikikg2. The best performance achieved so far with this toolbox is listed below. Better results may be obtained with more search trials.
Dataset | #Dim | #Parameters | Model Structure | Test MRR | Valid MRR | Configuration | Hardware | Mem |
---|---|---|---|---|---|---|---|---|
ogbl-biokg | 2048 | 192,047,104 | | 0.8536 ±0.0003 | 0.8548 ±0.0002 | biokg_best.yaml | Tesla A100 (80G) | 7687MB |
ogbl-wikikg2 | 256 | 640,154,624 | | 0.6404 | 0.6735 | wikikg2_best.yaml | Tesla A100 (80G) | 41307MB |
Dataset | MRR | Hits@1 | Hits@10 | Model Structure | Configuration |
---|---|---|---|---|---|
FB15k-237 | 0.3668 | 0.2764 | 0.5493 | ComplEx | FB15k-237_best.yaml |
WN18RR | 0.4885 | 0.4489 | 0.5592 | ComplEx | WN18RR_best.yaml |
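For reference, the MRR and Hits@k metrics reported in the tables are computed from the rank of each true entity among its candidates. The snippet below is a minimal sketch with hypothetical helper names; it is not the evaluation code of this toolbox and assumes 1-based ranks.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose true entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Illustrative ranks only, not results from any dataset.
example_ranks = [1, 2, 4, 10]
example_mrr = mean_reciprocal_rank(example_ranks)
example_hits1 = hits_at_k(example_ranks, 1)
example_hits10 = hits_at_k(example_ranks, 10)
```

Both metrics lie in [0, 1], and higher is better. MRR rewards placing the true entity near the top, while Hits@k only checks whether it falls inside the top k.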
Example configurations are provided in the example folder. The following are instructions for AutoBLM and KGTuner as well as for the auxiliary techniques Relation Prediction and Node Piece. See LibKGE's README for more details on how to use this toolbox, and the README in example for how to use AutoBLM, KGTuner, Relation Prediction and Node Piece.
Here we provide a quick start for reproducing the results on the OGB datasets.
# retrieve and install project in development mode
git clone https://github.com/AutoML-Research/KGBench
cd KGBench
pip install -e .
# reproduce our best results on biokg using kgbench start directly
kgbench start example/biokg/biokg_best.yaml
# search blm model structure on biokg
kgbench start example/biokg/biokg_blm_search.yaml
# search hyperparameters
kgbench start example/biokg/biokg_ax_search.yaml
# evaluate on test data after training by passing the folder where the training results were saved, for example:
kgbench test local/experiments/yyyymmdd-hhmmss-config_file_name
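When several runs have accumulated, it can help to pick the most recent result folder for a given configuration before calling kgbench test. The helper below is a hypothetical convenience sketch, not part of the toolbox. It assumes result folders follow the yyyymmdd-hhmmss-config_file_name pattern shown above and that the suffix matches the configuration name exactly.

```python
import re

# Timestamp prefix (yyyymmdd-hhmmss) followed by the configuration name.
RUN_NAME = re.compile(r"^(\d{8}-\d{6})-(.+)$")


def pick_latest_run(folder_names, config_name):
    """Return the newest folder name for config_name, or None if none match."""
    matches = []
    for name in folder_names:
        found = RUN_NAME.match(name)
        if found and found.group(2) == config_name:
            # Timestamps in this format sort chronologically as strings.
            matches.append((found.group(1), name))
    return max(matches)[1] if matches else None
```

For example, passing the output of `os.listdir("local/experiments")` together with a configuration name such as `"biokg_best"` would return the folder to hand to kgbench test, assuming the naming convention holds.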
The first time you start training on biokg or wikikg2, preprocessing takes a few minutes. More examples are available in the biokg and wikikg2 folders, where we provide configurations for running the search or reproducing the best results. You can use these examples to get familiar with our pipeline quickly.
Since the OGB link prediction datasets use their own evaluation protocol, we only provide two models, AutoBLM and ComplEx, for evaluation on them. To adapt other models, override the two functions score_emb_sp_given_negs and score_emb_po_given_negs.
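As a rough illustration of what such a scoring function has to do, the sketch below scores one (subject, predicate) pair against a given list of negative object embeddings with a DistMult-style product, then ranks the true object among them. The function names, argument layout, and use of plain Python lists are assumptions for this example. The actual signatures of score_emb_sp_given_negs and score_emb_po_given_negs should be taken from the toolbox source.

```python
def distmult_score_sp_given_objects(s_emb, p_emb, object_embs):
    """Score one (s, p) pair against each candidate object embedding.

    Hypothetical stand-in for a model-specific scoring function; a model
    with a different scoring rule would replace the body below.
    """
    sp = [s * p for s, p in zip(s_emb, p_emb)]
    return [sum(a * b for a, b in zip(sp, o_emb)) for o_emb in object_embs]


def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the true object among its negatives.

    Ties are resolved in favour of the positive here for simplicity.
    """
    return 1 + sum(1 for score in neg_scores if score > pos_score)


# Toy embeddings, for illustration only.
s_emb, p_emb = [1.0, 2.0], [1.0, 1.0]
true_o = [0.5, 1.0]
negatives = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

pos_score = distmult_score_sp_given_objects(s_emb, p_emb, [true_o])[0]
neg_scores = distmult_score_sp_given_objects(s_emb, p_emb, negatives)
rank = rank_of_positive(pos_score, neg_scores)
```

The po variant is symmetric: it would combine the predicate and object embeddings first and score them against negative subject embeddings.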
This toolbox was developed by Lin Li (lli18@mails.tsinghua.edu.cn) as an undergraduate graduation project. Due to limited time and my own limited experience, there may be some mistakes in the toolbox. Please let us know if you find bugs or have advice on the code. Your suggestions are welcome.
Thanks to Professor Quanming Yao (qyaoaa@mail.tsinghua.edu.cn) and Doctor Yongqi Zhang (zhangyongqi@4paradigm.com) for their advice and support during the development of this toolbox. Thanks also to LibKGE for their open-source code, which made this work much easier to carry out.