The source code for HKGN: Heterogeneous Graph Neural Network with Hypernetworks for Knowledge Graph Embedding (ISWC 2022).
├── model
│   ├── __init__.py
│   ├── encoder_decoder.py    The overall HKGN model.
│   ├── gcn_encoder.py        The HGNN encoder.
│   └── hyper_conv_layer.py   The multi-relational graph convolution layer.
└── run.py                    Root script for running the project.
Unzip the compressed datasets.
mkdir data
unzip data_compressed/FB15k-237.zip -d data/
unzip data_compressed/WN18RR.zip -d data/
This yields all the data needed to reproduce the results reported in the paper.
data
├── FB15k-237
│   Original data:
│   ├── train.txt
│   ├── valid.txt
│   ├── test.txt
│   Subsets divided by relation categories:
│   ├── 1-1.txt
│   ├── 1-n.txt
│   ├── n-1.txt
│   ├── n-n.txt
│   Subsets divided by entity degrees:
│   ├── ent100.txt    [0, 100)
│   ├── ent200.txt    [100, 200)
│   ├── ent300.txt    [200, 300)
│   ├── ent400.txt    [300, 400)
│   ├── ent500.txt    [400, 500)
│   ├── ent1000.txt   [500, 1000)
│   └── entmax.txt    [1000, max)
└── WN18RR
    Original data:
    ├── train.txt
    ├── valid.txt
    ├── test.txt
    Subsets divided by entity degrees:
    ├── ent10.txt     [0, 10)
    ├── ent25.txt     [10, 25)
    ├── ent50.txt     [25, 50)
    ├── ent100.txt    [50, 100)
    └── ent500.txt    [100, max)
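Each of these .txt files stores one knowledge-graph triple per line as tab-separated head, relation, and tail identifiers (the usual format of these benchmarks). An illustrative FB15k-237 line:

/m/027rn	/location/country/form_of_government	/m/06cx9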
- Python 3.x
- PyTorch >= 1.5.0
- torch-scatter (make sure the version is compatible with your PyTorch version; please refer to the torch-scatter documentation)
- tqdm
- ordered-set
- numpy

Dependencies can be installed using requirements.txt.
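For example, with pip:

pip install -r requirements.txt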
To reproduce the best performance we have found:
# default dataset: FB15k-237
# FB15k-237 layer 2
python run.py -name test_fb_layer2 -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_parallel -gcn_layer 2 -layer2_drop 0.2 -layer1_drop 0.3
# FB15k-237 layer 1
python run.py -name test_fb_layer1 -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_parallel
# WN18RR layer 1
python run.py -name test_wn18rr_layer1 -gcn_drop 0.4 -model hyper_gcn -batch 256 -gpu 0 -data WN18RR
The detailed hyperparameters:
Hyperparameters | FB15k-237 | WN18RR |
---|---|---|
Relational kernel size | 3x3 | 3x3 |
Number of relational filters | 32 (layer 1) / 16 (layer 2) | 32 |
HKGN layers | 2 | 1 |
Initial entity embedding size | 100 | 100 |
Final entity embedding size | 200 | 200 |
Message dropout | 0.4 | 0.4 |
Batch size | 1024 | 256 |
Learning rate | 0.001 | 0.001 |
Label smoothing | 0.1 | 0.1 |
Customize the training strategy (parallel or iterative) for multi-relational knowledge graphs by setting the argument -exp:

- hyper_mr_parallel: performs all single-relational graph convolutions simultaneously (default). Higher GPU memory footprint, faster training.
- hyper_mr_iter: performs the single-relational graph convolutions one at a time. Lower GPU memory requirements, slower training.
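For example, a lower-memory FB15k-237 run using the iterative strategy (the run name is arbitrary; the remaining flags follow the layer-1 command above):

python run.py -name test_fb_iter -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_iter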
Print the maximum GPU memory allocated by setting the argument -log_gpu_mem. If you run into an out-of-memory error, try clearing cached memory by passing the argument -empty_gpu_cache.
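For example, appending both flags to the layer-1 FB15k-237 command above:

python run.py -name test_fb_layer1 -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_parallel -log_gpu_mem -empty_gpu_cache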
The learned model is saved automatically in the directory /checkpoints; pass the argument -restore to resume from a saved model, e.g.:

-restore -name your_saved_model_name
Choose a specific data file as the test set by setting -test_data (the options are the subset files listed above, without the .txt extension):

-test_data test (default) / 1-n / n-1 / ... / ent100 / ...
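For example, evaluating a restored FB15k-237 model on the n-1 subset (this assumes a saved run named test_fb_layer1 and that the remaining flags match the original run):

python run.py -restore -name test_fb_layer1 -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_parallel -test_data n-1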
Statistics of different relation categories on FB15k-237:
 | 1-1 | 1-N | N-1 | N-N |
---|---|---|---|---|
#Relations | 17 | 26 | 86 | 108 |
#Training | 4278 | 12536 | 50635 | 204666 |
#Validation | 167 | 1043 | 3936 | 12389 |
#Test | 192 | 1293 | 4508 | 14473 |
Statistics of different degree scopes on FB15k-237:
Entity degree scopes | #Entities | #Test |
---|---|---|
[0, 100) | 13839 | 16385 |
[100, 200) | 496 | 2055 |
[200, 300) | 76 | 525 |
[300, 400) | 44 | 493 |
[400, 500) | 35 | 389 |
[500, 1000) | 35 | 373 |
[1000, max) | 16 | 246 |
Statistics of different degree scopes on WN18RR:
Entity degree scopes | #Entities | #Test |
---|---|---|
[0, 10) | 38102 | 2595 |
[10, 25) | 2497 | 417 |
[25, 50) | 243 | 47 |
[50, 100) | 65 | 29 |
[100, max) | 36 | 46 |
Our project is based on CompGCN.