This repository contains the source code and datasets for our paper "10X Faster Subgraph Matching: Dual Matching Networks with Interleaved Diffusion Attention", which was accepted at IJCNN 2023.
We use conda to manage the environment. Please install the dependencies by running:
conda env create -f environment.yml
We experiment on both synthetic and real-world datasets. Before running the experiments, please follow the steps below to prepare the datasets.
First, create a configuration file for the dataset and put it in the configs folder. For example, a configuration file for the synth_1 dataset looks as follows:
{
"number_source": 5000,
"number_subgraph_per_source": 2000,
"avg_source_size": 30,
"std_source_size": 5,
"avg_degree": 3.5,
"std_degree": 0.5,
"number_label_node": 20,
"number_label_edge": 10
}
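The configuration above can also be written from Python, which may be convenient when generating several variants. The following is a minimal sketch, not part of the repository: the helper name write_config is hypothetical, and the target path data_synthesis/configs/synth_1.json is an assumption inferred from the generation command, which is run from inside data_synthesis.

```python
import json
from pathlib import Path

# Values copied from the synth_1 example configuration.
SYNTH_1_CONFIG = {
    "number_source": 5000,
    "number_subgraph_per_source": 2000,
    "avg_source_size": 30,
    "std_source_size": 5,
    "avg_degree": 3.5,
    "std_degree": 0.5,
    "number_label_node": 20,
    "number_label_edge": 10,
}


def write_config(config, path):
    """Write a dataset configuration as JSON, creating parent folders if needed.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(config, f, indent=4)
    return path


if __name__ == "__main__":
    # Assumed location: the generation step reads configs/synth_1.json
    # relative to the data_synthesis folder.
    write_config(SYNTH_1_CONFIG, "data_synthesis/configs/synth_1.json")
```

To create a variant, copy the dictionary, change the fields of interest, and write it under a new file name in the same folder.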
Then, we can generate the dataset by running:
cd data_synthesis
python generate_dataset.py --config configs/synth_1.json
cd ..
python process_data.py synth_1 synthesis
The real-world datasets are available in the data_real folder. To generate the data for one of them, please run:
cd data_real
python make_datasets.py --ds [DATASET_NAME]
python generate_data_v1.py --config configs/[DATASET_NAME].json
cd ..
python process_data.py [DATASET_NAME] real
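The preparation commands for both dataset kinds follow the same pattern, so they can be driven from a small Python wrapper. The sketch below is an illustration only: preparation_steps and prepare are hypothetical helpers, and the wrapper assumes it is started from the repository root. The commands themselves are the ones listed above.

```python
import subprocess
import sys


def preparation_steps(dataset, kind):
    """Return (working_directory, argv) pairs for the preparation commands.

    kind is "synthesis" for synthetic datasets or "real" for real-world ones.
    Hypothetical helper; it only mirrors the commands listed in this README.
    """
    py = sys.executable
    config = f"configs/{dataset}.json"
    if kind == "synthesis":
        steps = [
            ("data_synthesis", [py, "generate_dataset.py", "--config", config]),
        ]
    elif kind == "real":
        steps = [
            ("data_real", [py, "make_datasets.py", "--ds", dataset]),
            ("data_real", [py, "generate_data_v1.py", "--config", config]),
        ]
    else:
        raise ValueError(f"unknown dataset kind: {kind!r}")
    # The final processing step runs from the repository root in both cases.
    steps.append((".", [py, "process_data.py", dataset, kind]))
    return steps


def prepare(dataset, kind):
    """Run every preparation step in order, stopping at the first failure."""
    for cwd, argv in preparation_steps(dataset, kind):
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, prepare("synth_1", "synthesis") would run the two synthetic-data commands in sequence, assuming configs/synth_1.json already exists in data_synthesis.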
To run the experiments, use train.py with the following options:
python train.py [-h] [--lr LR]
[--epoch EPOCH]
[--ngpu NGPU]
[--dataset DATASET]
[--batch_size BATCH_SIZE]
[--num_workers NUM_WORKERS]
[--embedding_dim EMBEDDING_DIM]
[--tatic {static,cont,jump}]
[--nhop NHOP]
[--n_graph_layer N_GRAPH_LAYER]
[--d_graph_layer D_GRAPH_LAYER]
[--n_FC_layer N_FC_LAYER]
[--d_FC_layer D_FC_LAYER]
[--data_path DATA_PATH]
[--save_dir SAVE_DIR]
[--log_dir LOG_DIR]
[--dropout_rate DROPOUT_RATE]
[--al_scale AL_SCALE]
[--ckpt CKPT]
[--train_keys TRAIN_KEYS]
[--test_keys TEST_KEYS]
optional arguments:
-h, --help show this help message and exit
--lr LR learning rate
--epoch EPOCH epoch
--ngpu NGPU number of gpu
--dataset DATASET dataset
--batch_size BATCH_SIZE
batch_size
--num_workers NUM_WORKERS
number of workers
--embedding_dim EMBEDDING_DIM
node embedding dim aka number of distinct node label
--tatic {static,cont,jump}
tactic of defining number of hops
--nhop NHOP number of hops
--n_graph_layer N_GRAPH_LAYER
number of GNN layer
--d_graph_layer D_GRAPH_LAYER
dimension of GNN layer
--n_FC_layer N_FC_LAYER
number of FC layer
--d_FC_layer D_FC_LAYER
dimension of FC layer
--data_path DATA_PATH
path to the data
--save_dir SAVE_DIR save directory of model parameter
--log_dir LOG_DIR logging directory
--dropout_rate DROPOUT_RATE
dropout_rate
--al_scale AL_SCALE attn_loss scale
--ckpt CKPT Load ckpt file
--train_keys TRAIN_KEYS
train keys
--test_keys TEST_KEYS
test keys
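As an illustration of how these options combine, the sketch below builds a train.py command line from keyword arguments. The helper build_train_command is hypothetical, and the option values shown are placeholders chosen for the example, not recommended or default settings. Note that the flag is spelled --tatic, as in the help text above.

```python
import shlex


def build_train_command(**options):
    """Turn keyword arguments into a train.py argument list.

    Hypothetical helper: each keyword is passed through as --<name> <value>,
    so the names must match the flags listed in the help text.
    """
    argv = ["python", "train.py"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


# Placeholder values for illustration only.
cmd = build_train_command(dataset="synth_1", tatic="static", nhop=1, batch_size=32)
print(shlex.join(cmd))
# prints: python train.py --dataset synth_1 --tatic static --nhop 1 --batch_size 32
```

The resulting list can be passed directly to subprocess.run, or the printed line can be pasted into a shell.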
Additionally, the scripts folder contains ready-made scripts for running the experiments on the real-world datasets. To run one of them, please execute the following command:
bash scripts/[DATASET_NAME].sh
If you find this repository useful in your research, please cite our paper:
@inproceedings{Nguyen2023xDualSM,
author={Nguyen, Thanh Toan and Nguyen, Quang Duc and Ren, Zhao and Jo, Jun and Nguyen, Quoc Viet Hung and Nguyen, Thanh Tam},
booktitle={2023 International Joint Conference on Neural Networks},
title={10X Faster Subgraph Matching: Dual Matching Networks with Interleaved Diffusion Attention},
year={2023},
volume={},
number={},
pages={},
doi={}
}