Exploring latent graph structures has not garnered much attention in graph generative research. Yet, for discrete data such as graphs, exploiting the latent space is as crucial as working in the data space. Previous methods, however, either failed to preserve the permutation symmetry of graphs or lacked an effective approach to modeling within the latent space. To mitigate these issues, we propose a simple yet effective discrete latent graph diffusion generative model. Our model, GLAD, not only overcomes the drawbacks of existing latent approaches but also alleviates issues inherent to diffusion methods applied in the graph space. We validate our generative model on molecular benchmark datasets, where it demonstrates competitive performance against state-of-the-art baselines.
GLAD is built on Python 3.10.1 and PyTorch 1.12.1. To install the additional packages, run:
pip install -r requirements.txt
Then install RDKit for molecular graphs:
conda install -c conda-forge rdkit=2020.09.1.0
We follow the GDSS repository [Link] to set up the benchmark datasets.
We benchmark GLAD on three generic graph datasets (Ego-small, Community_small, ENZYMES) and two molecular graph datasets (QM9, ZINC250k).
To generate the generic datasets, run the following command:
python data/data_generators.py --dataset ${dataset_name}
To preprocess the molecular graph datasets for training models, run the following command:
python data/preprocess.py --dataset ${dataset_name}
python data/preprocess_for_nspdk.py --dataset ${dataset_name}
For the evaluation of generic graph generation tasks, run the following commands to compile the ORCA program (see http://www.biolab.si/supp/orca/orca.html):
cd src/metric/orca
g++ -O2 -std=c++11 -o orca orca.cpp
We provide GLAD's hyperparameters in the config folder.
In the first stage, train the finite scalar quantization autoencoder:
sh run -d ${dataset} -t base -e exp -n ${dataset}_base
where:
- dataset: data type (in config/data)
- {dataset}_base: autoencoder base config (in config/exp/{dataset}_base)
Example:
sh run -d qm9 -t base -e exp -n qm9_base
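The core idea of the first stage, finite scalar quantization, can be sketched as follows. This is an illustrative NumPy sketch, not GLAD's actual implementation; the choice of 5 levels per latent dimension is a hypothetical example:

```python
import numpy as np

def fsq_quantize(z, levels):
    """Finite scalar quantization sketch: bound each latent dimension with
    tanh, then round it to one of `levels[d]` uniformly spaced values.
    The resulting codes form a fixed discrete grid in [-1, 1]^d."""
    z = np.asarray(z, dtype=float)
    L = np.asarray(levels, dtype=float)   # number of levels per dimension
    half = (L - 1) / 2.0
    bounded = np.tanh(z) * half           # squash into [-half, half]
    return np.round(bounded) / half       # discrete codes in [-1, 1]

codes = fsq_quantize([[0.0, 2.5, -3.0]], levels=[5, 5, 5])
# → array([[ 0. ,  1. , -1. ]])
```

Because the codebook is an implicit grid rather than a learned embedding table, the quantizer has no trainable parameters; in training, the rounding step is typically bypassed with a straight-through gradient estimator.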
In the second stage, train the discrete latent graph diffusion bridges:
sh run -d ${dataset} -t bridge -e exp -n ${dataset}_bridge
where:
- dataset: data type (in config/data)
- {dataset}_bridge: diffusion bridge config (in config/exp/{dataset}_bridge)
Example:
sh run -d qm9 -t bridge -e exp -n qm9_bridge
We provide code that calculates the mean and standard deviation of different metrics over 15 sampling runs for generic graphs and 3 sampling runs for molecular graphs.
sh run -d ${dataset} -t sample -e exp -n ${dataset}_bridge
Example:
sh run -d qm9 -t sample -e exp -n qm9_bridge
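The aggregation over sampling runs can be sketched as below. The metric name and per-run values here are hypothetical placeholders; the actual numbers come from the sampling script's output:

```python
import statistics

# Hypothetical per-run metric values (e.g. a distribution distance such as
# degree MMD collected over several sampling runs).
runs = [0.021, 0.018, 0.025, 0.019, 0.022]

mean = statistics.mean(runs)
std = statistics.stdev(runs)   # sample standard deviation (n - 1 divisor)
print(f"degree MMD: {mean:.3f} ± {std:.3f}")
# → degree MMD: 0.021 ± 0.003
```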
Download our model weights:
sh download.sh
Please cite our work if you find the paper or the released code useful in your research. Thank you!
@inproceedings{
nguyen2024glad,
title={{GLAD}: Improving Latent Graph Generative Modeling with Simple Quantization},
author={Van Khoa Nguyen and Yoann Boget and Frantzeska Lavda and Alexandros Kalousis},
booktitle={ICML 2024 Workshop on Structured Probabilistic Inference {\&} Generative Modeling},
year={2024},
url={https://openreview.net/forum?id=aY1gdSolIv}
}