iDC-NEU/NeutronBench

NeutronBench is a GNN system evaluation framework built on NeutronStar.

🔧 Install

Dependencies

  • cmake (>=3.14.2) for building.
  • mpich (>=3.3.3) for inter-process communication.
  • libnuma for NUMA-aware memory allocation.
  • cub for GPU-based graph propagation.
  • libtorch (>1.7) with GPU support for neural network computation.

Building

First clone the repository and initialize the submodule:

git clone https://github.com/iDC-NEU/NeutronBench.git
cd NeutronBench
git submodule update --init --recursive

# or just use one command
git clone --recurse-submodules https://github.com/iDC-NEU/NeutronBench.git

To build:

mkdir build && cd build
cmake ..
make -j 10

To run:

# Example run (prepare a dataset first; see the Datasets section below).
./run_nts.sh 1 ./cfgs/gcn_sample_demo.cfg

📁 Datasets

All datasets we used:

Dataset        Nodes     Edges     #Features  #Labels  #Hidden
Reddit         232.96K   114.85M   602        41       128
OGB-Arxiv      169.34K   2.48M     128        40       128
OGB-Products   2.45M     126.17M   100        47       128
OGB-Papers     111.06M   1.6B      128        172      128
Amazon         1.57M     264.34M   200        107      128
LiveJournal    4.85M     90.55M    600        60       128
Lj-large       7.49M     232.1M    600        60       128
Lj-links       5.2M      205.25M   600        60       128
Enwiki-links   13.59M    1.37B     600        60       128

We provide a Python script to generate the data files:

# create a Python environment
conda create -n neutronbench python=3.9 -y
conda activate neutronbench

# install Python dependencies
pip install -r ./data/requirements.txt

# process the dataset
python ./data/generate_nts_dataset.py --dataset ogbn-arxiv

For graph datasets that lack ground-truth attributes, we randomly generate features and labels, and split the data into training (65%), validation (25%), and testing (10%) sets.
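The random generation and 65/25/10 split described above can be sketched roughly as follows. This is an illustration only, not the logic of generate_nts_dataset.py: the function names, the uniform feature distribution, and the seed handling are all assumptions, and it uses only the Python standard library so it runs without extra packages.

```python
import random


def random_split(num_nodes, train=0.65, val=0.25, seed=0):
    """Shuffle node ids and split them into train/val/test lists.

    Hypothetical helper: the test set receives whatever remains after
    the train and validation fractions (10% with the defaults).
    """
    rng = random.Random(seed)
    ids = list(range(num_nodes))
    rng.shuffle(ids)
    n_train = round(num_nodes * train)
    n_val = round(num_nodes * val)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def random_features_and_labels(num_nodes, num_features, num_labels, seed=0):
    """Generate placeholder features and labels for an attribute-less graph.

    Assumption: features are uniform in [0, 1) and labels are uniform
    over the label ids; the real script may use other distributions.
    """
    rng = random.Random(seed)
    features = [[rng.random() for _ in range(num_features)]
                for _ in range(num_nodes)]
    labels = [rng.randrange(num_labels) for _ in range(num_nodes)]
    return features, labels


if __name__ == "__main__":
    train_ids, val_ids, test_ids = random_split(1000)
    print(len(train_ids), len(val_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing systems on the same synthetic attributes.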

We provide a Google Drive link for downloading the Amazon, LiveJournal, Lj-large, Lj-links, and Enwiki-links datasets.

🚀 Experiments

Data partitioning experiments

# partitioning
python ./exp/exp-partition/exp-partition.py

Batch preparation experiments

# batch size
python ./exp/exp-batch-size/exp-batch-size.py

# sample rate
python ./exp/exp-sample-rate/sample-rate.py

Data Transferring experiments

# data partitioning
python ./exp/exp-partition/exp-partition.py

# batch size
python ./exp/exp-batch-size/exp-batch-size.py

# different optimization
python ./exp/exp-diff-optim/exp-diff-optim.py

# hybrid transfer
python ./exp/exp-hybrid-trans/exp-hybrid-trans.py

# pipeline
python ./exp/exp-diff-optim/exp-diff-pipe.py

# gpu cache 
python ./exp/exp-gpu-cache/exp-gpu-cache.py
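To run every experiment listed above in one go, a small driver along these lines may help. It is a sketch, not part of the repository: the names EXPERIMENT_SCRIPTS, build_commands, and run_all are hypothetical, the script paths are copied from the commands above with duplicates removed, and it assumes you launch it from the repository root. By default it only prints the commands.

```python
import subprocess
import sys

# Script paths taken from the experiment commands listed above
# (each script appears once, even if two sections mention it).
EXPERIMENT_SCRIPTS = [
    "./exp/exp-partition/exp-partition.py",
    "./exp/exp-batch-size/exp-batch-size.py",
    "./exp/exp-sample-rate/sample-rate.py",
    "./exp/exp-diff-optim/exp-diff-optim.py",
    "./exp/exp-hybrid-trans/exp-hybrid-trans.py",
    "./exp/exp-diff-optim/exp-diff-pipe.py",
    "./exp/exp-gpu-cache/exp-gpu-cache.py",
]


def build_commands(scripts=EXPERIMENT_SCRIPTS, python=sys.executable):
    """Return one argv list per experiment script."""
    return [[python, script] for script in scripts]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first experiment that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call run_all(dry_run=False) to actually launch the experiments once the datasets are prepared.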

📜 Reference

If you find NeutronBench useful or relevant to your research, please cite our paper:

@article{yuan2024comprehensive,
  author       = {Hao Yuan and Yajiong Liu and Yanfeng Zhang and Xin Ai and Qiange Wang and Chaoyi Chen and Yu Gu and Ge Yu},
  title        = {Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective},
  journal      = {Proc. VLDB Endow.},
  volume       = {17},
  number       = {6},
  pages        = {1241--1254},
  year         = {2024},
  url          = {https://www.vldb.org/pvldb/vol17/p1241-yuan.pdf},
}

📬 Contact

For any questions or feedback, feel free to contact Hao Yuan or create an issue in this repository.
