A collection of papers and resources about Data-centric Graph Machine Learning (DC-GML).
We comprehensively review data-centric graph machine learning (DC-GML), outline its future prospects, and propose a systematic DC-GML framework that covers every stage of the graph data lifecycle: graph data collection, exploration, improvement, exploitation, and maintenance. More details can be found in our review & outlook paper: https://arxiv.org/abs/2309.10979
```bibtex
@article{zheng2023towards,
  title={Towards Data-centric Graph Machine Learning: Review and Outlook},
  author={Zheng, Xin and Liu, Yixin and Bao, Zhifeng and Fang, Meng and Hu, Xia and Liew, Alan Wee-Chung and Pan, Shirui},
  journal={arXiv preprint arXiv:2309.10979},
  year={2023}
}
```
- Awesome-Data-Centric-GraphML
This section corresponds to the 'Graph Data Improvement' stage of the DC-GML framework and covers four aspects of graph data: Graph Structure Enhancement, Graph Feature Enhancement, Graph Label Enhancement, and Graph Size Enhancement.
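As a concrete illustration of one such data-improvement operation, the following is a minimal sketch of the edge-dropping augmentation idea behind DropEdge (ICLR'2020, listed in this section): before each training epoch, a random subset of edges is removed so the model sees a perturbed graph structure. It assumes the graph is stored as a plain list of `(u, v)` tuples and drops each edge independently with probability `p`, which is a simplification and not necessarily the exact sampling scheme of the paper. The function `drop_edge` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def drop_edge(edges: List[Edge], p: float, seed: int = 0) -> List[Edge]:
    """Return a new edge list in which each edge is kept with probability 1 - p.

    Hypothetical helper: edges are dropped independently (Bernoulli trials),
    a simplification of sampling a fixed number of edges per epoch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)  # local RNG, so the result is reproducible per seed
    # rng.random() is uniform on [0, 1), so `>= p` holds with probability 1 - p
    return [edge for edge in edges if rng.random() >= p]


if __name__ == "__main__":
    graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    for epoch in range(3):
        # Resample a perturbed structure every epoch; evaluation would use `graph`.
        print(epoch, drop_edge(graph, p=0.3, seed=epoch))
```

Passing a different seed per epoch (here `seed=epoch`) gives a fresh perturbation each time while keeping runs reproducible; at evaluation time the full edge list would normally be used.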
- [KDD'2020-Pro-GNN] Graph structure learning for robust graph neural networks. [paper]
- [ICML'2019-LDS] Learning discrete structures for graph neural networks. [paper]
- [WWW'2021-GEN] Graph structure estimation neural networks. [paper]
- [CVPR'2019-GLCN] Semi-supervised learning with graph learning convolutional networks. [paper]
- [NIPS'2020-IDGL] Iterative deep graph learning for graph neural networks: Better and robust node embeddings. [paper]
- [AIS'2016] Graph sparsification approaches for Laplacian smoothing. [paper]
- [SIGMOD'2011] Local graph sparsification for scalable clustering. [paper]
- [SICOMP'2011] Spectral sparsification of graphs. [paper]
- [NIPS'2019] On differentially private graph sparsification and applications. [paper]
- [ICDM'2022-GraphSparsify] A generic graph sparsification framework using deep reinforcement learning. [paper]
- [ICLR'2019-PPNP/APPNP] Predict then propagate: Graph neural networks meet personalized PageRank. [paper]
- [NIPS'2019-GDC] Diffusion improves graph learning. [paper]
- [ICLR'2021] Adaptive universal generalized PageRank graph neural network. [paper]
- [NIPS'2021-ADC] Adaptive diffusion in graph neural networks. [paper]
- [NN'2020-GINN] Missing data imputation with adversarially-trained graph convolutional networks. [paper]
- [FGCS'2021-GCN_MF] Graph convolutional networks for graphs containing missing features. [paper]
- [TPAMI'2020-SAT] Learning on attribute-missing graphs. [paper]
- [WWW'2021-HGNN-AC] Heterogeneous graph neural network via attribute completion. [paper]
- [IEEETransCybern'2022-Amer] Amer: A new attribute-missing network embedding approach. [paper]
- [arXiv'2021-SAGA] Siamese attribute-missing graph auto-encoder. [paper]
- [SPM'2013] The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. [paper]
- [GlobalSIP'2014] Signal denoising on graphs via graph filtering. [paper]
- [IET-SP'2018] Graph polynomial filter for signal denoising. [paper]
- [AIS'2015] Trend filtering on graphs. [paper]
- [ICASSP'2020] Graph auto-encoder for graph signal denoising. [paper]
- [TSP'2021] Graph unrolling networks: Interpretable neural networks for graph signal denoising. [paper]
- [TSP'2022] Untrained graph neural networks for denoising. [paper]
- [WWW'2023-MAGNET] Robust graph representation learning for local corruption recovery. [paper]
- [AAAI'2018] Deeper insights into graph convolutional networks for semi-supervised learning. [paper]
- [AAAI'2020] Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. [paper]
- [CIKM'2021-IFC-GCN] Rectifying pseudo labels: Iterative feature clustering for graph representation learning. [paper]
- [arXiv'2019-DSGCN] Dynamic self-training framework for graph convolutional networks. [paper]
- [WSDM'2022-RS-GNN] Towards robust graph neural networks for noisy graphs with sparse labels. [paper]
- [DMKD'2023-InfoGNN] Informative pseudo-labeling for graph neural networks with few labels. [paper]
- [WSDM'2023-CLNode] CLNode: Curriculum learning for node classification. [paper]
- [arXiv'2019-D-GNN] Learning graph neural networks with noisy labels. [paper]
- [CIKM'2021-IFC-GCN] Rectifying pseudo labels: Iterative feature clustering for graph representation learning. [paper]
- [KDD'2021-NRGNN] NRGNN: Learning a label noise resistant graph neural network on sparsely and noisily labeled graphs. [paper]
- [WSDM'2023-RTGNN] Robust training of graph neural networks via noise governance. [paper]
- [WSDM'2021-GraphSMOTE] GraphSMOTE: Imbalanced node classification on graphs with graph neural networks. [paper]
- [KDD'2021-ImGAGN] ImGAGN: Imbalanced network embedding via generative adversarial graph networks. [paper]
- [WWW'2021-PC-GNN] Pick and choose: a GNN-based imbalanced learning approach for fraud detection. [paper]
- [WWW'2021-GraphMixup] Mixup for node and graph classification. [paper]
- [ICLR'2022-GraphENS] GraphENS: Neighbor-aware ego network synthesis for class-imbalanced node classification. [paper]
- [arXiv'2023-GraphSR] GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification. [paper]
- [NIPS'2021-ReNode] Topology-imbalance learning for semi-supervised node classification. [paper]
- [arXiv'2022-TopoImb] TopoImb: Toward topology-level imbalance in learning from graphs. [paper]
- [IJCAI'2013-igBoost] Graph classification with imbalanced class distributions and noise. [paper]
- [CIKM'2022-G2GNN] Imbalanced graph classification via graph-of-graph neural networks. [paper]
- [ICML'2009-Herding] Herding dynamical weights to learn. [paper]
- [CVPR'2017-iCaRL] iCaRL: Incremental classifier and representation learning. [paper]
- [ICLR'2018-K-center] Active learning for convolutional neural networks: A core-set approach. [paper]
- [AIS'2020-Coarsening] Graph coarsening with preserved spectral properties. [paper]
- [arXiv'2021] Graph domain adaptation: A generative view. [paper]
- [ICLR'2022-GCond] Graph condensation for graph neural networks. [paper]
- [KDD'2022-DosCond] Condensing graphs via one-step gradient matching. [paper]
- [NeurIPS-Workshop'2022] Faster hyperparameter search on graphs via calibrated dataset condensation. [paper]
- [arXiv'2023-SFGC] Structure-free graph condensation: From large-scale graphs to condensed graph-free data. [paper]
- [ACM SIGKDD Explorations Newsletter'2022-Survey] Data augmentation for deep graph learning: A survey. [paper]
- [arXiv'2022-Survey] Graph data augmentation for graph machine learning: A survey. [paper]
- [ICLR'2020-DropEdge] DropEdge: Towards deep graph convolutional networks on node classification. [paper]
- [NeurIPS'2020-GRAND] Graph random neural networks for semi-supervised learning on graphs. [paper]
- [AAAI'2022-NASA] Regularizing graph neural networks via consistency-diversity graph augmentations. [paper]
- [KDD'2020-NodeAug] NodeAug: Semi-supervised node classification with data augmentation. [paper]
- [AAAI'2021-GAUG] Data augmentation for graph neural networks. [paper]
- [AAAI'2021-GraphMix] GraphMix: Improved training of GNNs for semi-supervised learning. [paper]
- [WWW'2021-GraphMixup] Mixup for node and graph classification. [paper]
- [WSDM'2021-GraphSMOTE] GraphSMOTE: Imbalanced node classification on graphs with graph neural networks. [paper]
- [CVPR'2022-FLAG] Robust optimization as data augmentation for large-scale graphs. [paper]
- [ICML'2022-G-Mixup] G-Mixup: Graph data augmentation for graph classification. [paper]
- [ICML'2022-LAGNN] Local augmentation for graph neural networks. [paper]
This section corresponds to the 'Graph Data Exploitation' stage of the DC-GML framework and covers four strategies for learning from graph data of low quality and limited availability: Graph Self-supervised Learning, Graph Semi-supervised Learning, Graph Active Learning, and Graph Transfer Learning.
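Many graph semi-supervised methods build on the label propagation idea from "Semi-supervised learning using gaussian fields and harmonic functions" (ICML'2003, listed in this section): labels on a few seed nodes are spread to their unlabeled neighbours until the predictions stabilize. The following is a minimal sketch, assuming an unweighted, undirected graph given as an adjacency dict in which every node is a key and neighbour lists are symmetric. It approximates the harmonic solution by repeated neighbour averaging with the seed labels clamped, rather than by the closed-form linear solve. The function `label_propagation` and its arguments are hypothetical.

```python
from typing import Dict, List


def label_propagation(
    adj: Dict[int, List[int]],
    seeds: Dict[int, int],
    num_classes: int,
    iters: int = 50,
) -> Dict[int, int]:
    """Predict a class for every node by iteratively averaging neighbour labels.

    Hypothetical helper: seed nodes keep their one-hot labels (clamping);
    every other node repeatedly takes the mean distribution of its neighbours.
    """
    dist: Dict[int, List[float]] = {}
    for v in adj:
        if v in seeds:
            one_hot = [0.0] * num_classes
            one_hot[seeds[v]] = 1.0
            dist[v] = one_hot
        else:
            dist[v] = [1.0 / num_classes] * num_classes  # uninformative start

    for _ in range(iters):
        updated: Dict[int, List[float]] = {}
        for v, nbrs in adj.items():
            if v in seeds or not nbrs:
                updated[v] = dist[v]  # clamp seeds; isolated nodes stay as they are
                continue
            acc = [0.0] * num_classes
            for u in nbrs:
                for c in range(num_classes):
                    acc[c] += dist[u][c]
            updated[v] = [x / len(nbrs) for x in acc]
        dist = updated  # synchronous (Jacobi-style) update

    # Predicted class = arg max of each node's label distribution.
    return {v: max(range(num_classes), key=lambda c: d[c]) for v, d in dist.items()}


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(label_propagation(path, seeds={0: 0, 4: 1}, num_classes=2))
```

On the five-node path with opposite labels at the two ends, nodes closer to the class-0 seed lean towards class 0 and vice versa, which is the smoothness assumption that graph semi-supervised learning relies on.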
- [TKDE'2022-Survey] Graph self-supervised learning: A survey. [paper]
- [arXiv'2016-GAE] Variational graph auto-encoders. [paper]
- [CIKM'2017-MGAE] MGAE: Marginalized graph autoencoder for graph clustering. [paper]
- [IJCAI'2018-ARGA] Adversarially regularized graph autoencoder for graph embedding. [paper]
- [ICLR'2019-DGI] Deep graph infomax. [paper]
- [ICML'2020-MVGRL] Contrastive multi-view representation learning on graphs. [paper]
- [NeurIPS'2020-GraphCL] Graph contrastive learning with augmentations. [paper]
- [arXiv'2020-PairwiseDistance/NodeProperty] Self-supervised learning on graphs: Deep insights and new direction. [paper]
- [NeurIPS'2020-GROVER] Self-supervised graph transformer on large-scale molecular data. [paper]
- [WWW'2020-GMI] Graph representation learning via graphical mutual information maximization. [paper]
- [ICML'2020] When does self-supervision help graph convolutional networks? [paper]
- [WWW'2021-GCA] Graph contrastive learning with adaptive augmentation. [paper]
- [ICML'2021-JOAO] Graph contrastive learning automated. [paper]
- [NeurIPS'2021-AD-GCL] Adversarial graph augmentation to improve graph contrastive learning. [paper]
- [KDD'2022-GraphMAE] GraphMAE: Self-supervised masked graph autoencoders. [paper]
- [Information Sciences'2022-S2GRL] A new self-supervised task on graphs: Geodesic distance prediction. [paper]
- [ICLR'2022-AutoSSL] Automated self-supervised learning for graphs. [paper]
- [ICML'2003] Semi-supervised learning using Gaussian fields and harmonic functions. [paper]
- [NeurIPS'2003] Learning with local and global consistency. [paper]
- [ICML'2005] Learning from labeled and unlabeled data on a directed graph. [paper]
- [AAAI'2018] Deeper insights into graph convolutional networks for semi-supervised learning. [paper]
- [KDD'2020-NodeAug] NodeAug: Semi-supervised node classification with data augmentation. [paper]
- [NeurIPS'2020-GRAND] Graph random neural networks for semi-supervised learning on graphs. [paper]
- [AAAI'2020-M3S] Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. [paper]
- [WSDM'2021-SimP-GCN] Node similarity preserving graph convolutional networks. [paper]
- [ACM-TIS'2021-GCN-LPA] Combining graph convolutional neural networks and label propagation. [paper]
- [AAAI'2021-CG3] Contrastive and generative graph convolutional networks for graph-based semi-supervised learning. [paper]
- [NeurIPS'2021-GCPN] Contrastive graph Poisson networks: Semi-supervised learning with extremely limited labels. [paper]
- [AAAI'2022-Meta-PN] Meta propagation networks for graph few-shot semi-supervised learning. [paper]
- [World Wide Web'2022-CycProp] Cyclic label propagation for graph semi-supervised learning. [paper]
- [arXiv'2017-AGE] Active learning for graph embedding. [paper]
- [IJCAI'2018-ANRMAB] Active discriminative network representation learning. [paper]
- [IJCAI'2019-ActiveHNE] ActiveHNE: active heterogeneous network embedding. [paper]
- [arXiv'2019-FeatProp] Active learning for graph neural networks via node feature propagation. [paper]
- [WWW'2020-ATNE] Active domain transfer on network embedding. [paper]
- [KDD'2020-ASGN] ASGN: An active semi-supervised graph neural network for molecular property prediction. [paper]
- [NeurIPS'2020-GPA] Graph policy network for transferable active learning on graphs. [paper]
- [ACML'2020-MetAL] MetAL: Active semi-supervised learning on graphs via meta-learning. [paper]
- [TNNLS'2020-SEAL] SEAL: Semisupervised adversarial active learning on attributed graphs. [paper]
- [VLDB Endowment'2021-GRAIN] GRAIN: Improving data efficiency of graph neural networks via diversified influence maximization. [paper]
- [NeurIPS'2021-RIM] RIM: Reliable influence-based active learning on graphs. [paper]
- [WWW'2021-Attent] Attent: Active attributed network alignment. [paper]
- [ICMD'2021-ALG] ALG: Fast and accurate active learning framework for graph convolutional networks. [paper]
- [WWW'2022-ALLIE] ALLIE: Active learning on large-scale imbalanced graphs. [paper]
- [AAAI'2022-BIGENE] Batch active learning with graph neural networks via multi-agent deep reinforcement learning. [paper]
- [ICLR'2022-IGP] Information Gain Propagation: A new way to graph active learning with soft labels. [paper]
- [KDD'2022-JuryGCN] JuryGCN: quantifying jackknife uncertainty on graph convolutional networks. [paper]
- [IJCAI'2019-DANE] DANE: domain adaptive network embedding. [paper]
- [WWW'2020-UDA-GCN] Unsupervised domain adaptive graph convolutional networks. [paper]
- [AAAI'2020-ACDNE] Adversarial deep network embedding for cross-network node classification. [paper]
- [ICDM'2020-OpenWGL] OpenWGL: Open-world graph learning. [paper]
- [ICML'2020-PGL] Progressive graph learning for open-set domain adaptation. [paper]
- [NeurIPS'2021-SRGNN] Shift-robust GNNs: Overcoming the limitations of localized graph training data. [paper]
- [arXiv'2021-SOGA] Source free unsupervised graph domain adaptation. [paper]
- [arXiv'2021] Graph domain adaptation: A generative view. [paper]
- [NeurIPS-Workshop'2022-SRNC] Shift-robust node classification via graph clustering co-training. [paper]
This section corresponds to three stages of the DC-GML framework: Graph Data Collection, Graph Data Exploration, and Graph Data Maintenance. Together with Graph Data Improvement and Graph Data Exploitation, these stages form a graph MLOps pipeline built from a data-centric view of graphs.
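On the maintenance side, some privacy requirements can be checked automatically before a graph is shared. The following is a minimal sketch of a check for k-degree anonymity, the notion used in "Towards identity anonymization on graphs" (SIGMOD'2008, listed in this section): a graph is k-degree anonymous if every degree value that occurs is shared by at least k nodes, so an attacker who knows a node's degree cannot narrow it down to fewer than k candidates. The sketch only verifies the property; it does not perform the graph modifications the paper uses to achieve it. The adjacency-dict representation and the function names `is_k_degree_anonymous` and `anonymity_level` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def is_k_degree_anonymous(adj: Graph, k: int) -> bool:
    """Return True if every degree value in `adj` is shared by at least k nodes."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Count how many nodes have each degree (undirected, symmetric lists assumed).
    nodes_per_degree = Counter(len(neighbours) for neighbours in adj.values())
    return all(count >= k for count in nodes_per_degree.values())


def anonymity_level(adj: Graph) -> int:
    """Largest k for which `adj` is k-degree anonymous (0 for an empty graph)."""
    nodes_per_degree = Counter(len(neighbours) for neighbours in adj.values())
    return min(nodes_per_degree.values(), default=0)


if __name__ == "__main__":
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(anonymity_level(star))  # the hub has a unique degree
```

In a maintenance pipeline, a check like this could gate the release of a graph snapshot, for example by refusing to publish when `anonymity_level` falls below a chosen threshold.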
- Amazon Mechanical Turk: https://www.mturk.com/
- [SIGIR-Workshop'2011] Semi-supervised consensus labeling for crowdsourcing. [paper]
- [Cloud Computing'2021] Knowledge graphs meet crowdsourcing: a brief survey. [paper]
- [Journal of Classification'1997] Estimation and prediction for stochastic blockmodels for graphs with latent block structure. [paper]
- Probabilistic graphical models: principles and techniques. [book]
- [NeurIPS'2019] GNNExplainer: Generating explanations for graph neural networks. [paper]
- [KDD'2014] Focused clustering and outlier detection in large attributed graphs. [paper]
- [Journal of Machine Learning Research'2023] Graph clustering with graph neural networks. [paper]
- [WWW'2021] Pathfinder discovery networks for neural message passing. [paper]
- [arXiv'2020] Benchmarking graph neural networks. [paper]
- [arXiv'2022] Synthetic graph generation to benchmark graph learning. [paper]
- [KDD'2022] GraphWorld: Fake graphs bring real insights for GNNs. [paper]
- NetworkX: https://networkx.org/
- igraph: https://igraph.org/
- Neo4j: https://neo4j.com/
- [ECML-PKDD'2021] GraphSVX: Shapley value explanations for graph neural networks. [paper]
- [arXiv'2022-TrustworthyGNN-Survey] Trustworthy graph neural networks: Aspects, methods and trends. [paper]
- [IEEE Network'2010] Privacy and security for online social networks: challenges and opportunities. [paper]
- [SIGMOD'2008] Towards identity anonymization on graphs. [paper]
- [Multimedia Tools and Applications'2018] Privacy preservation based on clustering perturbation algorithm for social network. [paper]
- [EDBT/ICDT Workshops'2015] Privacy-Integrated Graph Clustering Through Differential Privacy. [paper]
- [Information Sciences'2020] PGAS: Privacy-preserving graph encryption for accurate constrained shortest distance queries. [paper]
- [KDD'2022] FederatedScope-GNN: Towards a unified, comprehensive and efficient package for federated graph learning. [paper]
- [AAAI'2023] Federated learning on Non-IID graphs via structural knowledge sharing. [paper]
- [IEEE Communications Magazine'1994] Access control: principle and practice. [paper]
- [Global Summit on Computer and Information Technology'2014] Implementation of elliptic curve digital signature algorithm (ECDSA). [paper]
- [Computers and Security'2021] Threat detection and investigation with system-level provenance graphs: a survey. [paper]
- Kubeflow: https://github.com/kubeflow/kubeflow
- Amazon SageMaker: https://aws.amazon.com/sagemaker/
- Amazon Neptune: https://neptune.ai/product
- GraphStorm: https://github.com/awslabs/graphstorm/wiki
- Real-time Fraud Detection with Graph Neural Network on DGL: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl