Releases: dmlc/dgl
0.7.1
0.7.1 Release Notes
0.7.1 is a minor release with multiple fixes and a few new models/features/optimizations included as follows.
Note: We noticed that 0.7.1 for Linux is unavailable on our anaconda repository. We are currently working on this issue. For now, please use pip installation instead.
New models
- GCN-based spam review detection (#3145, @kayzliu)
- CARE-GNN (#3187, @kayzliu)
- GeniePath (#3199, @kayzliu)
- EEG-GCNN (#3186, @JOHNW02)
- EvolveGCN (#3190, @maqy1995)
New Features
- Allows providing username in `tools/launch.py` (#3202, @erickim555)
- Refactor and allows customized Python binary names in `tools/launch.py` (#3205, @erickim555)
- Add support for distributed preprocessing for heterogeneous graphs (#3137, @ankit-garg)
- Correctly pass all DGL client server environment variables for user-defined multi-command (#3245, @erickim555)
- You can configure the DGL configuration directory with environment variable `DGLDEFAULTDIR` (#3277, @konstantino)
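The lookup behind an environment-variable override such as `DGLDEFAULTDIR` can be sketched as follows. This is only an illustration: `resolve_dgl_dir` is a hypothetical helper, not a DGL function, and the `~/.dgl` fallback is an assumption rather than something stated in these notes.

```python
import os


def resolve_dgl_dir(env=None):
    """Return the configuration directory (hypothetical helper, not a DGL API)."""
    env = os.environ if env is None else env
    # Assumption: fall back to ~/.dgl when the variable is unset.
    default = os.path.join(os.path.expanduser("~"), ".dgl")
    return env.get("DGLDEFAULTDIR", default)


# The override wins when the variable is present.
print(resolve_dgl_dir({"DGLDEFAULTDIR": "/tmp/dgl_cfg"}))
```

In practice the variable would be exported in the shell before launching the training script.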
Optimizations
- Improve usage of pinned memory in sparse optimizer (#3207, @nv-dlasalle)
- Optimized counting of nonzero entries of DistTensor (#3203, @freeliuzc)
- Remove activation cache if not required (#3258)
- Edge excluding in EdgeDataLoader on GPU (#3226, @nv-dlasalle)
Fixes
- Update numbers for HiLANDER model (#3175)
- New training and test scripts for HiLANDER (#3180)
- Fix potential starving in socket receiver (#3176, @JingchengYu94)
- Fix typo in Tensorflow backend (#3182, @lululxvi)
- Add WeightBasis documentation (#3189)
- Default ntypes/etypes consistency between `dgl.DGLGraph` and `dgl.graph` (#3198)
- Set sharing strategy for SEAL example (#3167, @KounianhuaDu)
- Remove `DGL_LOADALL` in doc builds (#3150, @lululxvi)
- Fix distributed training hang with multiple samplers (#3169)
- Fix `random_walk` documentation inconsistency (#3188)
- Fix `curand_init()` calls in rowwise sampling leading to not-so-random results (#3196, @nv-dlasalle)
- Fix `force_reload` parameter of `FraudDataset` (#3210, @Orion-wyc)
- Fix check for `num_workers` for using `ScalarDataBatcher` (#3219, @nv-dlasalle)
- Tensoradapter linking issues (#3225, #3246, @nv-dlasalle)
- Diffpool loss did not consider the loss of first diffpooling layer (#3233, @yinpeiqi)
- Fix CUDA 11.1 SPMM crashing with duplicate edges (#3265)
- Fix `DotGatConv` attention bug when computing `edge_softmax` (#3272, @Flawless1202)
- `RelGraphConv` reshape argument is incorrect (#3256, @minchenGrab)
- Documentation typos and fixes (#3214, #3221, #3244, #3231, #3261, #3264, #3275, #3285, @amorehead, @blokhinnv, @kalinin-sanja)
v0.7.0
This is a new major release with various system optimizations, new features and enhancements, new models and bugfixes.
Important: Change on PyPI Installation
DGL pip wheels are no longer shipped on PyPI. Use the following command to install DGL with pip:
- `pip install dgl -f https://data.dgl.ai/wheels/repo.html` for CPU.
- `pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html` for CUDA.
- `pip install --pre dgl -f https://data.dgl.ai/wheels-test/repo.html` for CPU nightly builds.
- `pip install --pre dgl-cuXX -f https://data.dgl.ai/wheels-test/repo.html` for CUDA nightly builds.
This does not impact conda installation.
GPU-based Neighbor Sampling
DGL now supports uniform neighbor sampling and MFG conversion on GPU, contributed by @nv-dlasalle from NVIDIA. An experiment with GraphSAGE on the ogbn-products graph shows a >10x speedup (reduced from 113s to 11s per epoch) on a g3.16x instance. The following docs have been updated accordingly:
- A new user guide chapter Using GPU for Neighborhood Sampling about when and how to use this new feature.
- The API doc of NodeDataLoader.
New Tutorials for Multi-GPU and Distributed Training
The release brings two new tutorials about multi-GPU training for node classification and graph classification, respectively. There is also a new tutorial about distributed training across multiple machines. All of them are available at https://docs.dgl.ai/.
Improved CPU Message Passing Kernel
The update includes a new CPU implementation of the core GSpMM kernel for GNN message passing, thanks to @sanchit-misra from Intel. The new kernel performs tiling on the sparse CSR matrix and leverages Intel’s LibXSMM for kernel generation, which gives an up to 4.4x speedup over the old kernel. Please read their paper https://arxiv.org/abs/2104.06700 for details.
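For readers unfamiliar with the operation being accelerated, here is a plain-Python sketch of SpMM on a CSR matrix with a sum reducer. It does not implement the tiling or LibXSMM code generation described above; the function name `spmm_csr` and the toy graph are made up for illustration.

```python
def spmm_csr(indptr, indices, values, feats):
    """out[i] = sum of values[k] * feats[indices[k]] over the nonzeros k of row i."""
    dim = len(feats[0])
    out = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * dim
        for k in range(indptr[row], indptr[row + 1]):
            src = indices[k]
            for d in range(dim):
                acc[d] += values[k] * feats[src][d]
        out.append(acc)
    return out


# Toy graph with 3 nodes; row i lists the in-neighbors of node i.
indptr = [0, 2, 3, 3]
indices = [1, 2, 0]
values = [1.0, 1.0, 1.0]
feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
result = spmm_csr(indptr, indices, values, feats)
print(result)  # [[3.0, 5.0], [1.0, 0.0], [0.0, 0.0]]
```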
More efficient NodeEmbedding for multi-GPU training and distributed training
DGL now utilizes NCCL to synchronize the gradients of sparse node embeddings (`dgl.nn.NodeEmbedding`) during training (credits to @nv-dlasalle from NVIDIA). The NCCL feature is available in both `dgl.optim.SparseAdam` and `dgl.optim.SparseAdagrad`. Experiments show a 20% speedup (reduced from 47.2s to 39.5s per epoch) on a g4dn.12xlarge (4 T4 GPUs) instance for training RGCN on the ogbn-mag graph. The optimization is automatically turned on when NCCL backend support is detected.
The sparse optimizers for `dgl.distributed.DistEmbedding` now use a synchronized gradient update strategy. We add a new optimizer `dgl.distributed.optim.SparseAdam`. The `dgl.distributed.SparseAdagrad` has been moved to `dgl.distributed.optim.SparseAdagrad`.
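The idea behind a sparse optimizer is that only the embedding rows touched in a minibatch are updated. The sketch below applies the textbook Adagrad rule to selected rows in plain Python. It is an assumption-laden illustration (the helper name, learning rate and per-element accumulator are all chosen here), not DGL's implementation, and it does no NCCL synchronization.

```python
import math


def sparse_adagrad_step(emb, state, rows, grads, lr=0.1, eps=1e-10):
    """Update only the embedding rows listed in `rows`; other rows stay untouched."""
    for row, grad in zip(rows, grads):
        for d, g in enumerate(grad):
            state[row][d] += g * g  # accumulate squared gradients
            emb[row][d] -= lr * g / (math.sqrt(state[row][d]) + eps)


emb = [[1.0, 1.0] for _ in range(4)]
state = [[0.0, 0.0] for _ in range(4)]
# Only node 2 appeared in this (hypothetical) minibatch.
sparse_adagrad_step(emb, state, rows=[2], grads=[[0.5, -0.5]])
```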
Sparse-sparse Matrix Multiplication and Addition Support
We add two new APIs `dgl.adj_product_graph` and `dgl.adj_sum_graph` that perform sparse-sparse matrix multiplications and additions as graph operations respectively. They can run with both CPU and GPU with autograd support. An example usage of these functions is Graph Transformer Networks.
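To make "sparse-sparse multiplication as a graph operation" concrete, the following plain-Python sketch treats a weighted graph as a dictionary from `(src, dst)` to weight. The helpers `adj_product` and `adj_sum` are hypothetical stand-ins for the idea only; they do not mirror the signatures of the DGL functions and have no autograd or GPU support.

```python
from collections import defaultdict


def adj_product(a, b):
    """Edges of A @ B: edge (u, w) gets the sum over v of a[u, v] * b[v, w]."""
    by_src = defaultdict(list)
    for (v, w), wt in b.items():
        by_src[v].append((w, wt))
    out = defaultdict(float)
    for (u, v), wt_a in a.items():
        for w, wt_b in by_src[v]:
            out[(u, w)] += wt_a * wt_b
    return dict(out)


def adj_sum(a, b):
    """Edges of A + B: union of the edge sets, with weights added on overlap."""
    out = defaultdict(float)
    for edges in (a, b):
        for key, wt in edges.items():
            out[key] += wt
    return dict(out)


a = {(0, 1): 2.0, (0, 2): 1.0}
b = {(1, 3): 0.5, (2, 3): 4.0}
print(adj_product(a, b))  # {(0, 3): 5.0}
```

The product graph connects `u` to `w` whenever a two-hop path exists, which is why such operations show up in models that compose relations.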
PyTorch Lightning Compatibility
DGL is now compatible with PyTorch Lightning for single-GPU training or training with DistributedDataParallel. See this example of training GraphSAGE with PyTorch Lightning.
- Node classification: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_lightning.py
- Unsupervised learning: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_lightning_unsupervised.py
We thank @justusschock for making DGL DataLoaders compatible with PyTorch Lightning (#2886).
New Models
A batch of 19 new model examples is added to DGL in 0.7, bringing the total to 90+. Users can now use the search bar on https://www.dgl.ai/ to quickly locate the examples with tagged keywords. Below is the list of new models added.
- Interaction Networks for Learning about Objects, Relations, and Physics (https://arxiv.org/abs/1612.00222.pdf) (#2794, @Ericcsr)
- Multi-GPU RGAT for OGB-LSC Node Classification (#2835, @maqy1995)
- Network Embedding with Completely-imbalanced Labels (https://ieeexplore.ieee.org/document/8979355) (#2813, @Fizyhsp)
- Temporal Graph Networks improved (#2860, @Ericcsr)
- Diffusion Convolutional Recurrent Neural Network (https://arxiv.org/abs/1707.01926) (#2858, @Ericcsr)
- Gated Attention Networks for Learning on Large and Spatiotemporal Graphs (https://arxiv.org/abs/1803.07294) (#2858, @Ericcsr)
- DeeperGCN (https://arxiv.org/abs/2006.07739) (#2831, @xnuohz)
- Deep Graph Contrastive Representation Learning (https://arxiv.org/abs/2006.04131) (#2828, #3009, @hengruizhang98)
- Graph Neural Networks Inspired by Classical Iterative Algorithms (https://arxiv.org/abs/2103.06064) (#2770, @ffttyy)
- GraphSAINT (#2792) (@lt610)
- Label Propagation (#2852, @xnuohz)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks (https://arxiv.org/abs/2010.13993) (#2852, @xnuohz)
- GCNII (#2874, @kyawlin)
- Latent Dirichlet Allocation on GPU (#2883, @yifeim)
- A Heterogeneous Information Network based Cross Domain Insurance Recommendation System for Cold Start Users (#2864, @KounianhuaDu)
- Five heterogeneous graph models: HetGNN/GTN/HAN/NSHE/MAGNN (#2993, @Theheavens)
- New OGB-arxiv and OGB-proteins results (#3018, @espylapiza)
- Heterogeneous Graph Attention Networks with minibatch sampling (#3005, @maqy1995)
- Learning Hierarchical Graph Neural Networks for Image Clustering (https://arxiv.org/abs/2107.01319) (#3087, #3105)
New Datasets
- Two fake news datasets, Gossipcop and Politifact. (#2876, #2939, @kayzliu)
- Two fraud datasets extracted from Yelp and Amazon. See https://arxiv.org/pdf/2008.08692.pdf and https://ponderly.github.io/pub/PCGNN_WWW2021.pdf for details. (#2876, #2908, @kayzliu)
New Functionalities
- KD-Tree, Brute-force family, and NN-descent implementation of KNN (#2767, #2892, #2941) (@lygztq)
- BLAS-based KNN implementation on GPU (#2868, @milesial)
- A new API `dgl.sample_neighbors_biased` for biased neighbor sampling where each node has a tag, and each tag has its own (unnormalized) probability (#1665, #2987, @soodoshll). We also provide two helper functions `sort_csr_by_tag` and `sort_csc_by_tag` to sort the internal storage of a graph based on tags to allow this kind of neighbor sampling (#1664, @soodoshll).
- Distributed sparse Adam node embedding optimizer (#2733)
- Heterogeneous graph’s `multi_update_all` now supports user-defined cross-type reducers (#2891, @Secbone)
- Add `in_degrees` and `out_degrees` support to `dgl.DistGraph` (#2918)
- A new API `dgl.sampling.node2vec_random_walk` for Node2vec random walks (#2992, @Smilexuhc)
- `dgl.node_subgraph`, `dgl.edge_subgraph`, `dgl.in_subgraph` and `dgl.out_subgraph` all have a `relabel_nodes` argument to allow graph compaction (i.e. removing the nodes with no edges). (#2929)
- Allow direct slicing of a batched graph without constructing a new data structure. (#2349, #2851, #2965)
- Allow setting the distributed node embeddings with `NodeEmbedding.all_set_embedding()` (#3047)
- Graphs can be directly created from CSR or CSC representations on either CPU or GPU (#3045). See the API doc of `dgl.graph` for more details.
- A new `dgl.reorder` API to permute a graph according to RCMK, METIS or custom strategy (#3063)
- `dgl.nn.GraphConv` now has a left normalization which divides the outgoing messages by out-degrees, equivalent to random-walk normalization (#3114)
- Add a new `exclude='self'` to EdgeDataLoader to exclude the edges sampled in the current minibatch alone during neighbor sampling when reverse edges are not available (#3122)
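The tag-biased sampling described in the first item can be illustrated in plain Python. This is a sketch under stated assumptions: sampling is without replacement, the helper name and argument layout are invented here, and it does not rely on the tag-sorted storage that the DGL helpers prepare.

```python
import random


def sample_neighbors_by_tag(neighbors, tags, tag_probs, fanout, rng=None):
    """Pick up to `fanout` distinct neighbors, weighting each by its tag's bias."""
    rng = rng or random.Random()
    # Neighbors whose tag has zero probability can never be picked.
    pool = [n for n in neighbors if tag_probs[tags[n]] > 0]
    picked = []
    while pool and len(picked) < fanout:
        weights = [tag_probs[tags[n]] for n in pool]
        choice = rng.choices(pool, weights=weights, k=1)[0]
        picked.append(choice)
        pool.remove(choice)
    return picked


# Nodes 1 and 2 carry tag 0, node 3 carries tag 1.
tags = {1: 0, 2: 0, 3: 1}
# Tag 1 is three times as likely to be drawn as tag 0 (unnormalized).
print(sample_neighbors_by_tag([1, 2, 3], tags, [1.0, 3.0], fanout=2))
```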
Performance Optimizations
- Check if a COO is sorted to avoid sync during forward/backward and parallelize sorted COO/CSR conversion. (#2645, @nv-dlasalle)
- Faster uniform sampling with replacement (#2953)
- Eliminating ctor & dtor & `IsNullArray` overheads in random walks (#2990, @AjayBrahmakshatriya)
- GatedGCNConv shortcut with one edge type (#2994)
- Hierarchical Partitioning in distributed training with 25% speedup (#3000, @soodoshll)
- Save memory usage in `node_split` and `edge_split` during partitioning (#3132, @JingchengYu94)
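As an illustration of the first item in this list, a sortedness check on a COO edge list can be written as a single linear pass. The sketch below is plain Python with a made-up function name; it only shows what "sorted" means here and is not DGL's implementation.

```python
def coo_is_sorted(rows, cols):
    """Return (row_sorted, col_sorted).

    row_sorted: row indices are non-decreasing.
    col_sorted: additionally, columns are non-decreasing within equal rows.
    """
    col_sorted = True
    for i in range(1, len(rows)):
        if rows[i - 1] > rows[i]:
            return False, False
        if rows[i - 1] == rows[i] and cols[i - 1] > cols[i]:
            col_sorted = False
    return True, col_sorted


print(coo_is_sorted([0, 0, 1], [1, 2, 0]))  # (True, True)
```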
Other Enhancements
- Graph partitioning now returns ID mapping from old nodes/edges to new ones (#2857)
- Better error message when `idx_list` out of bound (#2848)
- Kill training jobs on remote machines in distributed training when receiving KeyboardInterrupt (#2881)
- Provide a `dgl.multiprocessing` namespace for multiprocess training with fork and OpenMP (#2905)
- GAT supports multidimensional input features (#2912)
- Users can now specify graph format for distributed training (#2948)
- CI now runs on Kubernetes (#2957)
- `to_heterogeneous(to_homogeneous(hg))` now returns the same `hg`. (#2958)
- `remove_nodes` and `remove_edges` now preserve batch information. (#3119)
Bug Fixes
- Multiprocessing sampling in distributed training hangs in Python 3.8 (#2315, #2826)
- Use correct NIC for distributed training (#2798, @Tonny-Gu)
- Fix potential TypeError in HGT example (#2830, @zhangtianle)
- Distributed training initialization fails with graphs without node/edge data (#2366, #2838)
- DGL Sparse Optimizer will crash when some DGL NodeEmbedding is not involve...
v0.6.1
0.6.1 is a minor release after 0.6.0 that includes some bug fixes, performance optimizations and minor feature updates.
OGB Large-scale Challenge Baselines
This release provides DGL-based baselines for the OGB Large Scale Challenge (https://ogb.stanford.edu/kddcup2021/), specifically the node classification (#2810) and graph classification (#2778) tasks.
For node classification in particular, we additionally provide the preprocessed author and institution features, as well as the homogenized graph for download.
System Support
- Tensoradapter now supports PyTorch 1.8.1.
Model Updates
- Boost then Convolve (#2740, credits to @nd7141)
- Distributed GPU training of RGCN (#2709)
- Variational Graph Auto-Encoders (#2587, #2727, credits to @juliasun623)
- InfoGraph (#2644, credits to @hengruizhang98)
- DimeNet++ (#2706, credits to @xnuohz)
- GNNExplainer (#2717, credits to @KounianhuaDu)
- Contrastive Multi-View Representation Learning on Graphs (#2739, credits to @hengruizhang98)
- Temporal Graph Networks (#2636, credits to @Ericcsr and thanks to @WangXuhongCN for reviewing)
- Tensorflow EdgeConv module (#2741, credits to @kyawlin)
- CompGCN (#2768, credits to @KounianhuaDu)
- JKNet (#2795, credits to @xnuohz)
Feature Updates
- dgl.nn.CFConv now supports unidirectional bipartite graphs, hence heterogeneous graphs (#2674)
- A QM9 Dataset variant with edge features (#2704 and #2801, credits to @hengruizhang98 and @milesial)
- Display error messages instead of error codes for TCP sockets (#2763)
- Add the ability of specifying the starting point for farthest point sampler (#2755, credits to @lygztq)
- Remove the specification of number of workers and servers in distributed training code and move them to launch script (#2775)
Performance Optimizations
- Optimize the order between message passing and feature transformation in GraphSAGE (#2747)
- Remove duplicate validation in dgl.graph creation (#2789)
- Replace std::unordered_set with linear search in uniform integer sampling (#2710, credits to @pawelpiotrowicz)
- Automatically setting the number of OMP threads for distributed trainers (#2812)
- Prefer parallelized conversion to CSC from COO instead of transposing CSR (#2793)
Bug Fixes
- Prevents symbol collision of CUB with other libraries and removes thrust dependency (#2758, credits to @nv-dlasalle)
- Temporarily disabling CPU FP16 support due to incomplete code (#2783)
- GraphSAGE on graphs with zero edges produces NaNs (#2786, credits to @drsealks)
- Improvements of DiffPool example (#2730, credits to @lygztq)
- RGCN Link Prediction example sometimes runs beyond given number of epochs (#2757, credits to @turoger)
- Add pseudo code for dgl.nn.HeteroGraphConv to demonstrate how it works (#2729)
- The number of negative edges should be the same as positive edges (#2726, credits to @fang2hou)
- Fix dgl.nn.HeteroGraphConv that cannot be pickled (#2761)
- Add a default value for dgl.dataloading.BlockSampler (#2771, credits to @hengruizhang98)
- Update num_labels to num_classes in datasets (#2769, credits to @andyxukq)
- Remove unused and undefined function in SEAL example (#2791, credits to @ghk829)
- Fix HGT example where relation-specific value tensors are overwritten (#2796)
- Cleanup the process pool correctly when the process exits in distributed training (#2781)
- Fix feature type of ENZYMES in TUDataset (#2800)
- Documentation fixes (#2708, #2721, #2750, #2754, #2744, #2784, #2816, #2817, #2819, credits to @Padarn, @maqy1995, @Michael1015198808, @HuangLED, @xiamr, etc.)
v0.6.0post1
This is a binary rebuild of 0.6.0 release that adds support on PyTorch 1.8 + CUDA 11.1. Please install with either of the following:
conda install dgl-cuda11.1 -c dglteam
pip install dgl-cu111
No feature changes are incorporated.
Currently there is an issue in CUB when building with CUDA 11.1 from source: DGL will crash with various CUDA errors or freeze when used with PyTorch 1.8. You will need to define `CUB_CPP_DIALECT=2003` in the C++ and NVCC flags as a workaround. Consequently, CUDA 11.1 binaries are built with the macro `CUB_CPP_DIALECT=2003`, while binaries for CUDA 11.0 and earlier are built without it.
v0.6.0
This new release includes several improvements on DGL’s documentation, distributed training, and fixes and patches to the user experience and system efficiency.
Documentation
The tutorials have been re-worked in this release to make them more consistent and updated to the latest code base. All tutorials are available for download and can be run locally in Jupyter Notebook.
- For absolute beginners, start with the brand new Blitz Introduction to DGL in 120 minutes.
- For those who are interested in mini-batch training of GNNs, read the Stochastic Training of GNNs tutorials, which go from the basic concepts to code examples.
Thanks to the community efforts, DGL’s user guide is now available in Chinese (https://docs.dgl.ai/en/latest/guide_cn/index.html). Credits to @huaiwen @mlsoar @brookhuang16211 Zhiyu Chen @hhhiddleston @AspirinCode @rewonderful @sleeplessai @kevin-meng @CrawlScript @rr-Yiran Qingbiao Li
Model Examples
We index all the DGL examples by their notable tags (e.g. problem domains, tasks, graph characteristics, etc.) and by their publication time. As DGL codebase evolves quickly and may break some examples, we chose to maintain them by branches, i.e., examples on the master branch work with latest nightly build; stable examples are snapshot to the release branch like 0.6.x.
The release also brings 13 new examples, adding up to 72 models in total:
- MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing: https://github.com/dmlc/dgl/tree/master/examples/pytorch/mixhop (Credits to @xnouhz)
- Self-Attention Graph Pooling: https://github.com/dmlc/dgl/tree/master/examples/pytorch/sagpool (Credits to @lygztq )
- GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation: https://github.com/dmlc/dgl/tree/master/examples/pytorch/GNN-FiLM (Credits to @KounianhuaDu )
- TensorFlow implementation of Simplifying Graph Convolutional Networks: https://github.com/dmlc/dgl/tree/master/examples/tensorflow/sgc (Credits to @joshcarty)
- Graph Representation Learning via Hard and Channel-Wise Attention Networks: https://github.com/dmlc/dgl/tree/master/examples/pytorch/hardgat (Credits to @Ericcsr )
- Graph Random Neural Network for Semi-Supervised Learning on Graphs: https://github.com/dmlc/dgl/tree/master/examples/pytorch/grand (Credits to @hengruizhang98 )
- Hierarchical Graph Pooling with Structure Learning: https://github.com/dmlc/dgl/tree/master/examples/pytorch/hgp_sl (Credits to @lygztq )
- Towards Deeper Graph Neural Networks: https://github.com/dmlc/dgl/tree/master/examples/pytorch/dagnn (Credits to @lt610)
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation/PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (part segmentation): https://github.com/dmlc/dgl/tree/master/examples/pytorch/pointcloud/pointnet (Credits to @wcyjames )
- Graph Cross Networks with Vertex Infomax Pooling: https://github.com/dmlc/dgl/tree/master/examples/pytorch/gxn (Credits to @lygztq)
- Neural Graph Collaborative Filtering: https://github.com/dmlc/dgl/tree/master/examples/pytorch/NGCF (Credits to @KounianhuaDu )
- Graph Neural Networks with Convolutional ARMA Filters: https://github.com/dmlc/dgl/tree/master/examples/pytorch/arma (Credits to @xnuohz)
- Link Prediction Based on Graph Neural Networks (SEAL): https://github.com/dmlc/dgl/tree/master/examples/pytorch/seal (Credits to @Smilexuhc )
New APIs & Features
- New API: `set_batch_num_nodes` and `set_batch_num_edges` for setting batch information manually. They are useful when users want to transform a batched graph into another or construct a new batched graph by themselves (#2430)
- New API: `GraphDataLoader`, a data loader wrapper for graph classification tasks. (#2496)
- New API: QM9 dataset. (#2521) (Credits to @xnuohz )
- New API: DGL now allows constructing a `DGLBlock` graph from raw data (via `dgl.create_block`) or converting a `DGLBlock` to normal `DGLGraph` (via `dgl.block_to_graph`). They are useful when users wish to transform the `DGLBlock` graph produced by data loaders such as reversing the graph for message diffusion instead of message aggregation. (#2555)
- New API: A new namespace `dgl.nn.functional` for NN related utilities that are functional, much resembling `torch.nn.functional`. `edge_softmax` is moved there. The old `dgl.ops.edge_softmax` is deprecated. (#2442)
- New Feature: Support mixed precision training. DGL now supports training with half precision and thus is compatible with PyTorch’s automatic mixed precision package. See the user guide chapter for how to use it.
- (Experimental) New APIs for sparse embedding: (#2451)
  - `dgl.nn.NodeEmbedding`: A class for storing node embeddings that is optimized for training on large-scale graphs.
  - `dgl.optim.SparseAdagrad` and `dgl.optim.SparseAdam`: Optimizers to work with `dgl.nn.NodeEmbedding`.
- (Experimental) Distributed heterogeneous support:
  - Enable heterogeneous graph interfaces in `DistGraph` such as `g.nodes['type'].data['feat']`, as well as sampling on distributed heterogeneous graph via `dgl.sample_neighbors`. See this user guide chapter for more details.
  - Support distributed graph partitioning on a cluster of machines. See this user guide chapter for more details.
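For reference, the computation that `edge_softmax` performs can be sketched in plain Python: scores are normalized with a softmax over all edges that share a destination node. This is only an illustration of the math, assuming normalization by destination; the function below is a stand-in written for this note and does not reproduce the DGL signature.

```python
import math
from collections import defaultdict


def edge_softmax(dst, scores):
    """Softmax over the scores of edges that point to the same destination."""
    # Subtract the per-destination maximum for numerical stability.
    max_by_dst = {}
    for d, s in zip(dst, scores):
        max_by_dst[d] = max(s, max_by_dst.get(d, s))
    exps = [math.exp(s - max_by_dst[d]) for d, s in zip(dst, scores)]
    total = defaultdict(float)
    for d, e in zip(dst, exps):
        total[d] += e
    return [e / total[d] for d, e in zip(dst, exps)]


# Two edges point to node 2, one edge points to node 3.
weights = edge_softmax(dst=[2, 2, 3], scores=[0.0, 0.0, 5.0])
print(weights)  # [0.5, 0.5, 1.0]
```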
Improvements
- API improvement: `GraphConv`, `SAGEConv`, `GINConv` now support weighted graph. Users can pass in edge weights via an optional `edge_weight` argument. Also add a new NN module `EdgeWeightNorm` which normalizes edge weights according to Kipf’s GCN paper. (#2557)
- API improvement: Add an optional argument device to all dataloaders (e.g., NodeDataLoader, EdgeDataLoader) to indicate the target device of the produced graph minibatches. (#2450)
- API improvement: Allow GATConv and DotGATConv to return attention values (#2397).
- API improvement: Allow multiple heads in DotGATConv. (#2549) (Credits to @Ericcsr)
- API improvement: Add an optional flag reverse_edge to CitationGraphDataset to disable adding reverse edges to the graph. (#2588) (Credits to @juliasun623 )
- A new implementation for nn.RelGraphConv when low_mem=True. A benchmark on V100 GPU shows it gives a 4.8x boost in training speed on AIFB dataset. (#2468)
- Allow DGL to use PyTorch’s native memory allocator whenever possible. This saves a large number of malloc/free by caching the allocated buffers inside PyTorch (#2328, #2454).
- Speedup DGL by removing unnecessary sorting on CSR structure (#2391) (Credits to @nv-dlasalle )
- Add an option to mini-batch training examples (e.g., GraphSAGE, ClusterGAT, GAT, RGCN) that loads all node features to GPU prior to model training. The option speeds up model training significantly but consumes more GPU memory. (#2453)
- AVX support for faster CPU kernels (#2309) (Credits to @pawelpiotrowicz ). Enabled in binary releases.
- Add a USE_AVX flag in CMake options to allow disabling AVX optimization on hardware that does not support it. (#2428, #2438) Enabled in binary releases.
- Change dgl.remove_nodes and dgl.remove_edges to not override the NID and EID feature field by default. (#2465)
- Allow dgl.batch to batch a list of empty graphs. (#2527) (Credits to @noncomputable )
- Speedup HGT example by using DGL built-in functions (2x faster) (#2394). (Credits to @Maybewuss )
- Speedup cuSPARSE SpMM by using another algorithm (4x faster) (#2550). (Credits to @nv-dlasalle )
- Speedup mini-batch generation by removing unnecessary format conversion (#2551). (Credits to @nv-dlasalle )
- Enable in_degrees and out_degrees on DGLGraph with only COO format. (#2565)
- Enable dgl.to_block on CUDA. (#2339) (Credits to @nv-dlasalle )
- Add a compilation option to compile a tailored TVM runtime into DGL. (#2367) Disabled in binary releases. (Credits to @kira-lin )
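The edge-weight normalization mentioned in the first item above can be pictured as the symmetric GCN normalization applied to weighted degrees. The sketch below assumes the form `w / sqrt(weighted_out_degree(src) * weighted_in_degree(dst))` with positive weights; the helper name is hypothetical and the exact behavior of `EdgeWeightNorm` should be taken from its API documentation.

```python
import math
from collections import defaultdict


def symmetric_edge_norm(src, dst, weights):
    """Scale each edge weight by 1 / sqrt(out_sum[src] * in_sum[dst])."""
    out_sum, in_sum = defaultdict(float), defaultdict(float)
    for u, v, w in zip(src, dst, weights):
        out_sum[u] += w
        in_sum[v] += w
    return [
        w / math.sqrt(out_sum[u] * in_sum[v])
        for u, v, w in zip(src, dst, weights)
    ]


normed = symmetric_edge_norm(src=[0, 0, 1], dst=[1, 2, 2], weights=[1.0, 3.0, 1.0])
print(normed)  # [0.5, 0.75, 0.5]
```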
Bugfixes
- Fix an OpenMP issue that causes performance regression when launching multiple training tasks on multi-core machines (#2412).
- Fix a bug where message passing is ignored for empty graph (#2387).
- Fix a bug where citation dataset never loads from cached data. Improve the error message. (#2421)
- Fix a bug in distributed training to allow reusing ports after sockets are closed (#2418)
- Fix a bug in PyTorch backend which produces spontaneous warnings. (#2434)
- Fix a bug that shared memory is not properly deleted when the process is killed by signals. (#2419)
- Fix a bug in knowledge graph dataset which causes undefined variable error. (#2475)
- Fix multiple bugs in metapath2vec (#2491, #2607) (Credits to @pein-hub, @zezhishao )
- Fix a bug in send_and_recv and pull that causes node data writing to wrong places. (#2497)
- Fix a bug in GATConv which changes the model parameters. (#2533)
- Fix a bug that erases empty node features after graph mutation. (#2529) (Credits to @noncomputable )
- Fix an undefined variable bug in LegacyTUDataset. (#2543) (Credits to @lygztq)
- Fix the version check for PyTorch. (#2544)
- Fix a bug in Coauthor datasets that causes duplicate edges (#2569)
- Fix a bug in DGLDataset which prevents it from pickling on Windows (#2572)
- Fix a bug in HeteroGraphConv where node features are not properly handled (#2578)
- Fix a bug in message passing kernels where input data can have different data types. (#2598)
- Fix a boundary bug in segment_sum. (#2610)
- Fix a bug in GINDataset and TUDataset where the node features are in float64 instead of float32. (#2592)
- Fix ...
v0.5.3
This is a patch release mainly for supporting CUDA 11.0. Now DGL supports CUDA 11.0 and PyTorch 1.7 on Linux/Windows/Mac.
Other fixes include:
- Performance fix of graph batching: #2363
- Speedup on readout: #2361
- Speedup in CPU SpMM with sum reducer: #2309 (thanks @pawelpiotrowicz )
- Performance optimization that removes redundant copies between CPU and GPU: #2266 #2267 (thanks @nv-dlasalle )
- Fix segment_reduce() ignoring trailing 0 segments (#2228) (thanks @mjwen)
- Fix crash due to unfound attribute (#2262) (thanks @Samiisd )
- Performance optimization in COO-CSR conversion (#2356 ) (thanks @IzabelaMazur )
- Parallelization in heterogeneous graph format conversion (#2148) (thanks @mozga-intel )
- Fix a bug to enable distributed training of RGCN with CPU (#2345) (thanks @mszarma )
- Numerous documentation fixes (kudos to @cafeal , @maqy1995 , @sw32-seo, @157492196 , @chwan-rice , @ZenoTan )
New examples:
- Sparse embedding for GATNE-T for large graphs (#2234 ) (thanks @sangyx )
- LINE (#2195) (thanks @ShawXh )
- SIGN for OGB (#2316 ) (thanks @lingfanyu )
The Chinese user guide has been released for chapter 1 to 4 (#2351). Thanks @zhjwy9343 for coordination and kudos to all the offline contributors!
v0.5.2
This is a patch release including the following bug fixes and new models.
Documentation fixes
- #2172 CoraFull dataset remove redundant reference
- #2167 #2177 Fix multiple docstring typos and inconsistencies in `dgl.dataloading` and minibatch training user guide. (Thanks @ustchhy for reviewing)
- #2131 Update Doc for UDFs
Bug fixes
- #2128 cannot request out_edges() for empty node sets on cuda
- #2098 Context Issue for bfs_edges_generator on GPU Graphs
- #2135 Pickling a subgraph stores the feature of the original graph
- #2137 0.5.x taking too much shared memory during multiprocess training
- #2145 dgl.batch() in 0.5.x is slower than 0.4.x
- #2157 edge_softmax function not working on subgraphs
- #2161 #2165 #2173 TUDataset (Thanks @henrykenlay)
- #2175 Messages not ordered by edge IDs in degree bucketing
- #2166 Error when call apply_edges for dec_graph
- #2169 Multiprocessing neighbor sampling sometimes have the memory corrupted
- #2106 Bad file descriptor error when saving dgl graph to HDFS
- #2188 Fix dtype mismatch in EdgeDataLoader on Windows
Bug fixes in examples
- #2143 Fix unsupervised graphsage
- #2182 Use DistDataLoader instead of Pytorch’s DataLoader in Distributed GraphSAGE (Thanks @liucw2012 )
- #2187 Fix partition for 0.5.1
New examples
- #2153 GCN on OGB-Arxiv (Thanks @espylapiza )
v0.5.1
This is a patch release including the following bug fixes and minor features.
Documentation fixes
- #2081 Reorganize user guide and split chapters into multiple pages
- #2085 #2090 Fix links
- #2091 #2114 User guide on distributed training
- #2096 #2097 Other documentation fixes
- #2086
- #2123 Temporarily remove SSE MXNet tutorial
Bug fixes
- #2100 add_edges() crashes if the input tensor is empty
- #2084 Fix distributed GraphSage running with GPU
- #2107 Building with HDFS previously fails
- #2087 Cannot load the PTC dataset via `dgl.data.GINDataset`
- #2076 Empty cuda graph raise error in create_formats_
- #2121 Disable hypersparse memory optimization due to incomplete COO graph support on GPU
- #2115 Fallback to CPU for graph traversal functions on GPU graphs
- #2108 Force `num_workers` and `num_samplers` to be the same for distributed training
- #2118 (#2127 )
Bug fixes in examples
- #2119 Bug in using Layer Normalization in RGCN
New features
- #2102 5 utility functions that handle raw data features
- #1979 #2117 CUDA 11 support - We will not release binary builds of CUDA 11 for now
Release changes
- Linux now requires GCC 5 to build DGL from source.
- DGL now supports Mac 10.9+.
0.5.0
This is a major release including new documentation, distributed GNN training support, more new features and models, as well as bug fixes and further system performance improvements. Note that this is a huge update and may break some existing code; see the Migration Guide for details.
New Documentations
- A new user guide that explains the core concepts of DGL including graphs, features, message passing, DGL datasets, full-graph training, stochastic training, and distributed training.
- Re-worked the API reference manual.
Distributed Training
DGL now supports training GNNs on large graphs distributed across multiple machines. The new components are under the dgl.distributed package. The user guide chapter and the API document page describe the usage. New end-to-end examples for distributed training:
- An example for training GraphSAGE using neighbor sampling on ogbn-products and ogbn-papers100M (100M nodes, 1B edges). Included scripts for both supervised and unsupervised training, and offline inference. The training takes 12 seconds per epoch for ogbn-papers100M on a cluster of 4 m5n.24xlarge instances, and achieves 64% accuracy.
- An example for training R-GCN using neighbor sampling on ogbn-mag. Included scripts for both inductive and transductive modeling. The training takes 841 seconds per epoch on a cluster of 4 m5n.24xlarge CPU machines, and achieves 42.32% accuracy.
New Features
Core data structure
- Merged
DGLGraph
andDGLHeteroGraph
.DGLGraph
now supports nodes and edges of different types. - All the APIs on the old
DGLGraph
are now compatible with heterogeneous graphs. They include- Mutation operations such as adding or removing nodes and edges.
- Graph transformation routines such as
dgl.reverse()
dgl.to_bidirected()
- Subgraph extraction routines.
dgl.save_graphs()
anddgl.load_graphs()
- Batching and reading out operators.
- DGL now supports creating graph stored in int32 to further conserve memory. Three new APIs:
DGLGraph.idtype
,DGLGraph.int
,DGLGraph.long
for getting or changing the integer type for storing graph. - DGL now allows performing graph structure relation operations on GPU such as
DGLGraph.in_degrees()
,DGLGraph.edge_ids()
,DGLGraph.subgraph
etc. A new APIDGLGraph.to
to copy a graph to different devices. This leads to a breaking change on requiring the graph and feature tensors to always be on the same device. See the Migration Guide for more explanations. - Many graph transformations and subgraph extraction operations in DGL now automatically copy the corresponding node and edge features from the original graph. The copying happens on-demand, meaning that the copy would not take place until you actually accesses the feature.
- Before 0.5
>>> g = dgl.graph(([0, 1, 2], [3, 4, 5])) >>> g.ndata['x'] = torch.arange(12).view(6, 2) >>> sg = g.subgraph([0, 1]) # sg does not have feature 'x' >>> 'x' in sg.ndata False
- From 0.5
>>> g = dgl.graph(([0, 1, 2], [3, 4, 5])) >>> g.ndata['x'] = torch.arange(12).view(6, 2) >>> sg = g.subgraph([0, 1]) # sg inherits feature 'x' from 'g' >>> 'x' in sg.ndata True >>> print(sg.ndata['x']) # the actual copy happens at here tensor([[0, 1], [1, 2]])
- DGL’s message passing operations (e.g., `DGLGraph.update_all`, `DGLGraph.apply_edges`, etc.) now support higher-order gradients when the backend is PyTorch.
- `DGLGraph.subgraph()` and `DGLGraph.edge_subgraph()` now accept boolean tensors or dictionaries of boolean tensors as input.
- Min and max aggregators now return 0 instead of a large number for zero-degree nodes to improve training experience.
- DGL kernels and readout functions are now deterministic.
GNN training utilities
- New classes: `dgl.dataloading.NodeDataLoader` and `dgl.dataloading.EdgeDataLoader` for stochastic training of node classification, edge classification, and link prediction with neighborhood sampling on a large graph. Both classes are similar to PyTorch `DataLoader` classes to allow easy customization of the neighborhood sampling strategy.
- DGL neural networks now support feeding in a single tensor together with a block as input.
  - Previously, to perform message passing on a block, you always had to feed in a pair of features as input, representing the features of the input and output nodes, like the following:
    ```python
    # Assuming that h is a 2D tensor representing the input node features
    def forward(self, blocks, h):
        for layer, block in zip(self.layers, blocks):
            h_dst = h[:block.number_of_dst_nodes()]
            h = layer(block, (h, h_dst))
        return h
    ```
  - Now, you only need to feed in a single tensor if the input graph is a block.
    ```python
    # Assuming that h is a 2D tensor representing the input node features
    def forward(self, blocks, h):
        for layer, block in zip(self.layers, blocks):
            h = layer(block, h)
        return h
    ```
- Added a check for zero-degree nodes to the following modules to prevent potential accuracy degradation. To resolve the resulting error, either add self-loops (using `dgl.add_self_loop`) or pass `allow_zero_in_degree=True` to suppress the check.
  - GraphConv, GATConv, EdgeConv, SGConv, GMMConv, AGNNConv, DotGatConv
New APIs
- `dgl.add_reverse_edges()` adds reverse edges for a heterogeneous graph. It works on all edge types whose source node type is the same as its destination node type.
- `DGLGraph.shared_memory` for copying the graph to shared memory.
New Models
- Hao Xiong @ShawXh has made several DeepWalk submissions to OGB link prediction leaderboard: https://ogb.stanford.edu/docs/leader_linkprop/. The models are now included in the example directory.
- Zhengdao Chen @zhengdao-chen has proposed a node classification model which utilizes edge weights in this tech report https://cims.nyu.edu/~chenzh/files/GCN_with_edge_weights.pdf. The model is included in the example directory and achieved 0.8436 ± 0.0065 ROC-AUC on OGB-proteins: https://ogb.stanford.edu/docs/leader_nodeprop/#ogbn-proteins
- Saurav Manchanda @gurdaspuriya implemented algorithms for computing graph edit distances for graph matching. Both exact and approximate algorithms are implemented. https://github.com/dmlc/dgl/tree/master/examples/pytorch/graph_matching
- We added an implementation of Cluster-GCN with GAT and GraphSAGE as the underlying neural network module: https://github.com/dmlc/dgl/tree/0.5.x/examples/pytorch/ogb. They achieved 0.7932 and 0.7830 test accuracy on OGB-products respectively. The Cluster-GAT implementation has been submitted to the OGB leaderboard.
- We updated both of our RGCN examples (https://github.com/dmlc/dgl/tree/0.5.x/examples/pytorch/rgcn and https://github.com/dmlc/dgl/tree/0.5.x/examples/pytorch/rgcn-hetero) to support minibatch training. We tested our RGCN implementation on OGB-MAG, where it achieved 0.4622 accuracy.
- We updated our GraphSAGE example to include the inductive setting, where the training and test graphs are different.
Requirement Update
- For PyTorch users, DGL now requires `torch >= 1.5.0`
- For MXNet users, DGL now requires `mxnet >= 1.6`
- For TensorFlow users, DGL now requires `tensorflow >= 2.3`
- Deprecate support for Python 3.5. Add support for Python 3.8. DGL now supports Python 3.6-3.8.
- Add support for CUDA 10.2
- For users that build DGL from source
  - On Linux: libstdc++.so.6.0.19 or later, or equivalently Ubuntu 14.04 or later, CentOS 7 or later.
  - On Windows: Windows 10 or Windows Server 2016 or later
  - On Mac: 10.9 or later
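One way to check an environment against the minimum versions listed above is a small standalone helper. The sketch below is not part of DGL: `MINIMUMS`, `parse_version`, `meets_minimum`, and `python_supported` are hypothetical names, and the comparison only looks at the leading numeric components, so local suffixes such as `+cu101` and pre-release tags are ignored.

```python
import re
import sys

# Minimums copied from the list above; all names here are hypothetical.
MINIMUMS = {"torch": "1.5.0", "mxnet": "1.6", "tensorflow": "2.3"}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two versions component by component, padding with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def python_supported(version_info=sys.version_info):
    """DGL 0.5 lists Python 3.6 through 3.8 as supported."""
    return (3, 6) <= tuple(version_info[:2]) <= (3, 8)
```

For example, `meets_minimum(torch.__version__, MINIMUMS["torch"])` would report whether the installed PyTorch satisfies the requirement.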
Compatibility issues
Pickle files created in versions 0.4.3post2 or earlier cannot be loaded by 0.5.0. For now, load the graph with 0.4.3post2, save the graph structure as tensors, and reconstruct the graph from those tensors with DGL 0.5.