[Model] Add dgl.nn.CuGraphSAGEConv model #5137

Merged
merged 14 commits into from
Feb 22, 2023

tingyu66
Contributor

@tingyu66 tingyu66 commented Jan 10, 2023

Description

This PR adds a GraphSAGE model, dgl.nn.CuGraphSAGEConv, that uses the accelerated sparse aggregation primitives in cugraph-ops. It requires pylibcugraphops >= 23.02.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

  • New nn.Module: dgl.nn.CuGraphSAGEConv
  • Test that validates its results against SAGEConv

Notes

Fixes rapidsai/cugraph-ops#177.

@dgl-bot
Collaborator

dgl-bot commented Jan 10, 2023

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Jan 10, 2023

Commit ID: 5a7c648

Build ID: 1

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@tingyu66 tingyu66 marked this pull request as draft January 10, 2023 15:50
@dgl-bot
Collaborator

dgl-bot commented Jan 10, 2023

Commit ID: cd4c4fa

Build ID: 2

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@Rhett-Ying
Collaborator

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Jan 12, 2023

Commit ID: 5a8066394ee08b3f110deb06044f55949580cb0a

Build ID: 3

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Jan 19, 2023

Commit ID: d0214db9c7c1b081abaf3c5c47da6ea640e01302

Build ID: 4

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Jan 19, 2023

Commit ID: 42ab8dc2ccda5bda17e3319105805e73ff10f29d

Build ID: 5

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Jan 19, 2023

Commit ID: 95fe352d3465143e92919e1cdec833ed262678ac

Build ID: 6

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@tingyu66 tingyu66 marked this pull request as ready for review January 20, 2023 03:56
@dgl-bot
Collaborator

dgl-bot commented Jan 20, 2023

Commit ID: e5e306a402ae8567d9db588a68c31ee1d8ca1216

Build ID: 7

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Jan 26, 2023

Commit ID: c878c34f3b2999d86c13eae73ea7d4c1c57d25b1

Build ID: 8

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@tingyu66 tingyu66 changed the title [DO NOT MERGE][Model] Add dgl.nn.CuGraphSAGEConv model [Model] Add dgl.nn.CuGraphSAGEConv model Feb 2, 2023
@dgl-bot
Collaborator

dgl-bot commented Feb 2, 2023

Commit ID: 3d7f4f29656a2025b66e29eba30a21e7e013f74d

Build ID: 9

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link


    def reset_parameters(self):
        r"""Reinitialize learnable parameters."""
        self.linear.reset_parameters()
Member

SAGEConv previously used Xavier uniform initialization, while nn.Linear.reset_parameters uses Kaiming uniform. I'm not sure about the effects of this difference.

Contributor Author

I think Kaiming is more suitable here, as ReLU is often the choice of nonlinearity in GNNs; Xavier was designed for the sigmoid function.
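
To make the difference discussed above concrete, here is a small sketch that compares the sampling ranges of the two schemes using only the standard formulas for their uniform bounds. The ReLU gain of sqrt(2) for the Xavier case and the negative slope a = sqrt(5) for the Kaiming case are assumptions about what SAGEConv and nn.Linear.reset_parameters pass to the initializers, and the layer sizes are purely illustrative.

```python
import math


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Half-width of the uniform range used by Xavier (Glorot) uniform init."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def kaiming_uniform_bound(fan_in: int, a: float = 0.0) -> float:
    """Half-width of the uniform range used by Kaiming (He) uniform init,
    for a leaky-ReLU negative slope `a`."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)


# Illustrative layer sizes (assumption, not taken from the PR).
fan_in, fan_out = 64, 64

# Assumed settings: ReLU gain for Xavier, a = sqrt(5) for Kaiming.
xavier = xavier_uniform_bound(fan_in, fan_out, gain=math.sqrt(2.0))
kaiming = kaiming_uniform_bound(fan_in, a=math.sqrt(5.0))

print(f"Xavier uniform bound:  {xavier:.4f}")
print(f"Kaiming uniform bound: {kaiming:.4f}")
```

Under these assumptions the Kaiming bound reduces to 1 / sqrt(fan_in), so the weights start in a narrower range than with the Xavier setting.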

        r"""Reinitialize learnable parameters."""
        self.linear.reset_parameters()

    def forward(self, g, feat, max_in_degree=None):
Member

Another difference: edge_weight is not supported.

@dgl-bot
Collaborator

dgl-bot commented Feb 17, 2023

Commit ID: 4f6fd15

Build ID: 13

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@@ -0,0 +1,200 @@
import argparse
Member

Did you run this script? If so, what performance number did you obtain?

Contributor Author

@tingyu66 tingyu66 Feb 18, 2023

Yes. In terms of pure training time (not including data loading), SAGEConv takes 2.5 s per epoch, while CuGraphSAGEConv takes 2.0 s, despite the overhead of the COO-to-CSC conversion. Test accuracy is the same.

Edit: added timings for both modes in the example.

mode             mixed (uva)   pure gpu
CuGraphSAGEConv  2.0 s         1.2 s
SAGEConv         2.5 s         1.7 s
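
For readers unfamiliar with the COO-to-CSC conversion mentioned above, the following is a minimal pure-Python sketch of what such a conversion does. It is only an illustration of the data layout, not how DGL or cugraph-ops implement it; the function name and the toy edge lists are hypothetical.

```python
from itertools import accumulate


def coo_to_csc(src, dst, num_dst):
    """Convert COO edge lists (src[i] -> dst[i]) to CSC.

    Returns (offsets, indices): the in-neighbors of destination node v
    are indices[offsets[v]:offsets[v + 1]].
    """
    # Count incoming edges per destination node.
    counts = [0] * num_dst
    for v in dst:
        counts[v] += 1
    # Prefix sums give each destination's starting position.
    offsets = [0] + list(accumulate(counts))
    # Scatter each source into the next free slot of its destination.
    cursor = offsets[:-1].copy()
    indices = [0] * len(src)
    for u, v in zip(src, dst):
        indices[cursor[v]] = u
        cursor[v] += 1
    return offsets, indices


# Toy graph with 5 edges and 3 destination nodes (hypothetical data).
src = [0, 1, 2, 3, 0]
dst = [1, 0, 1, 2, 2]
offsets, indices = coo_to_csc(src, dst, num_dst=3)
print(offsets)
print(indices)
```

The conversion touches every edge twice, which is the kind of per-batch overhead the timing above includes.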

    def forward(self, blocks, x):
        h = x
        for l, (layer, block) in enumerate(zip(self.layers, blocks)):
            h = layer(block, h, max_in_degree=10)
Member

@mufeili mufeili Feb 17, 2023

This seems a bit ugly. Perhaps it's better to pass the argument to SAGE.__init__.

Contributor Author

Can you explain what needs to be done here? Are you suggesting unrolling the loop like this?

h = F.relu(self.conv1(g[0], x))
h = F.relu(self.conv2(g[1], h))
...

Member

I meant the specification of max_in_degree.

Contributor Author

I see, and I agree that it is not an ideal interface. We did not make max_in_degree an attribute of CuGraphSAGEConv because it is a property of the graph (i.e., the block) rather than the model. I have removed it from the example, since this option is not required.
In the meantime, we are making our aggregation primitives more flexible so that this option can eventually be dropped.
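
The point that the value belongs to the graph rather than the model can be illustrated with a short sketch. It assumes max_in_degree means the largest number of incoming edges over all destination nodes of a block; the helper function and the two toy blocks are hypothetical.

```python
from collections import Counter


def max_in_degree(dst):
    """Largest number of incoming edges over all destination nodes."""
    return max(Counter(dst).values(), default=0)


# Two hypothetical sampled blocks, each described by its edge destinations.
block_dst = [
    [0, 0, 0, 1, 1, 2],  # destination 0 has 3 in-edges
    [0, 1, 1, 1, 1, 1],  # destination 1 has 5 in-edges
]

for i, dst in enumerate(block_dst):
    print(f"block {i}: max in-degree = {max_in_degree(dst)}")
```

Because the value differs from block to block, fixing it once in a model constructor would not fit every block the same layer is applied to.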

default="mixed",
choices=["cpu", "mixed", "puregpu"],
help="Training mode. 'cpu' for CPU training, 'mixed' for CPU-GPU mixed training, "
"'puregpu' for pure-GPU training.",
Member

fix indent

Contributor Author

This is automatically formatted by lintrunner. I removed the cpu mode, as it is not supported by the model.
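
As a rough sketch of what the argument might look like once the 'cpu' choice is dropped: the flag name --mode and the parser setup are assumptions, since the quoted diff only shows the default, choices, and help lines.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--mode",  # flag name is an assumption; not visible in the quoted diff
    default="mixed",
    choices=["mixed", "puregpu"],
    help="Training mode. 'mixed' for CPU-GPU mixed training, "
    "'puregpu' for pure-GPU training.",
)

# Parse an explicit argument list so the sketch runs without a command line.
args = parser.parse_args(["--mode", "puregpu"])
print(args.mode)
```

With 'cpu' removed from choices, argparse rejects that value at parse time instead of letting the script fail later inside the model.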

Contributor Author

Changes pushed.

Member

@mufeili mufeili left a comment

done a pass

@dgl-bot
Collaborator

dgl-bot commented Feb 17, 2023

Commit ID: 55fe4fe57f5efb20abb76e3fcc0f5428c13abc5e

Build ID: 14

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 21, 2023

Commit ID: 50aa20a2f4a63898d0b2f3d065157db8acd9cbeb

Build ID: 15

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@tingyu66
Contributor Author

Thank you @mufeili for the review. Here is a list of disparities between CuGraphSAGEConv and SAGEConv:

  • SAGEConv allows different feature dimensions for source and destination nodes
  • They cover different aggregation types
  • CuGraphSAGEConv does not support edge weights

Some preliminary performance numbers using the included example:

mode             mixed (uva)   pure gpu
CuGraphSAGEConv  2.0 s         1.2 s
SAGEConv         2.5 s         1.7 s

(copied over from the review comment above for better visibility)
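
The relative speedup implied by these preliminary numbers can be worked out with a few lines of arithmetic. The sketch below only restates the table values, which the earlier review comment reports as per-epoch training time; the dictionary layout is illustrative.

```python
# Per-epoch training times in seconds, copied from the table above.
timings = {
    "mixed (uva)": {"CuGraphSAGEConv": 2.0, "SAGEConv": 2.5},
    "pure gpu": {"CuGraphSAGEConv": 1.2, "SAGEConv": 1.7},
}

# Speedup = baseline time / new time, per training mode.
speedups = {
    mode: t["SAGEConv"] / t["CuGraphSAGEConv"] for mode, t in timings.items()
}

for mode, s in speedups.items():
    print(f"{mode}: {s:.2f}x")
```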

@dgl-bot
Collaborator

dgl-bot commented Feb 21, 2023

Commit ID: 8d2f6c5

Build ID: 16

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@mufeili
Member

mufeili commented Feb 22, 2023

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 22, 2023

Commit ID: f995b11

Build ID: 17

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 22, 2023

Commit ID: f995b11

Build ID: 18

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mufeili mufeili merged commit bcf9923 into dmlc:master Feb 22, 2023
@tingyu66 tingyu66 deleted the cugraphops-sageconv branch February 22, 2023 15:10
paoxiaode pushed a commit to paoxiaode/dgl that referenced this pull request Mar 24, 2023
* add CuGraphSAGEConv model

* fix lint issues

* update model to reflect changes in make_mfg_csr(), move max_in_degree to forward()

* lintrunner

* allow reset_parameters()

* remove norm option, simplify test

* allow full graph fallback option, add example

* address comments

* address reviews

---------

Co-authored-by: Mufei Li <mufeili1996@gmail.com>
DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024