[GraphBolt][CUDA] Modify multiGPU example to use GPU sampling. #6961

mfbalin · 2024-01-16T22:22:49Z

Description

Modified the multi-GPU example to use GPU sampling. We need to fix the replication problem caused by using torch's pin_memory.

Here is a discussion about how we can fix the issue of the graph and features being duplicated in each process: pytorch/pytorch#32167
The fix is in #6962.

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
I've leveraged the tools to beautify the Python and C++ code.
The PR is complete and small; read the Google eng practice (a CL is equivalent to a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code changes to be small (examples, tests, and documentation can be exempted).
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
The related issue is referenced in this PR
If the PR is for a new model/paper, I've updated the example index here.

Changes

dgl-bot · 2024-01-16T22:23:18Z

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch];
For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot · 2024-01-16T23:07:22Z

Commit ID: deb3e41

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot · 2024-01-17T05:13:12Z

Commit ID: 1063706

Build ID: 2

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot · 2024-01-17T05:55:31Z

Commit ID: a067a89

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot · 2024-01-17T08:44:44Z

Commit ID: 1786338

Build ID: 4

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin · 2024-01-17T08:52:50Z

@dgl-bot

dgl-bot · 2024-01-17T09:43:32Z

Commit ID: 1786338

Build ID: 5

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Modify multiGPU example to use GPU sampling.

deb3e41

mfbalin requested review from frozenbugs and TristonC January 16, 2024 22:22

mfbalin added 2 commits January 16, 2024 23:36

pin memory using the inplace method

1063706

inplace operation does not return self.

a067a89
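The second commit above points at a common pitfall with in-place methods: if the method returns `None` rather than the object, assigning its result back discards the object. A minimal sketch of that pitfall follows. The `FeatureStore` class and both helper functions are hypothetical stand-ins, not GraphBolt's API, and the `None` return value is an assumption drawn only from the commit message.

```python
class FeatureStore:
    """Hypothetical stand-in for an object that supports in-place pinning."""

    def __init__(self):
        self.pinned = False

    def pin_memory_(self):
        # Mutates the object in place; deliberately returns None (assumption).
        self.pinned = True


def pin_wrong(store):
    # Bug: rebinding to the result of an in-place method loses the object.
    store = store.pin_memory_()
    return store


def pin_right(store):
    # Call for the side effect only, then keep using the original reference.
    store.pin_memory_()
    return store
```

With this sketch, `pin_wrong` hands back `None`, while `pin_right` hands back the same object with its `pinned` flag set.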

mfbalin mentioned this pull request Jan 17, 2024

[GraphBolt][CUDA] Inplace pin memory for Graph and TorchFeatureStore #6962

Merged

8 tasks

frozenbugs reviewed Jan 17, 2024

examples/multigpu/graphbolt/node_classification.py (resolved)

add cpu sampling as an option

1786338
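One way such an option could be exposed is a command-line flag that selects where sampling runs. The sketch below is illustrative only: the `--mode` flag, its choice names, and the `sampling_device` helper are hypothetical and are not taken from the example script in this PR.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Multi-GPU node classification (sketch)"
    )
    # Hypothetical flag of the form "<sampling device>-<training device>".
    parser.add_argument(
        "--mode",
        default="cuda-cuda",
        choices=["cpu-cuda", "cuda-cuda"],
        help="Where sampling and training run; cpu-cuda samples on the CPU.",
    )
    return parser


def sampling_device(mode):
    # The part before the dash names the device used for sampling.
    return mode.split("-")[0]
```

Keeping GPU sampling as the default while leaving CPU sampling selectable makes it easy to compare the two paths on the same script.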

mfbalin requested a review from frozenbugs January 17, 2024 07:59

mfbalin mentioned this pull request Jan 18, 2024

[GraphBolt][CUDA] gb.expand_indptr #6871

Merged

frozenbugs approved these changes Jan 18, 2024

mfbalin merged commit 78fa316 into dmlc:master Jan 18, 2024
2 checks passed

mfbalin deleted the gb_cuda_multigpu_example branch January 18, 2024 08:19
