[Performance][CUDA] Labor UVA optimization #5885

mfbalin · 2023-06-18T02:33:57Z

Description

The Labor sampler can perform UVA sampling. However, because it accesses the indptr and indices arrays (especially indices) very often, its slowdown under UVA is much larger than NeighborSampler's. This PR optimizes the accesses to the indptr and indices arrays in a way that brings no regression to the pure GPU scenario, while improving training throughput by more than 10 percent on ogbn-products with fanout 10,10,10 and use_uva=True. With this change, the ogbn-products UVA sampling runtime for LABOR dropped from 18 ms to 13 ms on an A100 machine during multi-GPU training. On other multi-GPU machines with V100s, sampling is more than 2x faster.
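To illustrate why the indices array dominates, here is a minimal Python sketch (not the CUDA code in this PR) that counts reads of indices for uniform neighbor sampling versus a LABOR-0 style sampler. It assumes the LABOR-0 rule of keeping neighbor t of seed s when r_t * deg(s) <= fanout, with one variate r_t shared by all seeds. The function names, the toy CSR graph, and the per-ID seeding of the random generator are illustrative assumptions.

```python
import random


def neighbor_sample(indptr, indices, seeds, fanout, rng):
    """Uniform neighbor sampling: reads at most `fanout` entries of `indices` per seed."""
    picked, reads = [], 0
    for s in seeds:
        start, end = indptr[s], indptr[s + 1]
        deg = end - start
        # Positions are chosen first, so only the chosen entries are read.
        if deg <= fanout:
            positions = range(start, end)
        else:
            positions = rng.sample(range(start, end), fanout)
        for p in positions:
            picked.append((indices[p], s))
            reads += 1
    return picked, reads


def labor0_sample(indptr, indices, seeds, fanout, seed=0):
    """LABOR-0 style sampling: one variate r_t per neighbor t, shared by all seeds."""
    picked, reads = [], 0
    for s in seeds:
        start, end = indptr[s], indptr[s + 1]
        deg = end - start
        for p in range(start, end):
            t = indices[p]  # the neighbor ID must be read to derive r_t
            reads += 1
            # Hypothetical stand-in for a per-ID random variate in [0, 1).
            r_t = random.Random(seed * 1_000_003 + t).random()
            if r_t * deg <= fanout:
                picked.append((t, s))
    return picked, reads


# Toy CSR graph: row 0 has 5 neighbors, row 1 has 3, row 2 has 0, row 3 has 4.
indptr = [0, 5, 8, 8, 12]
indices = [1, 2, 3, 4, 5, 0, 2, 3, 0, 1, 2, 5]
seeds = [0, 1, 3]

_, neighbor_reads = neighbor_sample(indptr, indices, seeds, 2, random.Random(0))
labor_picked, labor_reads = labor0_sample(indptr, indices, seeds, 2)
```

In this sketch the neighbor sampler reads 6 entries of indices while the LABOR-0 style sampler reads all 12, which is the kind of access pattern that becomes expensive when indices lives in pinned host memory.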

Also fixed a bug in the weighted case and added some tests to this PR so that we can detect whether things are working properly. Below are the profiles of the old and the new code for LABOR-0 UVA. The new version adds an extra kernel, OneHopExtractorAlignedKernel, which runs before the main sampling kernel and, if the indices array is pinned, copies it in an aligned manner.
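The idea of an aligned copy can be sketched on the CPU as follows. This is a simplified Python illustration, not the actual OneHopExtractorAlignedKernel: it assumes the source array is read only in whole chunks whose start is a multiple of a fixed chunk size, and that entries outside the requested row are discarded. The names aligned_span, extract_one_hop_aligned, and line_elems are hypothetical.

```python
def aligned_span(start, end, line_elems):
    """Widen [start, end) so both ends fall on multiples of `line_elems`."""
    lo = (start // line_elems) * line_elems
    hi = -(-end // line_elems) * line_elems  # round up to the next multiple
    return lo, hi


def extract_one_hop_aligned(indptr, indices, rows, line_elems=16):
    """Copy the adjacency lists of `rows` into a compact buffer.

    The source array is visited in aligned chunks of `line_elems` elements;
    entries that do not belong to the requested row are skipped.
    Returns (sub_indptr, sub_indices, chunks_read).
    """
    sub_indptr, sub_indices, chunks_read = [0], [], 0
    for row in rows:
        start, end = indptr[row], indptr[row + 1]
        if start < end:
            lo, hi = aligned_span(start, end, line_elems)
            for chunk in range(lo, hi, line_elems):
                chunks_read += 1
                # Clip the aligned chunk to this row's range before copying.
                first = max(chunk, start)
                last = min(chunk + line_elems, end)
                sub_indices.extend(indices[first:last])
        sub_indptr.append(len(sub_indices))
    return sub_indptr, sub_indices, chunks_read


# Toy CSR graph with 3 rows; row 1 is empty.
indptr = [0, 3, 3, 10]
indices = list(range(100, 110))
sub_indptr, sub_indices, chunks_read = extract_one_hop_aligned(
    indptr, indices, rows=[2, 0], line_elems=4
)
```

The compact sub_indptr and sub_indices can then be consumed by a sampling step without touching the original array again. The presumed benefit on real hardware is that aligned reads of pinned memory map to fewer, fuller transfers over PCIe.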

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
I've leveraged the tools to beautify the Python and C++ code.
The PR is complete and small; read the Google eng practice (a CL is equivalent to a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code changes to be small (examples, tests, and documentation can be exempted).
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

mfbalin · 2023-06-18T02:35:14Z

@BarclayII could you take a look? On machines with multiple GPUs where PCIe bandwidth is highly contested, this PR should improve sampling performance considerably.

src/array/cpu/labor_pick.h

src/array/cuda/labor_sampling.cu

dgl-bot · 2023-06-29T06:58:15Z

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

dgl-bot · 2023-06-29T06:58:27Z

Commit ID: 6689d7062714a895a298df245bb6701ff10a45e4

Build ID: 16

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Rhett-Ying · 2023-06-29T09:47:00Z

@frozenbugs dgl.sampling.sample_labors() is not covered by the existing DGL benchmarks; we need to add it in order to compare performance.
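A benchmark for this could start from a generic timing helper along these lines. This is only a sketch, not part of the DGL benchmark framework: the benchmark function, its parameters, and the optional sync hook are assumptions made for illustration.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Time `fn()` and return (mean_seconds, stdev_seconds) over `repeats` runs.

    `sync` is an optional callable invoked around each timed call, e.g. a
    device synchronization function when `fn` launches asynchronous GPU work.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        t0 = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - t0)
    stdev = statistics.stdev(timings) if repeats > 1 else 0.0
    return statistics.mean(timings), stdev
```

For the sampler discussed here, fn could wrap a call such as dgl.sampling.sample_labors(g, seeds, fanout) and sync could be torch.cuda.synchronize; the graph g, the seed tensor, and the fanout are placeholders that a real benchmark would need to define.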

dgl-bot · 2023-06-29T19:28:16Z

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

dgl-bot · 2023-06-29T19:28:27Z

Commit ID: 2222a5918633685a232185a11182926a25817b60

Build ID: 17

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot · 2023-06-29T19:33:24Z

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

dgl-bot · 2023-06-29T19:33:36Z

Commit ID: 9178269c838d842e19b45fa4e071d7db22cbfd55

Build ID: 18

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Rhett-Ying · 2023-06-30T02:05:21Z

@dgl-bot

dgl-bot · 2023-06-30T02:48:12Z

Commit ID: 0725f0d966af5d4289f67dd3c7629dad6db81840

Build ID: 19

Status: ❌ CI test failed in Stage [DGL-Go CPU test].

Report path: link

Full logs path: link

dgl-bot · 2023-07-03T09:29:29Z

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

dgl-bot · 2023-07-03T09:29:41Z

Commit ID: 22b5d74

Build ID: 20

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Rhett-Ying · 2023-07-03T09:31:56Z

@dgl-bot

dgl-bot · 2023-07-03T11:46:58Z

Commit ID: 22b5d74

Build ID: 21

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

frozenbugs · 2023-07-06T06:54:43Z

Let's get this PR merged first, and let's discuss next week to find a path to integrate it in graphbolt. @mfbalin

mfbalin · 2023-07-07T17:26:48Z

I would also like to make a PR including the kappa feature I talked about in my presentation. It modifies only the random number generation logic and adds the kappa parameter to the sampler API. However, it won't be useful if #4341 or an alternative dynamic cache is not merged. With the addition of the kappa PR, the official LABOR sampling implementation living inside DGL will be complete.

Rhett-Ying · 2023-07-10T04:03:08Z

@caojy1998 FYI. We need to add a benchmark for sample_labors() to our new benchmark framework.

caojy1998 · 2023-07-10T08:16:28Z

@caojy1998 FYI. We need to add a benchmark for sample_labors() to our new benchmark framework.

OK, got it.

mfbalin · 2023-07-13T07:28:15Z

Is the lack of a benchmark script blocking the merge of this PR or are we waiting for another reviewer's approval?

frozenbugs · 2023-07-13T09:01:31Z

no, not blocking, let's run the CI, and I will merge it.

dgl-bot · 2023-07-13T09:21:04Z

Commit ID: 3bead55

Build ID: 22

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

mfbalin added 2 commits June 17, 2023 20:13

optimize indptr access

88493c0

optimize indices access

2570f93

adding some tests for dgl.sampling.labors

ffecadf

mfbalin force-pushed the labor_uvm_optimization branch from 0524e63 to ffecadf Compare June 18, 2023 15:18

fix linting

21f505e

mfbalin force-pushed the labor_uvm_optimization branch from c3fa494 to 21f505e Compare June 18, 2023 15:25

mfbalin commented Jun 18, 2023

src/array/cpu/labor_pick.h (resolved)

mfbalin commented Jun 18, 2023

src/array/cuda/labor_sampling.cu (resolved)

mfbalin changed the title ~~[Performance][CUDA] Labor UVA optimization~~ [Draft][Performance][CUDA] Labor UVA optimization Jun 18, 2023

remove unnecessary parameter

650c72c

mfbalin changed the title ~~[Draft][Performance][CUDA] Labor UVA optimization~~ [Performance][CUDA] Labor UVA optimization Jun 18, 2023

add labor dataloader test as well

2f7ee29

mfbalin force-pushed the labor_uvm_optimization branch from eccb068 to 2f7ee29 Compare June 19, 2023 14:29

minor fix for log_size function

c3deb84

peizhou001 self-requested a review June 29, 2023 09:18

add short descriptions to tests.

009be59

mfbalin force-pushed the labor_uvm_optimization branch from 893614c to 009be59 Compare June 29, 2023 19:33

Merge branch 'master' into labor_uvm_optimization

22b5d74

frozenbugs approved these changes Jul 6, 2023

Merge branch 'master' into labor_uvm_optimization

3bead55

frozenbugs merged commit c3aea1b into dmlc:master Jul 13, 2023

Rhett-Ying added a commit that referenced this pull request Aug 10, 2023

[Performance][CUDA] Labor UVA optimization (#5885)

39b0a52

Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024

[Performance][CUDA] Labor UVA optimization (dmlc#5885)

0c0a487

Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
