Support ExtendedCpuGpuSplit pattern, and more #1

ylee88 · 2024-11-08T06:01:30Z

This PR contains the following:

- pushTile support for the ExtendedCpuGpuSplit pattern.
- Updated Fortran TaskFunction with subroutine wrappers.

Nsight Systems profiling reports show that a huge share (~90%) of the time is spent in `cuMemHostAlloc` when an array slice (e.g., `Uin(:, :, :, :, n)`) is passed as an argument to an SFR. This commit removes `cuMemHostAlloc` from the profiling results by introducing a "wrapper" subroutine for each SFR. However, the overall performance remains the same, even though Nsight Systems no longer reports `cuMemHostAlloc`. Perhaps the profiling result was incorrect, but I am pushing this commit so that this can be investigated further.

kweide · 2024-11-22T21:27:22Z

In the meantime, support for the ExtendedCpuGpuSplit pattern has already been added to main (by cherry-picking).
Therefore, there is less left to do for this PR: merging it will essentially just update the Fortran TaskFunctions with subroutine wrappers.

ylee88 · 2024-11-27T21:32:59Z

@kweide,

Can we revert the cherry-picks and merge this PR instead? The subroutine wrappers should have little or no impact on performance, and they effectively relieve the `cuMemHostAlloc` issue, at least in the profiling results. It is also relatively easy to delete the subroutine wrappers in the future.

I prefer to keep the main branch as clean as possible, so that every commit can be tracked in a dedicated PR.

Those cherry-pick commits should not have been done on the main branch. The shell command I am using to restore the main branch to the desired state is `git restore --staged --worktree --source=1f5a60a88315ca854516b254d5796b8e81d8d3f6 :/`. A proper merge commit, merging the GitHub PR #1, is to follow soon. The changes undone here will then be applied once more, together with others.
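
For readers unfamiliar with this form of `git restore`, here is a minimal sketch of what it does, run in a throwaway repository. The file names, commit messages, and identity below are hypothetical stand-ins and are not taken from this repository; only the `git restore` flags mirror the command quoted above. It assumes Git 2.23 or later, where `git restore` was introduced.

```shell
#!/bin/sh
# Sketch only: restore the tree of an earlier commit without rewriting history.
# Everything here runs in a scratch repository (hypothetical names throughout).
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
git config commit.gpgsign false            # keep the demo independent of global signing setup

# First commit: the state we want to get back to.
echo "v1" > file.txt
git add file.txt
git commit -q -m "desired state"
good=$(git rev-parse HEAD)

# Second commit: stand-in for a change that should not have landed.
echo "v2" > file.txt
echo "extra" > extra.txt
git add file.txt extra.txt
git commit -q -m "stand-in for an unwanted cherry-pick"

# Make the index and the working tree match $good for the whole repository
# (":/" is the pathspec for the repository root). HEAD does not move.
git restore --staged --worktree --source="$good" :/

# Record the restored state as a new commit on top of the existing history.
git commit -q -m "restore tree to desired state"

# Check that the tree now matches $good, and count the commits in history.
git diff --quiet "$good" HEAD && echo "tree matches"
git rev-list --count HEAD
```

Because `HEAD` does not move, the earlier commits stay in history and the restore is recorded as an ordinary new commit, which is one reason this approach can suit a shared branch better than a hard reset.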

kweide

Since you have indicated that the extra wrapping can easily be removed again, in case it does more harm than good, I approve.

ylee88 added 9 commits October 6, 2024 17:57

ExtCpuGpuSplit case

107c748

fix a missing line

5e9ec79

better handling for data receiver's prototype

2f9cab6

Merge branch 'master' into ipdps2025

98d69f8

Merge branch 'master' into ipdps2025

5682652

flake8

0fbd156

more information

45b5bdb

update REF files with wrapper functions

e1f3434

ylee88 requested a review from kweide November 8, 2024 06:02

update comments

513f2bd

kweide approved these changes Nov 27, 2024

kweide merged commit 262c450 into main Nov 27, 2024
3 checks passed
