Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ExtendedCpuGpuSplit pattern, and more #1

Merged
merged 10 commits into from
Nov 27, 2024
Merged

Support ExtendedCpuGpuSplit pattern, and more #1

merged 10 commits into from
Nov 27, 2024

Conversation

ylee88
Copy link
Collaborator

@ylee88 ylee88 commented Nov 8, 2024

This PR contains the following:

  • pushTile support for the ExtendedCpuGpuSplit pattern.
  • Updated Fortran TaskFunction with subroutine wrappers.

ylee88 added 9 commits October 6, 2024 17:57
Profiling results from the Nsight Systems reports that
a huge amount (~90%) of timing is consumed by
`cuMemHostAlloc`, when a slice of array (e.g., `Uin(:, :, :, :, n)`)
is passed to the SFR's argument.

This commit avoids `cuMemHostAlloc` in the profiling results
by introducing "wrapper" subroutines for each SFRs.
However, the overall performance results remain the same,
even though the Nsight Systems doesn't report `cuMemHostAlloc`.
Perhaps it was an incorrect profiling results, but I push this commit
to investigate this further.
@ylee88 ylee88 requested a review from kweide November 8, 2024 06:02
@kweide
Copy link
Member

kweide commented Nov 22, 2024

In the meantime, support for the ExtendedCpuGpuSplit has already been added to main (by cherry-picking).
Therefore, there is less left to do for this PR - merging it will essentially just

  • update Fortran TaskFunctions with subroutine wrappers.

@ylee88
Copy link
Collaborator Author

ylee88 commented Nov 27, 2024

@kweide,

Can we revert the cherry picking and merge this PR instead? The subroutine wrappers should have no or minimal impacts on the performance, and they relieve the cuMemHostAlloc issues effectively, at least for the profiling results. And it is relatively easy to delete the subroutine wrappers in the future.

I prefer to maintain the main branch as clean as possible to track every commit in a dedicated PR.

kweide added a commit that referenced this pull request Nov 27, 2024
Those cherry-pick commits should not have been done on the main branch.
The shell command I am using to restore the main branch to the desired state is

  git restore --staged --worktree --source=1f5a60a88315ca854516b254d5796b8e81d8d3f6 :/

A proper merge commit, merging the GitHub PR #1, is to follow soon.
The changes undone here will then be applied once more, together with others.
Copy link
Member

@kweide kweide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have indicated that the extra wrapping can be easily removed again - in case it does more harm or good - I approve.

@kweide kweide merged commit 262c450 into main Nov 27, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants