-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TFactor: bulkerify GEMV (both MC and GPU) #1214
base: master
Are you sure you want to change the base?
Conversation
6558780
to
bf17fcc
Compare
37ea75b
to
2845339
Compare
cscs-ci run just to check if there is any major problems |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Please name new functions in snake case, and rename existing internal functions if possible.
matrix::Matrix<T, Device::GPU> ws_T({nworkspaces * nrefls_step, nrefls_step}, | ||
{nrefls_step, nrefls_step}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All Ts are allocated at scheduling. Better reuse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few questions, nothing blocking.
05dc48d
to
f83f317
Compare
|
||
namespace dlaf::factorization::internal { | ||
|
||
inline size_t getTFactorNWorkers() noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this to be snake case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see 36f179d
0397c9e
to
d370894
Compare
cscs-ci run |
Converted to draft in order to prevent merging, since I'm still doing some checks. |
Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
taus are set for gemv, but then full tile is reduced, hence also the diag with taus.
tile_t might get moved by first worker before any other work is able to use its size
f0638e1
to
5997e52
Compare
in case of odd division of tiles, it might end up being selected but not used.
cscs-ci run |
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #1214 +/- ##
==========================================
- Coverage 94.38% 94.02% -0.37%
==========================================
Files 154 155 +1
Lines 9374 9461 +87
Branches 1160 1163 +3
==========================================
+ Hits 8848 8896 +48
- Misses 320 340 +20
- Partials 206 225 +19 ☔ View full report in Codecov by Sentry. |
add assert also for gpu code that calls add manually
2f90f23
to
d199b88
Compare
In this PR we try to increase parallelisation of GEMV step of TFactor, which is by construction serial (all results goes into the same block tile), by using workspaces for storing partial results and then reducing them before the final TRMV step.
Algorithmically, the main change is that
stepGEMV
loop has been replaced with a singlestepGEMVAll
, and the concept has been applied in a similar way for both backends, MC and GPU:pika::bulk
splits input tiles over multiple task, each one stores the partial results in their workspace and just one at the end does the reduction.In order to implement this solution, workspaces for intermediate results have been added (there is another option which does not require any additional workspace nor reduction, and that we might explore for MC in another PR).
TODO:
Close #798.
EDIT: since this is conceptually similar to #798 and it is going to be closed as soon as this gets merged, I migrated here the doc fixes happened there.