computeTFactor parallelise computation with bulk (MC Local and Distributed) #798

Draft: albestro wants to merge 13 commits into master from alby/tfactor-bulk

Conversation

albestro
Collaborator

@albestro albestro commented Feb 14, 2023

This PR speeds up computeTFactor by parallelising work that was previously serialised by the reduction over a single tile.
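To illustrate the general idea only (this is not the code in this PR), here is a minimal sketch of removing the serialisation caused by reducing into a single tile: each worker accumulates into its own private buffer, and the partial results are combined at the end. Everything in it is an assumption made for illustration: `Tile` is a plain vector standing in for a real tile, `reduce_bulk` is a hypothetical name, and plain `std::thread` is swapped in for the bulk facility used by the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tile: a flat buffer of doubles.
using Tile = std::vector<double>;

// Element-wise sum of all `inputs` into a single tile of `tile_size` elements.
// Each worker sums a contiguous chunk of the inputs into its own private tile,
// so no two workers ever write to the same memory. The partial tiles are then
// combined serially, which is cheap because there are only `nworkers` of them.
Tile reduce_bulk(const std::vector<Tile>& inputs, std::size_t tile_size,
                 std::size_t nworkers) {
  assert(nworkers > 0);

  std::vector<Tile> partial(nworkers, Tile(tile_size, 0.0));
  const std::size_t chunk = (inputs.size() + nworkers - 1) / nworkers;

  std::vector<std::thread> workers;
  workers.reserve(nworkers);
  for (std::size_t w = 0; w < nworkers; ++w) {
    workers.emplace_back([&, w] {
      // Clamp the range so that surplus workers simply get no inputs.
      const std::size_t begin = std::min(w * chunk, inputs.size());
      const std::size_t end = std::min(begin + chunk, inputs.size());
      for (std::size_t i = begin; i < end; ++i) {
        assert(inputs[i].size() == tile_size);
        for (std::size_t k = 0; k < tile_size; ++k)
          partial[w][k] += inputs[i][k];
      }
    });
  }
  for (std::thread& t : workers)
    t.join();

  // Final reduction over the per-worker partial results.
  Tile result(tile_size, 0.0);
  for (const Tile& p : partial)
    for (std::size_t k = 0; k < tile_size; ++k)
      result[k] += p[k];
  return result;
}
```

The trade-off in this sketch is extra memory (one private tile per worker) in exchange for not having to synchronise on the shared output tile during accumulation.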

The algorithm implementation has been split in two variants:

  • Backend::MC now has the bulk-based implementation
  • Backend::GPU still uses the previous version of the code

This required a couple of changes to the interface structure of the algorithm, and I took the chance to also fix some problems with the documentation.

Note: Look at the first three commits to see the main implementation change in isolation from the rest.

TODO

  • Split Distributed MC/GPU implementations too
  • Make nthreads used for bulk tunable
  • Complete documentation fixes
  • Code cleanup

@albestro albestro added this to the Optimizations milestone Feb 14, 2023
@albestro albestro self-assigned this Feb 14, 2023
@albestro albestro force-pushed the alby/tfactor-bulk branch 2 times, most recently from 1af3047 to c2d5c50 Compare February 17, 2023 07:49
@albestro albestro changed the title computeTFactor parallelise computation with bulk (MC Local) computeTFactor parallelise computation with bulk (MC Local and Distributed) Feb 17, 2023
@albestro albestro force-pushed the alby/tfactor-bulk branch from 0d944b5 to 26d72aa Compare April 18, 2023 10:18
@albestro albestro marked this pull request as ready for review April 18, 2023 10:45
@albestro albestro requested review from rasolca and msimberg April 18, 2023 10:46
include/dlaf/factorization/qr/t_factor_impl.h (outdated, resolved)
@rasolca
Collaborator

rasolca commented Apr 19, 2023

cscs-ci run

@rasolca
Collaborator

rasolca commented Apr 19, 2023

The eigensolver test is deadlocking 😱

@albestro albestro marked this pull request as draft April 19, 2023 16:08
@albestro
Collaborator Author

albestro commented Apr 19, 2023

cscs-ci run

Checking whether the workaround also works in the CI.

If it does, I will document the case in more detail and use it to investigate the problem further.

albestro added a commit that referenced this pull request Nov 27, 2024
albestro added a commit that referenced this pull request Nov 27, 2024
albestro added a commit that referenced this pull request Nov 27, 2024
albestro added a commit that referenced this pull request Dec 2, 2024
albestro added a commit that referenced this pull request Dec 17, 2024
Projects
Status: In Progress

3 participants