Fix performance of local version of bt_band_to_tridiagonal #1144

Merged
rasolca merged 1 commit into master from rasolca/fix_bt1_local on May 21, 2024

rasolca (Collaborator) commented May 17, 2024

Panels were not indexed correctly, which led to over-constrained dependencies.

Closing #1136.

rasolca (Collaborator, Author) commented May 17, 2024

cscs-ci run

rasolca (Collaborator, Author) commented May 17, 2024

distributed:

[0]
[0] 0.329912s 6509.26GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 0.314647s 6825.07GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 0.317663s 6760.25GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 0.313616s 6847.49GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 0.316185s 6791.85GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU

[0]
[0] 2.51927s 6819.4GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 2.52648s 6799.91GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 2.52548s 6802.63GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 2.51763s 6823.82GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 2.5293s 6792.34GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU

local:

[0]
[0] 0.312804s 6865.26GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 0.322055s 6668.06GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 0.320392s 6702.68GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 0.318858s 6734.93GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 0.320701s 6696.23GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU

[0]
[0] 2.4803s 6926.53GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 2.5081s 6849.76GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 2.50762s 6851.07GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 2.51417s 6833.21GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 2.51185s 6839.54GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
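
To compare the distributed and local runs above at a glance, the timing lines can be averaged per matrix size. The following is a minimal sketch, not part of the PR: the helper names `parse_runs` and `summarize` are hypothetical, and the regex assumes that the first parenthesized pair on each line is the matrix size. It reads only the run index, the time, and the GFlop/s figure, and makes no assumption about the remaining columns.

```python
import re
from statistics import mean

# Matches one timing line, e.g.
# "[0] 0.329912s 6509.26GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU"
# Assumption: the first "(m, n)" pair is the matrix size.
LINE_RE = re.compile(
    r"^\[(?P<run>\d+)\]\s+(?P<time>[\d.]+)s\s+(?P<gflops>[\d.]+)GFlop/s\s+\S+\s+"
    r"\((?P<m>\d+),\s*(?P<n>\d+)\)"
)


def parse_runs(text):
    """Return a list of (matrix_rows, seconds, gflops), one per timing line."""
    runs = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # bare "[0]" lines carry no timing and are skipped
            runs.append(
                (int(match["m"]), float(match["time"]), float(match["gflops"]))
            )
    return runs


def summarize(text):
    """Group runs by matrix size; return {size: (mean_seconds, mean_gflops)}."""
    by_size = {}
    for size, seconds, gflops in parse_runs(text):
        by_size.setdefault(size, []).append((seconds, gflops))
    return {
        size: (mean(s for s, _ in vals), mean(g for _, g in vals))
        for size, vals in by_size.items()
    }


# Two lines copied from the "local" results above.
sample = """\
[0]
[0] 0.312804s 6865.26GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 0.322055s 6668.06GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
"""

for size, (seconds, gflops) in summarize(sample).items():
    print(f"{size}: {seconds:.4f}s {gflops:.1f} GFlop/s")
```

Pasting the full "distributed" and "local" blocks into `summarize` separately gives one mean per matrix size for each variant, which is enough to check whether the two now perform comparably.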

albestro (Collaborator) left a comment

LGTM

rasolca merged commit 2d91ecc into master on May 21, 2024
4 checks passed
rasolca deleted the rasolca/fix_bt1_local branch on May 21, 2024 15:11