add patch for BLIS to fix auto-detection of POWER #15826

Merged

Conversation

Flamefire
Contributor

(created using eb --new-pr)

@jfgrimm
Member

jfgrimm commented Jul 7, 2022

Test report by @jfgrimm
FAILED
Build succeeded for 4 out of 11 (11 easyconfigs in total)
node022.pri.viking.alces.network - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/55d27543e3a23b0d02b2d647bb0b6e0d for a full test report.

@boegelbot

This comment was marked as outdated.

@Flamefire Flamefire force-pushed the 20220707144323_new_pr_BLIS081 branch 2 times, most recently from f61ed6a to 8a0c694 July 7, 2022 14:46
@Flamefire Flamefire force-pushed the 20220707144323_new_pr_BLIS081 branch from 8a0c694 to 5e50cd1 July 7, 2022 15:01
@jfgrimm
Member

jfgrimm commented Jul 7, 2022

Test report by @jfgrimm
FAILED
Build succeeded for 6 out of 10 (10 easyconfigs in total)
node022.pri.viking.alces.network - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/d3b672008e0c9fbc04c2c60da0b51b4c for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 10 out of 10 (10 easyconfigs in total)
taurusi8018 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/ca6b0ac0bd3f22f4d795911c428571e9 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 10 out of 10 (10 easyconfigs in total)
taurusa11 - Linux CentOS Linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), 3 x NVIDIA GeForce GTX 1080 Ti, 460.32.03, Python 2.7.5
See https://gist.github.com/23f746f0845ef4d83a822b0fac88b40f for a full test report.

@jfgrimm
Member

jfgrimm commented Jul 7, 2022

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@jfgrimm: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=15826 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_15826 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 1376

Test results coming soon (I hope)...

- notification for comment with ID 1178221339 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 10 out of 10 (10 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/4c75c937ba0f2dd60a8a1f3c750193b7 for a full test report.

@Flamefire
Contributor Author

@jfgrimm sorry for the failures. It seems the versions are not strictly incremental, so some patches I expected to still apply did not. As the recent test reports show, this is now fixed.

@jfgrimm
Member

jfgrimm commented Jul 8, 2022

@boegelbot please test @ generoso

@boegelbot
Collaborator

@jfgrimm: Request for testing this PR well received on login1

PR test command 'EB_PR=15826 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_15826 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8844

Test results coming soon (I hope)...

- notification for comment with ID 1178658211 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@jfgrimm
Member

jfgrimm commented Jul 8, 2022

I'm still seeing test failures on my machine:

% blis_<dt><op>_<params>_<stor>      m     k   gflops   resid      result
blis_sgemmt_lnn_rrr                100   100    37.48   5.39e-02   FAILURE
blis_sgemmt_unn_rrr                100   100    40.96   6.05e-02   FAILURE

Although I think that's unrelated to the changes in this PR, so it shouldn't block this.
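
For anyone triaging similar output, here is a minimal sketch that pulls the failing rows out of testsuite output like the lines quoted above. It assumes the column layout seen in those three lines (test name, one or more integer dimensions, gflops, residual, result). `failed_rows` is a hypothetical helper, not part of BLIS or EasyBuild.

```python
def failed_rows(output):
    """Return (name, dims, gflops, resid) for every row marked FAILURE.

    Hypothetical helper; the column layout is assumed from the output
    quoted in this comment, not taken from BLIS documentation.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the '%' header line, blank lines, and anything that is
        # not a result row starting with a 'blis_' test name.
        if len(fields) < 4 or not fields[0].startswith("blis_"):
            continue
        # Last three columns are gflops, residual and result; everything
        # between the name and those is a problem dimension (m, k, ...).
        name, *dims, gflops, resid, result = fields
        if result == "FAILURE":
            rows.append((name, [int(d) for d in dims], float(gflops), float(resid)))
    return rows


sample = """\
% blis_<dt><op>_<params>_<stor>      m     k   gflops   resid      result
blis_sgemmt_lnn_rrr                100   100    37.48   5.39e-02   FAILURE
blis_sgemmt_unn_rrr                100   100    40.96   6.05e-02   FAILURE
"""

for name, dims, gflops, resid in failed_rows(sample):
    print(name, dims, resid)
```

Splitting on whitespace rather than fixed column offsets keeps the sketch tolerant of the padding differences visible in the pasted output.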

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 10 out of 10 (10 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/a1b1b3aedc25642e434d2f7266a838ac for a full test report.

Member

@jfgrimm jfgrimm left a comment

LGTM

@jfgrimm
Member

jfgrimm commented Jul 8, 2022

Going in, thanks @Flamefire!

@jfgrimm jfgrimm merged commit cb610b5 into easybuilders:develop Jul 8, 2022
@Flamefire Flamefire deleted the 20220707144323_new_pr_BLIS081 branch July 8, 2022 14:11
@Flamefire
Contributor Author

Seems like I introduced a bug here for 2.2 and 3.0: they don't support POWER10, so the code doesn't compile on any PPC system. The fix is included in #15889.

This still doesn't fully solve the BLIS build on PPC, because of segfaults in 0.9.0 (flame/blis#621) and timeouts during tests in 3.0.1.

@boegel boegel changed the title from "Fix BLIS build on PPC" to "add patch for BLIS to fix auto-detection of POWER" Aug 3, 2022
@boegel boegel added the "bug fix" label and removed the "change" label Aug 3, 2022

4 participants