{perf}[gompi/2023b] Score-P v8.4 w/ CUDA 12.4.0 #20146

Merged

Flamefire
Contributor

Flamefire commented Mar 19, 2024

(created using eb --new-pr)

Requires:

If the build fails with

configure: WARNING: nvcc -ccbin mpicxx compilation failed. Disabling CUDA support.

the cause is very likely a missing include-fixed folder in the GCC 13 installation, see easybuilders/easybuild-easyblocks#3254. To fix it, either rename the include-fixed folder that EasyBuild renamed back to its original name, or regenerate it with $EBROOTGCC/libexec/gcc/x86_64-pc-linux-gnu/13.2.0/install-tools/mkheaders
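A minimal Python sketch for checking, before starting the build, whether the GCC installation still has its include-fixed folder. It assumes the standard GCC layout (lib/gcc/x86_64-pc-linux-gnu/13.2.0/include-fixed under $EBROOTGCC) and that the GCC module is loaded; the helper name check_include_fixed is hypothetical. It only prints the mkheaders command mentioned above and does not run it.

```python
import os
from pathlib import Path

# Assumption: target triplet and version of the GCC used by gompi/2023b.
TARGET = "x86_64-pc-linux-gnu"
GCC_VERSION = "13.2.0"


def check_include_fixed(gcc_root):
    """Return (ok, hint) for the GCC installation at gcc_root.

    ok is True if the include-fixed folder exists; otherwise hint names the
    missing folder and the mkheaders script that regenerates it.
    """
    root = Path(gcc_root)
    # Assumption: standard GCC install layout for the fixed headers.
    include_fixed = root / "lib" / "gcc" / TARGET / GCC_VERSION / "include-fixed"
    if include_fixed.is_dir():
        return True, ""
    mkheaders = root / "libexec" / "gcc" / TARGET / GCC_VERSION / "install-tools" / "mkheaders"
    return False, f"missing {include_fixed}; regenerate it with: {mkheaders}"


if __name__ == "__main__":
    gcc_root = os.environ.get("EBROOTGCC")
    if gcc_root is None:
        print("EBROOTGCC is not set; load the GCC module first")
    else:
        ok, hint = check_include_fixed(gcc_root)
        print("include-fixed present" if ok else hint)
```

Running it after loading the GCC module either confirms the folder is present or prints the path of the mkheaders script to run.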

@Thyre
Contributor

Thyre commented Mar 19, 2024

Score-P v8.4 has been released, with SHA256: 7bbde9a0721d27cc6205baf13c1626833bcfbabb1f33b325a2d67976290f7f8a
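EasyBuild verifies the checksums listed in an easyconfig on its own, but a downloaded tarball can also be checked by hand against the SHA256 posted above. A minimal sketch using Python's hashlib; the local file name scorep-8.4.tar.gz and the helper names sha256_of and verify are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 of the Score-P v8.4 release, as posted in this thread.
EXPECTED_SHA256 = "7bbde9a0721d27cc6205baf13c1626833bcfbabb1f33b325a2d67976290f7f8a"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file at path has the expected SHA256 checksum."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the release tarball was downloaded to the current directory.
    tarball = Path("scorep-8.4.tar.gz")
    if tarball.exists():
        print("checksum OK" if verify(tarball) else "checksum MISMATCH")
    else:
        print(f"{tarball} not found")
```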

Maybe we can use the newest release directly instead of creating two recipes for the two versions.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8001 - Linux Rocky Linux 8.7 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.8.13
See https://gist.github.com/Flamefire/573cc81749d2902ccbfc4f4d7a74d5e0 for a full test report.

Flamefire changed the title from "{perf}[gompi/2023b] Score-P v8.3 w/ CUDA 12.4.0" to "{perf}[gompi/2023b] Score-P v8.4 w/ CUDA 12.4.0" on Mar 19, 2024
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8001 - Linux Rocky Linux 8.7 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.8.13
See https://gist.github.com/Flamefire/ba67de400919dc6a6e38f3aa0bc5f1a8 for a full test report.

@casparvl
Contributor

Test report by @casparvl
FAILED
Build succeeded for 2 out of 3 (1 easyconfigs in total)
tcn1.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, AMD EPYC 7H12 64-Core Processor, Python 3.6.8
See https://gist.github.com/casparvl/4d88c97cb88fbb32e1a7367e1217f513 for a full test report.

@casparvl
Contributor

Hm, not sure if this needs an actual GPU node, I'll try rebuilding on one...

@casparvl
Contributor

Test report by @casparvl
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
gcn6.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/casparvl/d931df82a4b888163a7213f45718577d for a full test report.

@Thyre
Contributor

Thyre commented Mar 19, 2024

> Hm, not sure if this needs an actual GPU node, I'll try rebuilding on one...

Score-P shouldn't require a GPU to be present at build time. From the logs, it looks like the nvcc -ccbin mpicxx compilation failed:

configure: WARNING: nvcc -ccbin mpicxx compilation failed. Disabling CUDA support.
configure: error: could not fulfill requested support for CUDA.
configure: error: ./configure failed for build-mpi

Unfortunately, I cannot say what exactly went wrong without looking at build-mpi/config.log. The provided configure options look reasonable.
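Without access to config.log, the failing probe can be approximated by hand: compile a trivial CUDA file with nvcc, once using mpicxx as host compiler and once using plain g++ (the backend compiler). This is a hypothetical sketch, not the actual test Score-P's configure runs; the helper names nvcc_command and try_compile and the probe source are assumptions.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Trivial CUDA source: if even this fails, the host compiler setup is at fault.
SOURCE = "__global__ void kernel() {}\nint main() { kernel<<<1, 1>>>(); return 0; }\n"


def nvcc_command(src, obj, host_compiler="mpicxx"):
    """Build an nvcc command compiling src to obj with the given host compiler."""
    return ["nvcc", "-ccbin", host_compiler, "-c", str(src), "-o", str(obj)]


def try_compile(host_compiler="mpicxx"):
    """Return (ok, output) for a trivial nvcc compile using host_compiler."""
    if shutil.which("nvcc") is None or shutil.which(host_compiler) is None:
        return False, "nvcc or host compiler not found in PATH"
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "probe.cu"
        src.write_text(SOURCE)
        cmd = nvcc_command(src, Path(tmp) / "probe.o", host_compiler)
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Compare the MPI wrapper with the plain backend compiler.
    for compiler in ("mpicxx", "g++"):
        ok, output = try_compile(compiler)
        print(f"{compiler}: {'OK' if ok else 'FAILED'}")
        if not ok:
            print(output)
```

If both compilers fail with errors from system or GCC headers, the problem lies in the GCC installation rather than in the MPI wrapper.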

@Flamefire
Contributor Author

No, the issue is a "broken" GCC installation: GCC 13 is only compatible with CUDA when its fixed includes are present. I added that information to the description.

@Thyre
Contributor

Thyre commented Mar 19, 2024

This makes sense, since OpenMPI is built with GCC 13. Even without MPI, configure would probably fail later on, when checking CUDA with just the backend compiler.

@casparvl
Contributor

Ah clear, thanks. Ok... I'll get back to this tomorrow :)

@satishskamath
Contributor

UCX-CUDA for the 2023b toolchain contains CUDA 12.3.0 as a dependency. Any reasons for not using that here and using 12.4.0?

@casparvl
Contributor

Test report by @casparvl
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
tcn1.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, AMD EPYC 7H12 64-Core Processor, Python 3.6.8
See https://gist.github.com/casparvl/4cc37e81a7cff0519f70ac45f933647c for a full test report.

Contributor

casparvl left a comment

LGTM!

casparvl added this to the release after 4.9.0 milestone Mar 19, 2024
@casparvl
Contributor

Going in, thanks @Flamefire!

casparvl merged commit b75a78d into easybuilders:develop Mar 19, 2024
9 checks passed
@Flamefire
Contributor Author

> UCX-CUDA for the 2023b toolchain contains CUDA 12.3.0 as a dependency. Any reasons for not using that here and using 12.4.0?

Because CUDA 12.3.0 does not support GCC 13. That EC should be removed.

@Flamefire
Contributor Author

@casparvl Can you also take a look at the related PR #19524?
