
Clarify/decide on how to do the NVHPC based toolchains #16066

Closed
Micket opened this issue Aug 19, 2022 · 14 comments

@Micket
Contributor

Micket commented Aug 19, 2022

Perhaps everyone already knows this and I'm just slow, but it's not clear to me how we want to deal with NVHPC, nvompi, nvompic.

Some questions I have are

  1. is it even meaningful to have an NVHPC toolchain without specifying any CUDA?
  2. if it is meaningful to not specify a CUDA (which I've seen some people indicate...), shouldn't we go for the approach we took with foss + fosscuda? I.e. we would only have NVHPC and then nvompi, and only use Foo-1.2.3-nvompi-22.7-CUDA-11.7.0.eb?

In my mind these toolchains aren't like GCC: there is no compatibility with CUDA to worry about, and anyone using them definitely wants to use CUDA as well, so there just isn't any real need to offer an "opt-in".
I would have just done NVHPC with a particular system CUDA, then built OpenMPI (depending on UCX-CUDA right away) and called that nvompic. No nvompi.

@hattom
Contributor

hattom commented Aug 23, 2022

  • Which would be more flexible in terms of letting people modify "downstream" ECs to change the version of CUDA?
  • Which lets people have multiple CUDAs for a given toolchain version?

I think the former favours putting CUDA in the toolchain (Foo-1.2.3-nvompic-22.7.eb, with nvompic-22.7-CUDA-11.7.0.eb), while the latter favours putting it in the Foo-1.2.3-nvompi-22.7-CUDA-11.7.0.eb package?

  • Do either of the above considerations matter?

@boegel boegel added this to the 4.x milestone Aug 31, 2022
@Micket
Contributor Author

Micket commented Sep 1, 2022

* Do either of the above considerations matter?

I think you are correct, though I think it's not that bad either way for downstream; if you have a good template to follow, just --try-amend and it should be really easy to switch anyway. Plus, you'd have to consider stuff from GCCcore, UCX-CUDA, UCC-CUDA and NCCL regardless, which diminishes the differences even more.

We discussed this during the Zoom meeting and the conclusions were:

  1. do it similarly to foss + CUDA
  2. use the suffix to make the CUDA version more visible, since it is very important, and it's consistent
  3. use nvompi + a CUDA suffix. No nvompic
  4. introduce a top-level toolchain, NVHPC + OpenMPI + FlexiBLAS, and call that nvoff (with some risk of this name being misconstrued, so we're open to suggestions; I personally found nvoflf to be worse)

I plan to make a PR with essentially:

  1. NVHPC-22.7-CUDA-11.7.0.eb
  2. nvompi-22.7-CUDA-11.7.0.eb (also depends on UCX-CUDA, UCC-CUDA, maybe NCCL(?) directly, because we can)
  3. nvoff-22.7-CUDA-11.7.0.eb
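
For illustration, item 2 could end up looking roughly like the following Toolchain-style easyconfig. This is only a sketch; the OpenMPI, UCX-CUDA and GCCcore versions in it are placeholders, not anything decided in this thread:

easyblock = 'Toolchain'

name = 'nvompi'
version = '22.7'
versionsuffix = '-CUDA-11.7.0'

homepage = '(none)'
description = "NVHPC based compiler toolchain, including OpenMPI for MPI support."

toolchain = SYSTEM

local_cudaver = '11.7.0'

dependencies = [
    ('NVHPC', version, versionsuffix),  # provides nvc/nvc++/nvfortran
    ('UCX-CUDA', '1.12.1', '-CUDA-%s' % local_cudaver, ('GCCcore', '11.3.0')),  # placeholder versions
    ('OpenMPI', '4.1.4', '', ('NVHPC', '%s%s' % (version, versionsuffix))),  # placeholder version
]

moduleclass = 'toolchain'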

Thomas asked @SebastianAchilles what they did for BLAS in their toolchains:

Currently we still rely on imkl on the CPUs, and cuBLAS, cuFFT and cuSOLVER on GPUs.

But I think with FlexiBLAS we should have the option of using any CPU backend, so that's just better I think (I still haven't tried switching the BLAS backend with FlexiBLAS, so I'm not sure how that actually works).
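
For what it's worth, FlexiBLAS is designed so that the backend can be switched at runtime, typically via the FLEXIBLAS environment variable (or a flexiblasrc file), without relinking anything. A rough illustration; the backend name and program below are made up:

import os
import subprocess

# run the same binary against a different BLAS backend; the set of available
# backend names depends on how FlexiBLAS was built ('flexiblas list' shows them)
env = dict(os.environ, FLEXIBLAS='NETLIB')
subprocess.run(['./my_blas_app'], env=env, check=True)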

@SebastianAchilles
Member

I would also be interested in defining an NVHPC based toolchain 👍

Just for reference, these are the previous attempts to create the nvompic toolchain for the 2020b and 2021a toolchain generations:

Currently at JSC we are using the nvompic toolchain. I added this toolchain definition a while ago to upstream: easybuilders/easybuild-framework#3735. nvompi was added later as well: easybuilders/easybuild-framework#3969. However, in my opinion it doesn't make sense to have both definitions.

The reason for adding the c suffix (which stands for CUDA) is the following: you can either use an external CUDA with NVHPC or the CUDA shipped with NVHPC. So, no matter if we call it nvompi or nvompic, CUDA will always be pulled in, either directly or indirectly.

Personally I would prefer nvompic, to point out the CUDA dependency.

For the top-level toolchain NVHPC + OpenMPI + FlexiBLAS + FFTW, I would like to add another name suggestion to nvoff and nvoflf: nvofbf. This is mainly to be consistent with the toolchain names we already have: gofbf (which is not used anymore, because from 2021a on it is called foss) and the gfbf toolchain (GCC + FlexiBLAS + FFTW): https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/g/gfbf/gfbf-2022a.eb

@Micket
Contributor Author

Micket commented Sep 21, 2022

I got stuck on this because there were several things that didn't want to build easily: OpenMPI and BLIS have issues. @bartoldeman suggested some patches he wrote for CC, but I haven't had time to test anything out yet.

I have no strong opinion on whether it's called "nvompic" or "nvompi". The motivation for the latter was mostly to make it similar to gompi, i.e. no c suffix, since it will have a -CUDA-xx.x.x suffix anyway.
But that was before I realized how things would pan out (it's been a long time since we had compilers with a suffix), so that changes the situation somewhat. Since the compiler would carry the actual versionsuffix,
NVHPC-22.7-CUDA-11.7.0.eb
then nvompi(c) (and everything using this NVHPC toolchain) would have the toolchain version specified as '22.7-CUDA-11.7.0'; they wouldn't have any versionsuffix themselves, it would still just be nvompi(c)-2022a.eb.

So... nvompi, nvompic, meh. Sure.
Also, nvofbf, sure, I guess.
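
To make the versionsuffix mechanics above concrete, a downstream easyconfig would then reference the toolchain roughly like this (purely illustrative, reusing the example versions from this thread):

name = 'Foo'
version = '1.2.3'

# the CUDA version travels along inside the toolchain version, inherited from
# NVHPC's versionsuffix, so Foo itself doesn't need a '-CUDA-11.7.0' versionsuffix
toolchain = {'name': 'nvompi', 'version': '22.7-CUDA-11.7.0'}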


On a related note: I recently found that CUDA bundles nvcc. Considering we exclude all the bundled stuff, it kinda makes me wonder if we even need NVHPC itself for anything...?

@Micket
Contributor Author

Micket commented Sep 21, 2022

I've also had a second thought when encountering yet more stuff to fix for the ever-broken nvcc compiler:

Is nvcc any good at building CPU code in general? It certainly doesn't seem well tested, and NVIDIA themselves don't seem to bother using nvcc to build their own bundled OpenMPI, which is quite telling.
I haven't done any benchmarking, but I wouldn't expect it to matter the slightest bit whether OpenMPI is compiled with nvcc or GCC here, and I wouldn't expect nvcc to be especially good at building OpenBLAS.

So, spending time patching a build of OpenMPI or OpenBLAS or whatever just for the sake of using CC=nvcc, only to produce a possibly slower version, would just be counterproductive.
I basically just want to build e.g. VASP with nvcc. I don't care the slightest bit whether all the deps and build deps were also built with nvcc.

Perhaps

toolchainopts = {'use_gcc': True}

could both speed things up and avoid annoying patching due to the limitations of nvcc.

Maybe I'm wrong, perhaps nvcc is fricking amazing at CPU stuff as well, or maybe I should just stick to foss/2022a and use the nvcc compiler bundled with CUDA to build my ~2 top-level applications that use it.

@bartoldeman
Contributor

My opinion on this is to just use nvompi and nvofbf, and not use nvompic, ignoring the fact that NVHPC already ships with a CUDA -- sure, users can make use of it, but easyblocks often make assumptions based on consistency, e.g. they use get_software_root('CUDA') etc., so it simplifies life a lot if things built with EasyBuild use an EB CUDA module.
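
As a reminder of the pattern being referred to here, an easyblock typically detects a CUDA dependency roughly like this (a simplified sketch, not a quote from any particular easyblock; the configure flag is just an example):

from easybuild.tools.modules import get_software_root

# this only resolves if CUDA is an EasyBuild-provided (direct or indirect) dependency,
# which is why relying on the copy bundled inside NVHPC would sidestep these checks
configopts = ''
cuda_root = get_software_root('CUDA')
if cuda_root:
    configopts += ' --with-cuda=%s' % cuda_root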

Now I did install an NVHPC 22.7 locally yesterday. Here are some notes:

  • Open MPI: thankfully compiles out of the box now, though I'm still setting configopts += ' CC=pgcc CXX=pgc++ FC=pgfortran' (I will edit later to explain why exactly this is still needed)
  • FlexiBLAS: needs

local_extra_flags = "-D__ELF__"
toolchainopts = {'pic': True, 'extra_cflags': local_extra_flags, 'extra_fflags': local_extra_flags}

and

'configopts': '-DABI=Intel',

for the FlexiBLAS component (since nvfortran, like ifort, uses the g77 API for returning complex numbers from functions). Then there is also the issue that the library is called libflexiblas_intel.so, so the framework may need an adjustment for ScaLAPACK (or a symlink libflexiblas.so -> libflexiblas_intel.so needs to be created in postinstallcmds, see the sketch after this list)

  • BLIS: this one is the most complicated, as it doesn't know the nvc compiler flags.
    Using preconfigopts = "sed -i 's/LINKER.*/LINKER := nvc/' common.mk &&"
    and configopts += ' --complex-return=intel CC=gcc CFLAGS="-O2 -ftree-vectorize -march=native -fno-math-errno"' makes it compile, but it's ugly; a patch to BLIS would be better (hopefully not too hard).
  • FFTW: needs to disable quad precision, same as Intel FFTW
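
A rough sketch of the libflexiblas.so symlink workaround mentioned under the FlexiBLAS item above (library names as given there, %(installdir)s being the usual EasyBuild template):

# make ScaLAPACK (and anything else linking -lflexiblas) find the Intel-ABI variant
# without needing framework changes
postinstallcmds = [
    "cd %(installdir)s/lib && ln -s libflexiblas_intel.so libflexiblas.so",
]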

@bartoldeman
Contributor

Found what causes some issues with Open MPI and also FFTW: it comes from libtool not supporting nvc.

for CC=nvc it sets

lt_prog_compiler_pic=' -fPIC -DPIC'
lt_prog_compiler_static=''

but for CC=pgcc it sets

lt_prog_compiler_pic='-fpic'
lt_prog_compiler_static='-Bstatic'

This caused some strange errors when linking shared libraries.

@Micket
Contributor Author

Micket commented Sep 28, 2022

For ref. spack/spack#29899

@espenfl

espenfl commented Nov 8, 2022

Hi there. The need for MPI/OpenACC use is definitely there, and getting the NVHPC toolchains in place would be awesome. What is remaining here? Just compiling with NVHPC and not using CUDA certainly has use cases. Whether CUDA should be automatically installed or available when loading, say, an NVHPC toolchain, I do not know; maybe not, to keep it as slim as possible.

@Micket
Contributor Author

Micket commented Nov 8, 2022

What is remaining here?

Someone needs to do it.

Whether CUDA should be automatically installed or available when loading, say, an NVHPC toolchain, I do not know; maybe not, to keep it as slim as possible.

That's not a reason we actually care about. We need to keep a separate CUDA package because

  1. many easyblocks expect "CUDA" as an explicit dependency.
  2. we want to use UCX-CUDA and UCC-CUDA from GCCcore, which is going to drag in a CUDA dependency.

@hattom
Contributor

hattom commented Nov 8, 2022

@bartoldeman did you (or someone else) mention at a previous conference call that there's something that needs to be taken into account regarding the various OpenMP libraries? Particularly if using GCC as the C compiler instead of nvc. Or were you reporting above that it's possible to build everything with nvc (just that some things require pretending that it's pgcc)?

@espenfl

espenfl commented Nov 17, 2022

Thanks for the update.

Someone needs to do it.

Maybe we can contribute. I will check.

That's not a reason we actually care about. We need to keep a separate CUDA package because

Good, so can we consider the CUDA side of this settled then? Since we need CUDA as a separate package anyway, and there are certainly use cases for using the NVHPC compiler without CUDA, users would have to specify it explicitly, meaning we also make nvompic etc. where these CUDA packages are included?

@Micket
Contributor Author

Micket commented Nov 27, 2022

I dusted off my old easyconfigs, adding all the stuff @bartoldeman mentioned into #16724

@Micket
Contributor Author

Micket commented Jan 14, 2023

We have a toolchain now (edit: cfr. #16724)

@Micket Micket closed this as completed Jan 14, 2023
@boegel boegel modified the milestones: 4.x, 4.7.0 Jan 14, 2023