Name	Name	Last commit message	Last commit date
parent directory ..
Fortran_samples	Fortran_samples
JAX_FFT	JAX_FFT
extra_bootstraps	extra_bootstraps
samples	samples
README.md	README.md

cuFFTMp C++ and Fortran Code Samples

Requirements

HPC SDK 23.3 and up
CUDA 11.0 and up
A system with at least two Hopper (SM90), Ampere (SM80) or Volta (SM70) GPU.
- The c2c_pencils and r2c_c2r_pencils samples require at least 4 GPUs.

Please see the "Hardware and software requirements" sections of the documentation for the full list of requirements.

Quick start for C++ samples

The following environmental variables need to be defined in order to build and run the samples, for example:

MPI_HOME=/hpc_sdk/Linux_x86_64/.../comm_libs/hpcx/latest/ompi, the path to your MPI installation and should contain a lib and include folder
CUFFT_LIB=/hpc_sdk/Linux_x86_64/.../math_libs/lib64/, where libcufftMp.so is located
CUFFT_INC=/hpc_sdk/Linux_x86_64/.../math_libs/include/cufftmp, where all the cuFFT and cuFFTMp headers files are located
NVSHMEM_LIB=/hpc_sdk/Linux_x86_64/.../comm_libs/nvshmem/lib, where all the NVSHMEM libraries, such as libnvshmem_host.so, are located
NVSHMEM_INC=/hpc_sdk/Linux_x86_64/.../comm_libs/nvshmem/include, where all the NVSHMEM headers, such as nvshmem.h, are located

Note that cuFFTMp requires a specific version of NVSHMEM, as indicated here. If HPC SDK contains multiple versions of NVSHMEM, compatible versions are available in /hpc_sdk/Linux_x86_64/.../lib/compat/ and /hpc_sdk/Linux_x86_64/.../include/compat/. Note that NVSHMEM can also be downloaded individually from here.

As cuFFTMp is released in HPC SDK 22.3 and up, to build and run the samples or your applications with cuFFTMp it is highly recommended to have $MPI_HOME, $CUFFT_LIB, $CUFFT_INC, $NVSHMEM_LIB, and $NVSHMEM_INC all pointing to the same HPC SDK version.

If you have to use an older version of HPC SDK (21.9 or 21.11), you can find the early-access version of cuFFTMp in cuFFTMP EA.

Then build and run the C2C sample by:

$ cd samples/c2c
$ make run
Hello from rank 1/2 using GPU 1 transform of size 16 x 16 x 16, local size 8 x 16 x 16
Hello from rank 0/2 using GPU 0 transform of size 16 x 16 x 16, local size 8 x 16 x 16
Shuffled (Y-Slabs) GPU data, global 3D index [0 8 0], local index 0, rank 1 is (-13.323235,-48.004234)
[...]
Shuffled (Y-Slabs) GPU data, global 3D index [0 0 9], local index 9, rank 0 is (15.618601,-9.228624)
Relative Linf error on rank 0, 3.226381e-07
Relative Linf error on rank 1, 3.109569e-07
PASSED on rank 1
PASSED on rank 0

If you see PASSED, the test ran successfully.

You can repeat the same procedure for the other samples
- samples/c2c_pencils
- samples/c2c_no_descriptors
- samples/r2c_c2r
- samples/r2c_c2r_shared_scratch
- samples/r2c_c2r_pencils
- samples/r2c_c2r_slabs_GROMACS
- samples/r2c_c2r_no_descriptors
- samples/reshape

Fortran samples

A Fortran wrapper library for cuFFTMp is provided in Fortran_wrappers_nvhpc subfolder. The wrapper library will be included in HPC SDK 22.5 and later. The Fortran samples can be built and run similarly with make run in each of the directories:

Fortran_samples/c2c
Fortran_samples/c2c_no_descriptors
Fortran_samples/c2c_pencils
Fortran_samples/r2c_c2r
Fortran_samples/r2c_c2r_no_descriptors
Fortran_samples/r2c_c2r_pencils
Fortran_samples/r2c_c2r_shared_scratch
Fortran_samples/reshape

Note: Some Fortran samples that use NVSHMEM APIs to manage memory, including c2c_no_descriptors, r2c_c2r_no_descriptros and r2c_c2r_shared_scratch will fail to build using HPC SDK 23.9. It is a known issue. The workaround is to use a different version of HPC SDK, preferably the latest (23.11).

General tips

No Infiniband?

Those samples use NVSHMEM. If the system doesn't have Infiniband, you can use

NVSHMEM_REMOTE_TRANSPORT=none

to avoid Infiniband initialization-related errors. This will then fallback to p2p (single-node) only.

MPI non compatible with provided bootstrap in HPC SDK?

In case a custom MPI, other than the MPI implementations provided in HPC SDK, is used, the bootstrapping plugin may fail with an error such as

src/bootstrap/bootstrap_loader.cpp:46: NULL value Bootstrap unable to load 'nvshmem_bootstrap_mpi.so'
libmpi.so.40: cannot open shared object file: No such file or directory
src/bootstrap/bootstrap.cpp:26: non-zero status: -1 bootstrap_loader_init returned error
src/init/init.cpp:90: non-zero status: 7 bootstrap_init failed
src/init/init.cpp:501: non-zero status: 7 nvshmem_bootstrap failed

indicating it cannot load libmpi.so.40, most likely because a non-compatible version of MPI is used to link with the nvshmem bootstrapping library.

In this case a custom bootstrap library can be built to enable users to use its own MPI implementation. We include an extra_bootstraps folder in the samples to help creating the custom bootstrap library. Find more information at the "Bootstrapping Mechanism" session of the documentation.

Container

HPC SDK containers contain all the required dependencies. For instance,

docker pull nvcr.io/nvidia/nvhpc:23.3-devel-cuda_multi-ubuntu20.04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cuFFTMp

cuFFTMp

README.md

cuFFTMp C++ and Fortran Code Samples

Requirements

Quick start for C++ samples

Fortran samples

General tips

No Infiniband?

MPI non compatible with provided bootstrap in HPC SDK?

Container

Files

cuFFTMp

Directory actions

More options

Directory actions

More options

Latest commit

History

cuFFTMp

Folders and files

parent directory

README.md

cuFFTMp C++ and Fortran Code Samples

Requirements

Quick start for C++ samples

Fortran samples

General tips

No Infiniband?

MPI non compatible with provided bootstrap in HPC SDK?

Container