Verbs Build Instructions

James Dinan edited this page Dec 19, 2019 · 2 revisions

Building and Installing Sandia OpenSHMEM (SOS) on Verbs Networks:

Recent releases of libfabric include a verbs provider that can be used in conjunction with the RXM utility provider to run SOS on Verbs networks (e.g. InfiniBand or RoCE Ethernet).

OFI libfabric

Please use the ofi_rxm and verbs providers of OFI libfabric.

Libfabric can be set up to use the ofi_rxm and verbs providers with the following configuration. SOS verbs support requires libfabric v1.9.0 or newer.

$ ./configure --prefix=<libfabric_install_dir> --enable-verbs --enable-rxm --disable-psm2 --disable-sockets --disable-usnic --disable-udp --disable-rxd

where <libfabric_install_dir> is an appropriate installation path. The --disable-* flags are optional, but may help ensure that SOS chooses the ofi_rxm and verbs providers.

Alternatively, SOS can be instructed to use these providers by setting the following environment variable:

SHMEM_OFI_PROVIDER="verbs;ofi_rxm"
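
As a minimal sketch, the variable can be exported in the shell that launches the application and the provider pair checked with libfabric's fi_info utility. This assumes fi_info was installed with libfabric (typically under <libfabric_install_dir>/bin) and is on your PATH; the check is skipped if it is not found.

```shell
# Select the verbs and RXM providers for SOS (value taken from the text above).
export SHMEM_OFI_PROVIDER="verbs;ofi_rxm"

# Optional sanity check: ask libfabric whether the layered provider is usable.
# Assumption: fi_info is on PATH; otherwise the check is skipped.
if command -v fi_info >/dev/null 2>&1; then
    fi_info -p "$SHMEM_OFI_PROVIDER" \
        || echo "libfabric reported no '$SHMEM_OFI_PROVIDER' endpoints" >&2
else
    echo "fi_info not found; skipping provider check" >&2
fi
```

If fi_info reports no endpoints, the libfabric build may be missing the verbs or RXM provider, or no Verbs device may be available on the node.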

Building SOS to use the libfabric RXM and Verbs providers

Use the following configure options to build SOS to use the libfabric ofi_rxm and verbs providers:

$ ./autogen.sh
$ ./configure --prefix=<SOS_install_dir> --with-ofi=<libfabric_install_dir> --enable-pmi-simple --enable-hard-polling --enable-ofi-mr=basic
$ make
$ make install

where <SOS_install_dir> is an appropriate installation path for SOS and <libfabric_install_dir> is the libfabric installation path with the ofi_rxm and verbs providers enabled. Hard polling must be enabled with the --enable-hard-polling flag. Basic memory registration for OFI should also be enabled with the --enable-ofi-mr=basic flag.

Compiling and running OpenSHMEM programs

The SOS build should be added to your path. If you have used the build instructions posted here, this is done by running the following command:

$ export PATH=<SOS_install_dir>/bin:$PATH

Once SOS is in your path, you can use the compiler wrapper oshcc to compile your application and the launcher wrapper oshrun to run it. Please ensure that you have a compatible launcher, such as Hydra 3.2 or newer.
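
The following is a hedged sketch of that workflow, not an official SOS example. The file name hello.c and the PE count of 2 are arbitrary choices, and the -np option is assumed to be forwarded by oshrun to the underlying launcher (as with Hydra). The compile and launch steps are skipped if the wrappers are not on your PATH.

```shell
# Write a minimal OpenSHMEM program (hello.c is a hypothetical file name).
cat > hello.c <<'EOF'
#include <stdio.h>
#include <shmem.h>

int main(void) {
    shmem_init();      /* start the OpenSHMEM runtime */
    printf("Hello from PE %d of %d\n", shmem_my_pe(), shmem_n_pes());
    shmem_finalize();  /* shut the runtime down */
    return 0;
}
EOF

# Compile and launch only if the SOS wrappers are available.
if command -v oshcc >/dev/null 2>&1 && command -v oshrun >/dev/null 2>&1; then
    oshcc -o hello hello.c
    oshrun -np 2 ./hello   # 2 PEs is an arbitrary choice
else
    echo "oshcc/oshrun not found; check that <SOS_install_dir>/bin is in PATH" >&2
fi
```

Each PE prints one line with its own rank and the total PE count; the order of the lines is not deterministic.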

Troubleshooting

Some versions of libfabric are known to have an issue in the memory registration cache when running multithreaded SHMEM applications. As a workaround, the following environment variable can be set:

FI_MR_CACHE_MAX_COUNT=0
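
To limit the workaround to a single run rather than the whole shell session, a small wrapper along these lines could be used. The function name run_without_mr_cache and the application name my_app are hypothetical, and whether the variable reaches processes on remote nodes depends on how your launcher propagates the environment.

```shell
# Run one command with the libfabric MR cache disabled, leaving the
# calling shell's environment unchanged.
# run_without_mr_cache is a hypothetical helper name.
run_without_mr_cache() {
    env FI_MR_CACHE_MAX_COUNT=0 "$@"
}

# Example usage (my_app is a placeholder for your application):
#   run_without_mr_cache oshrun -np 2 ./my_app
```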

Please refer to the Troubleshooting wiki page if you encounter any issues. If your particular problem is not covered in the wiki, please submit an issue through GitHub.