OpenMPI-4.0.3-GCC-9.3.0 compilation error on rhel 7 #11939

Closed
golivag opened this issue Dec 30, 2020 · 10 comments

@golivag

golivag commented Dec 30, 2020

I'm trying to build OpenMPI-4.0.3-GCC-9.3.0 on Red Hat Enterprise Linux 7 for Power (ppc64le), but it fails with the following error in the log file:

configure: WARNING: Unfortunately, libfabric links to both libnl and libnl-3.
configure: WARNING: This is a configuration that is known to cause run-time crashes.
configure: WARNING: This is an error in libfabric (not Open MPI).
configure: WARNING: Open MPI will therefore skip using libfabric.
configure: WARNING: OFI libfabric support requested (via --with-ofi or --with-libfabric), but not found.
configure: error: Cannot continue.

I am grateful for any clue.

@boegel boegel added this to the 4.x milestone Dec 30, 2020
@boegel
Member

boegel commented Dec 30, 2020

@golivag The problem seems to be with the libfabric installation. Either change something there so that it no longer links against both libnl and libnl-3 (not sure how that happened), or tweak the OpenMPI easyconfig file to build without libfabric support by including this:

configopts = "--without-ofi"

@golivag
Author

golivag commented Dec 30, 2020

Hi @boegel, thank you very much for your prompt answer. Indeed it was a problem with libfabric, and as suggested here I installed libnl3-devel, rebuilt libfabric and then I was able to compile OpenMPI without problems.
Thanks again for your valuable help!

@golivag golivag closed this as completed Dec 30, 2020
@fmgvalente

I've also hit this issue. Installing libnl3-devel prevented libfabric from linking with libnl v1, which solved my problem building OpenMPI.
It was apparently triggered by having both libnl v1 and v3 installed, but only the devel package for v1.

@boegel
Member

boegel commented Jan 5, 2021

Let's re-open this and see if we can change something in the libfabric easyconfigs to prevent this from happening...

Any suggestions @Micket?

@boegel boegel reopened this Jan 5, 2021
@boegel boegel modified the milestones: 4.x, next release (4.3.3?) Jan 5, 2021
@Micket
Contributor

Micket commented Jan 6, 2021

I had neither nl1 nor nl3 devel packages installed, and somehow my libfabric builds v3 support. I don't know anything about this package at all.

@branfosj
Member

branfosj commented Jan 7, 2021

I've done some poking and found:

OpenMPI fails if it finds a mix of libnl v1 and v3 in the items it is compiling against, as this is known to cause issues. See open-mpi/ompi#3989 - OpenMPI is following the 'fail early' approach.

I also have neither the nl1 nor the nl3 devel packages installed (this is on CentOS 7). The relevant section in my build of libfabric shows:

configure: checking for libnl3
checking for libnl3 prefix... /usr
checking for /usr/include/libnl3... not found
configure: checking for libnl
looking for header without includes
checking netlink/netlink.h usability... no
checking netlink/netlink.h presence... no
checking for netlink/netlink.h... no
checking netlink/netlink.h usability... no
checking netlink/netlink.h presence... no
checking for netlink/netlink.h... no
configure: usnic provider: disabled

I.e. libfabric directly compiles against neither nl1 nor nl3 - as we'd expect due to the lack of devel packages.

However,

$ ldd /lib64/libibverbs.so.1
        linux-vdso.so.1 =>  (0x00007fff2f3ba000)
        libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007fd2d611d000)
        libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007fd2d5efc000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd2d5ce0000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fd2d5adc000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fd2d570e000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fd2d540c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd2d65a3000)

I.e. my libibverbs is built against nl3.
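
The same check can be scripted when several libraries need inspecting. What follows is only a minimal sketch: the function name `libnl_versions` is made up for illustration, and it assumes that libnl v1 shows up in ldd output as `libnl.so.<N>` while v3 shows up as `libnl-3.so.<N>` or a companion such as `libnl-route-3.so.<N>`.

```python
import re

# Assumed sonames: libnl v1 is "libnl.so.<N>"; libnl v3 is "libnl-3.so.<N>"
# or a companion library such as "libnl-route-3.so.<N>".
_NL1 = re.compile(r"^libnl\.so\.\d+$")
_NL3 = re.compile(r"^libnl(-[a-z]+)?-3\.so\.\d+$")


def libnl_versions(ldd_output):
    """Return the set of libnl major versions (1 and/or 3) named in ldd output."""
    versions = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the library name; drop any leading directory part.
        soname = fields[0].rsplit("/", 1)[-1]
        if _NL1.match(soname):
            versions.add(1)
        elif _NL3.match(soname):
            versions.add(3)
    return versions


# Two lines taken from the ldd output above.
sample = """\
        libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007fd2d611d000)
        libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007fd2d5efc000)
"""
print(libnl_versions(sample))
```

A result containing both 1 and 3 for the same library would be the kind of mix that the OpenMPI configure step refuses to build against.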

@branfosj
Member

If you hit this issue, there are three possible solutions:

  1. Add the libnl3-devel package to the OS
  2. Build OpenMPI without libfabric support: configopts = "--without-ofi" in the OpenMPI easyconfig
  3. Build libfabric without the usNIC provider: configopts = "--disable-usnic" in the libfabric easyconfig

@ocaisa
Member

ocaisa commented May 12, 2021

Personally, I like Option 3. usNIC is probably niche anyway (at least I hadn't heard about it until today). In the easyconfig we can leave a comment similar to

# If you require usNIC support you will need to uncomment the following lines:
# osdependencies = [...]
# configopts = "--enable-usnic"

Actually, we'd have to tweak the easyblock to disable the option by default so we can add the OS dependency check there.

@ocaisa
Member

ocaisa commented May 12, 2021

Hopefully fixed in #12854. @golivag, can you check (rebuild libfabric from that PR and then reattempt the install)?

Actually, it is enough to run eb --stop configure <easyconfig from PR> and then grep for provider in the log file. If you compare that to your existing installation, you should see that the usNIC provider is gone.
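
For comparing two log files, the grep step could be sketched in Python as below. This is only an illustration: `provider_status` is a hypothetical helper, and it assumes the log contains lines of the form "configure: usnic provider: disabled", as in the libfabric configure output quoted earlier in this thread.

```python
import re

# Assumed line format, based on the configure output quoted in this thread:
#   configure: usnic provider: disabled
_PROVIDER = re.compile(r"configure:\s+(\S+) provider:\s+(\S.*?)\s*$")


def provider_status(log_text):
    """Map each provider name found in a configure log to its reported status."""
    status = {}
    for line in log_text.splitlines():
        match = _PROVIDER.search(line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


# Lines taken from the configure output quoted earlier in this thread.
log_excerpt = """\
checking for netlink/netlink.h... no
configure: usnic provider: disabled
"""
print(provider_status(log_excerpt))
```

Running it on the log from the old installation and on the log from the PR build, then comparing the two dictionaries, should show whether the usNIC entry changed.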

@boegel
Member

boegel commented May 26, 2021

I'll go ahead and close this under the assumption that it's fixed by the changes in #12854.
If not, we can re-open this and follow up, so please don't hesitate to report back @golivag!

@boegel boegel closed this as completed May 26, 2021