Help building with MPI #35

Open
krislock opened this issue May 20, 2024 · 2 comments

@krislock

Hello,

My apologies for troubling you with this. I've been trying to compile and run the ALPS example Knap with MPI.

Here are my steps:

  1. wget https://raw.githubusercontent.com/coin-or/coinbrew/master/coinbrew; chmod u+x coinbrew
  2. ./coinbrew --tests none build Alps@master --enable-static --disable-shared --with-mpi-cflags="$(pkg-config --cflags ompi)" --with-mpi-lflags="$(pkg-config --libs ompi)" MPICC=mpicc MPICXX=mpiCC

This results in the error:

configure:18697: checking for library MPI with separate link and compile checks
configure:18823: g++ -c -O2 -DNDEBUG  -I/opt/metis/el8/contrib/openmpi/openmpi-4.1.5-gcc-11.4.0-cuda-11.8/include/openmpi -I/opt/metis/el8/contrib/openmpi/openmpi-4.1.5-gcc-11.4.0-cuda-11.8/include/openmpi/opal/mca/event/libevent2022/libevent -I/opt/metis/el8/contrib/openmpi/openmpi-4.1.5-gcc-11.4.0-cuda-11.8/include/openmpi/opal/mca/event/libevent2022/libevent/include -pthread  conftest.cpp >&5
conftest.cpp:31:10: error: #include expects "FILENAME" or <FILENAME>
   31 | #include "#include "mpi.h""
      |          ^~~~~~~~~~~~~~

I then changed #include "#include "mpi.h"" in the Alps/configure file to #include "mpi.h", and tried again:

  1. rm -fr build/ dist/
  2. ./coinbrew --tests none build Alps@master --enable-static --disable-shared --with-mpi-cflags="$(pkg-config --cflags ompi)" --with-mpi-lflags="$(pkg-config --libs ompi)" MPICC=mpicc MPICXX=mpiCC

Now it successfully builds and installs.

Next, I tried to compile the Knap example.

  1. export LD_LIBRARY_PATH=/home/krislock/coin-or/dist/lib:$LD_LIBRARY_PATH
  2. cd build/Alps/master/examples/Knap/
  3. make

This results in the error:

for file in KnapMain.o KnapModel.o KnapNodeDesc.o KnapParams.o KnapSolution.o KnapTreeNode.o; do bla="$bla `echo $file`"; done; \
g++  -O2 -DNDEBUG  -o knap $bla `PKG_CONFIG_PATH=/home/krislock/coin-or/dist/lib/pkgconfig:/opt/metis/el8/contrib/openmpi/openmpi-4.1.5-gcc-11.4.0-cuda-11.8/lib/pkgconfig pkgconf --libs alps --static`  
KnapMain.o: In function `main.cold':
KnapMain.cpp:(.text.unlikely+0x1c): undefined reference to `vtable for AlpsKnowledgeBrokerSerial'
KnapMain.o: In function `main':
KnapMain.cpp:(.text.startup+0x3c): undefined reference to `vtable for AlpsKnowledgeBrokerSerial'
KnapMain.cpp:(.text.startup+0x46): undefined reference to `AlpsKnowledgeBrokerSerial::initializeSearch(int, char**, AlpsModel&, bool)'
KnapMain.cpp:(.text.startup+0xff): undefined reference to `AlpsKnowledgeBrokerSerial::rootSearch(AlpsTreeNode*)'
KnapMain.cpp:(.text.startup+0x3e0): undefined reference to `vtable for AlpsKnowledgeBrokerSerial'
collect2: error: ld returned 1 exit status
make: *** [Makefile:93: knap] Error 1

This implies that COIN_HAS_MPI is not set, so the example is built against the serial broker (hence the undefined references to AlpsKnowledgeBrokerSerial). However, dist/include/coin-or/AlpsConfig.h contains #define ALPS_HAS_MPI 1. So I renamed COIN_HAS_MPI to ALPS_HAS_MPI everywhere in Alps/examples/Knap/KnapMain.cpp and tried to compile again.
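For anyone hitting the same thing, the macro mismatch can be reproduced in isolation. This is only a minimal sketch, not the actual KnapMain.cpp: it assumes the example picks its broker with a preprocessor guard on the macro name, and the two helper functions (brokerWithCoinMacro, brokerWithAlpsMacro) are hypothetical names made up for illustration.

```cpp
#include <string>

// Stand-in for the line reported in dist/include/coin-or/AlpsConfig.h.
#define ALPS_HAS_MPI 1

// Hypothetical helper: the broker a source file would select if it is
// guarded by the old macro name, which nothing defines here.
inline std::string brokerWithCoinMacro() {
#ifdef COIN_HAS_MPI
    return "AlpsKnowledgeBrokerMPI";
#else
    return "AlpsKnowledgeBrokerSerial";
#endif
}

// Hypothetical helper: the same guard written against the macro that the
// installed config header actually defines.
inline std::string brokerWithAlpsMacro() {
#ifdef ALPS_HAS_MPI
    return "AlpsKnowledgeBrokerMPI";
#else
    return "AlpsKnowledgeBrokerSerial";
#endif
}
```

Calling brokerWithCoinMacro() yields the serial broker name even though MPI support is configured, which would be consistent with the linker asking for AlpsKnowledgeBrokerSerial symbols in a build where, presumably, only the MPI broker was compiled into the library.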

  1. make clean; make

Now it compiles without error. However, when I run the executable, I get a segmentation fault.

[krislock@metis Knap]$ mpirun -np 2 ./knap -param knap.par 
[metis:681199:0:681199] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x38)
[metis:681198:0:681198] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x38)
==== backtrace (tid: 681199) ====
 0 0x000000000004eb50 killpg()  ???:0
 1 0x000000000042e948 AlpsSubTree::AlpsSubTree()  ???:0
 2 0x0000000000413d9e AlpsKnowledgeBroker::AlpsKnowledgeBroker()  ???:0
 3 0x0000000000407c14 main()  ???:0
 4 0x000000000003ad85 __libc_start_main()  ???:0
 5 0x000000000040827e _start()  ???:0
=================================
==== backtrace (tid: 681198) ====
 0 0x000000000004eb50 killpg()  ???:0
 1 0x000000000042e948 AlpsSubTree::AlpsSubTree()  ???:0
 2 0x0000000000413d9e AlpsKnowledgeBroker::AlpsKnowledgeBroker()  ???:0
 3 0x0000000000407c14 main()  ???:0
 4 0x000000000003ad85 __libc_start_main()  ???:0
 5 0x000000000040827e _start()  ???:0
=================================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node metis exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Any help you can give would be greatly appreciated!

Nathan

@krislock
Author

I was able to get past the segmentation fault by changing AlpsKnowledgeBroker() to AlpsKnowledgeBroker(model) in Alps/src/AlpsKnowledgeBrokerMPI.h as follows:

AlpsKnowledgeBrokerMPI(int argc,
                       char* argv[],
                       AlpsModel& model,
                       bool showBanner = true)
    :
    AlpsKnowledgeBroker(model)  // was AlpsKnowledgeBroker()
{
    init();
    initializeSearch(argc, argv, model, showBanner);
}

Note that this was using Alps@2.0, which does not have #include "#include "mpi.h"" in Alps/configure.
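To illustrate why forwarding the model might matter, here is a small sketch of the constructor pattern. The classes below (Model, Broker, ParallelBrokerBefore, ParallelBrokerAfter) are hypothetical stand-ins and not the real Alps classes, and I have not confirmed that this is the actual cause of the crash. The assumption is only that a default-constructed base never receives the model, which would be consistent with a fault at a small address such as 0x38 inside a constructor called from the base-class constructor.

```cpp
#include <string>

// Hypothetical stand-in for a model type; NOT the real AlpsModel.
struct Model {
    std::string name;
};

// Hypothetical stand-in for a base broker; NOT the real AlpsKnowledgeBroker.
class Broker {
public:
    Broker() : model_(nullptr) {}               // default: no model attached
    explicit Broker(Model& m) : model_(&m) {}   // model attached up front

    bool hasModel() const { return model_ != nullptr; }

    // Guards against the null pointer instead of dereferencing it.
    std::string modelName() const {
        return model_ ? model_->name : "<no model>";
    }

private:
    Model* model_;
};

// Mirrors the original pattern: the base default constructor runs
// implicitly, so the model argument never reaches the base class.
class ParallelBrokerBefore : public Broker {
public:
    explicit ParallelBrokerBefore(Model&) {}
};

// Mirrors the change above: the model is forwarded to the base class.
class ParallelBrokerAfter : public Broker {
public:
    explicit ParallelBrokerAfter(Model& m) : Broker(m) {}
};
```

Constructing ParallelBrokerBefore with a model leaves hasModel() false, while ParallelBrokerAfter reports true. Any base-class code that uses the model during construction without a null check would fault in the first case.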

I built Alps as follows:

  1. export MPIINCDIR=<directory containing mpi.h>
  2. export MPILIB="$(pkg-config --libs ompi)"
  3. export MPICC=mpicc
  4. export MPICXX=mpiCC
  5. ./coinbrew fetch Alps@2.0
  6. Change Alps/src/AlpsKnowledgeBrokerMPI.h as mentioned above.
  7. ./coinbrew --tests none build Alps --enable-static --disable-shared

@tkralphs
Member

Sorry for the delay and for these issues; I have not used Alps with MPI in some time. I did try to get BLIS running with MPI fairly recently and was eventually successful, but only with an older version, I believe. There is some discussion at coin-or/CHiPPS-BLIS#10. Anyway, it seems you got it working for now. I can dig into this further if you are still playing with it. Depending on what exactly you're doing, I can recommend a specific version that may work out of the box. The different versions are a bit confusing.
