Errors when running the example **backward_facing_step_3d** #382

Open
xiangbei007 opened this issue Oct 19, 2023 · 6 comments
@xiangbei007
Author

1. When running this example, I got the following errors:

```
/bin/sh: 1: [: x: unexpected operator
**********Calling flredecomp in parallel with verbose log output enabled:
mpiexec -n 8 /home/wangbo/01_fluidity-main/examples/backward_facing_step_3d/../../bin/flredecomp -i 1 -o 8 -v -l backward_facing_step_3d backward_facing_step_3d_flredecomp
No protocol specified

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[45886,1],0]
  Exit code: 15

make: *** [Makefile:10: run] Error 15
```

2. The content of "flredecomp.err-0":

```
The target number of processes must be equal or less than the number of processes currently running.
*** ERROR ***
Source location: (Flredecomp.F90, 120)
Error message: Running on insufficient processes.
application called MPI_Abort(MPI_COMM_WORLD, 15) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=15
:
system msg for write_line failure : Bad file descriptor
```
3. The content of "flredecomp.log-0":

```
In flredecomp
/home/wangbo/fluidity/lib/python3.8/site-packages/fluidity/state_types.py
fluidity.state_types imported successfully; location:
Input base name: backward_facing_step_3d
Output base name: backward_facing_step_3d_flredecomp
Input number of processes: 1
Target number of processes: 8
Job number of processes: 1
```
4. The source code mentioned in step 2, which is part of "Flredecomp.F90":

```fortran
! Input check
if(input_nprocs < 0) then
  FLExit("Input number of processes cannot be negative!")
else if(target_nprocs < 0) then
  FLExit("Target number of processes cannot be negative!")
else if(input_nprocs > nprocs) then
  ewrite(-1, *) "The input number of processes must be equal or less than the number of processes currently running."
  FLExit("Running on insufficient processes.")
else if(target_nprocs > nprocs) then
  ewrite(-1, *) "The target number of processes must be equal or less than the number of processes currently running."
  FLExit("Running on insufficient processes.")
end if
```

This means target_nprocs > nprocs. But I ran `make run NPROCS=8`, so nprocs should be 8; why is it now 1, causing the program to fail to run? How can I solve this problem?
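For reference, nprocs in the check above is the number of MPI ranks the job actually starts with (the log's "Job number of processes"), not the NPROCS value itself. A sketch of the chain, with the Makefile expansion inferred from the log above:

```sh
# Sketch: NPROCS only sets mpiexec's -n argument; nprocs is whatever size
# of MPI_COMM_WORLD flredecomp sees once the launched processes initialise
# MPI.
make run NPROCS=8   # expands to roughly: mpiexec -n 8 .../flredecomp -i 1 -o 8 ...
```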
The following is my Fluidity build information:

```
Revision: : (debugging)
Compile date: Oct 19 2023 14:28:11
OpenMP Support no
Adaptivity support yes
2D adaptivity support yes
3D MBA support no
CGAL support no
MPI support yes
Double precision yes
NetCDF support yes
Signal handling support yes
Stream I/O support yes
PETSc support yes
Hypre support no
ARPACK support no
Python support yes
Numpy support yes
VTK support yes
Zoltan support yes
Memory diagnostics yes
FEMDEM support no
Hyperlight support no
libsupermesh support no
```

@stephankramer
Contributor
I suspect that your Fluidity has been built against a different MPI library than the one associated with the mpiexec in your PATH. Can you give the output of:

```
ldd /home/wangbo/01_fluidity-main/bin/flredecomp
```

and

```
which mpiexec
```

and, if on Ubuntu (or some other Debian derivative):

```
update-alternatives --display mpirun
```
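What to look for in that output, as a sketch (the grep filter is illustrative): the prefix of the libmpi that flredecomp links should match the installation that owns the mpiexec in your PATH.

```sh
# Sketch: a mismatch shows up as libmpi resolving under one prefix while
# the mpiexec in PATH comes from a different MPI installation.
ldd /home/wangbo/01_fluidity-main/bin/flredecomp | grep -i libmpi
which mpiexec
```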

@xiangbei007
Author
Thanks for your reply. The following is the output of the above three commands.

1.

```
linux-vdso.so.1 (0x00007ffec9fc1000)
libhdf5_openmpi.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_openmpi.so.103 (0x00007f0291bbc000)
libpetsc.so.3.9 => /home/wangbo/petsc-3.9.0/test/lib/libpetsc.so.3.9 (0x00007f028fe92000)
libparmetis.so => /home/wangbo/parmetis/lib/libparmetis.so (0x00007f028fe0e000)
libmetis.so => /home/wangbo/petsc-3.9.0/test/lib/libmetis.so (0x00007f028fd70000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f028fc21000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f028fbfe000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f028f9e4000)
libmpifort.so.0 => /home/wangbo/petsc-3.9.0/test/lib/libmpifort.so.0 (0x00007f028f92e000)
libmpi.so.0 => /home/wangbo/petsc-3.9.0/test/lib/libmpi.so.0 (0x00007f028f417000)
libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f028f15b000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f028f140000)
libnetcdf.so.15 => /usr/lib/x86_64-linux-gnu/libnetcdf.so.15 (0x00007f028f01b000)
libudunits2.so.0 => /usr/lib/x86_64-linux-gnu/libudunits2.so.0 (0x00007f028edfe000)
libpython3.8.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (0x00007f028e8a8000)
libvtkFiltersVerdict-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkFiltersVerdict-7.1.so.7.1p (0x00007f028e884000)
libmpi.so.40 => /usr/lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f028e75d000)
libvtkIOParallelXML-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOParallelXML-7.1.so.7.1p (0x00007f028e723000)
libvtkParallelMPI-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkParallelMPI-7.1.so.7.1p (0x00007f028e6fa000)
libvtkIOLegacy-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOLegacy-7.1.so.7.1p (0x00007f028e630000)
libvtkIOXML-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOXML-7.1.so.7.1p (0x00007f028e525000)
libvtkIOCore-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOCore-7.1.so.7.1p (0x00007f028e4a8000)
libvtkCommonExecutionModel-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonExecutionModel-7.1.so.7.1p (0x00007f028e3dd000)
libvtkCommonDataModel-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonDataModel-7.1.so.7.1p (0x00007f028e005000)
libvtkCommonCore-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonCore-7.1.so.7.1p (0x00007f028dcb9000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f028dac7000)
libsz.so.2 => /lib/x86_64-linux-gnu/libsz.so.2 (0x00007f028da8b000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f028da6f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f028da67000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f028d92a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0293e55000)
libOpenCL.so.1 => /usr/local/cuda-11.7/lib64/libOpenCL.so.1 (0x00007f028d722000)
libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f028d568000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f028d55e000)
libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f028d512000)
libhdf5_serial_hl.so.100 => /lib/x86_64-linux-gnu/libhdf5_serial_hl.so.100 (0x00007f028d4eb000)
libhdf5_serial.so.103 => /lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007f028d16e000)
libcurl-gnutls.so.4 => /lib/x86_64-linux-gnu/libcurl-gnutls.so.4 (0x00007f028d0de000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f028d0b0000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f028d0ab000)
libvtkverdict-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkverdict-7.1.so.7.1p (0x00007f028d071000)
libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f028cfb7000)
libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f028cf09000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f028ceb8000)
libvtkParallelCore-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkParallelCore-7.1.so.7.1p (0x00007f028ce5e000)
libvtksys-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtksys-7.1.so.7.1p (0x00007f028ce10000)
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007f028cdf2000)
libvtkIOXMLParser-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkIOXMLParser-7.1.so.7.1p (0x00007f028cdd2000)
libvtkCommonSystem-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonSystem-7.1.so.7.1p (0x00007f028cdba000)
libvtkCommonMisc-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonMisc-7.1.so.7.1p (0x00007f028cd9c000)
libvtkCommonTransforms-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonTransforms-7.1.so.7.1p (0x00007f028cd66000)
libvtkCommonMath-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonMath-7.1.so.7.1p (0x00007f028cd3f000)
libaec.so.0 => /lib/x86_64-linux-gnu/libaec.so.0 (0x00007f028cd36000)
libxcb.so.1 => /lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f028cd0c000)
libicuuc.so.66 => /lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f028cb26000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f028cafb000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f028cad2000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f028cab1000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f028ca91000)
libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007f028ca23000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f028ca0e000)
libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007f028c9d4000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f028c7fe000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f028c7b1000)
libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f028c75b000)
liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f028c74a000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f028c73c000)
libevent-2.1.so.7 => /lib/x86_64-linux-gnu/libevent-2.1.so.7 (0x00007f028c6e4000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f028c6df000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f028c6b2000)
libltdl.so.7 => /lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f028c6a7000)
libXau.so.6 => /lib/x86_64-linux-gnu/libXau.so.6 (0x00007f028c6a1000)
libXdmcp.so.6 => /lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f028c697000)
libicudata.so.66 => /lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f028abd6000)
libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f028aa54000)
libhogweed.so.5 => /lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007f028aa1d000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f028a999000)
libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f028a6c1000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f028a58b000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f028a575000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f028a498000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f028a467000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f028a460000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f028a44f000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f028a433000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f028a416000)
libgssapi.so.3 => /lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f028a3d1000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f028a3ae000)
libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f028a392000)
libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007f028a386000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f028a37f000)
libheimntlm.so.0 => /lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f028a373000)
libkrb5.so.26 => /lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f028a2e0000)
libasn1.so.8 => /lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f028a238000)
libhcrypto.so.4 => /lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f028a200000)
libroken.so.18 => /lib/x86_64-linux-gnu/libroken.so.18 (0x00007f028a1e7000)
libwind.so.0 => /lib/x86_64-linux-gnu/libwind.so.0 (0x00007f028a1bd000)
libheimbase.so.1 => /lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f028a1ab000)
libhx509.so.5 => /lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f028a15b000)
libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f028a032000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f0289ff7000)
```

2.

```
/usr/bin/mpiexec
```
3.

```
mpirun - auto mode
     link best version is /usr/bin/mpirun.openmpi
     link currently points to /usr/bin/mpirun.openmpi
     link mpirun is /usr/bin/mpirun
     slave mpiexec is /usr/bin/mpiexec
     slave mpiexec.1.gz is /usr/share/man/man1/mpiexec.1.gz
     slave mpirun.1.gz is /usr/share/man/man1/mpirun.1.gz
/usr/bin/mpirun.lam - priority 30
     slave mpiexec: /usr/bin/mpiexec.lam
     slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.lam.1.gz
     slave mpirun.1.gz: /usr/share/man/man1/mpirun.lam.1.gz
/usr/bin/mpirun.openmpi - priority 50
     slave mpiexec: /usr/bin/mpiexec.openmpi
     slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.openmpi.1.gz
     slave mpirun.1.gz: /usr/share/man/man1/mpirun.openmpi.1.gz
```

@stephankramer
Contributor
So it looks like you've built your own PETSc, which in turn has built its own MPI rather than using your system Open MPI, and thus your Fluidity and flredecomp end up being linked to both. Are you indeed using Ubuntu? In that case you might be better off just using a system (apt-installed) PETSc. Otherwise, either rebuild your PETSc specifying that you want to use your system MPI (you may just have to remove some download-mpi configure option), or rebuild Fluidity using the MPI that was built by PETSc, by making sure all MPI wrappers in /home/wangbo/petsc-3.9.0/test/bin are in your PATH when configuring and building Fluidity, and then also use the mpiexec from that directory.
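In concrete terms, the two routes might look like this (a sketch; the exact PETSc configure flags depend on your original configure line and are assumptions here):

```sh
# Route 1 (sketch): rebuild PETSc against the system Open MPI. Drop any
# option like --download-mpich and point PETSc at the apt-installed MPI.
cd ~/petsc-3.9.0
./configure --with-mpi-dir=/usr   # plus your other original options

# Route 2 (sketch): rebuild Fluidity against the PETSc-built MPI by putting
# its wrappers first in PATH, then launch with the matching mpiexec.
export PATH=/home/wangbo/petsc-3.9.0/test/bin:$PATH
which mpicc mpif90 mpiexec   # all three should resolve to the PETSc tree
```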

@xiangbei007
Author
Yes, I am using Ubuntu right now. I have rebuilt Fluidity using the MPI that was built by PETSc, by adding /home/wangbo/petsc-3.9.0/test/bin to my PATH; `which mpiexec` now reports /home/wangbo/petsc-3.9.0/test/bin/mpiexec.
But when I run this example again, I get a different error. The content of "flredecomp.err-0" is as follows:

```
Fatal error in MPI_Allreduce: Invalid datatype, error stack:
MPI_Allreduce(481): MPI_Allreduce(sbuf=0x7ffece2f5f18, rbuf=0x7ffece2f5f20, count=1, INVALID DATATYPE, op=0x46bba8a0, comm=0x84000004) failed
MPI_Allreduce(423): Invalid datatype
```
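One thing worth checking at this point, reusing the earlier diagnostic (a sketch; the interpretation in the comment is an assumption, not a confirmed diagnosis):

```sh
# Sketch: if both the PETSc-built libmpi.so.0 and the system libmpi.so.40
# still appear here (the system VTK libraries in the ldd output above pull
# in libmpi.so.40), two MPI implementations coexist in one process, and
# mismatched datatype handles can surface exactly as an "Invalid datatype"
# MPI_Allreduce failure.
ldd /home/wangbo/01_fluidity-main/bin/flredecomp | grep libmpi
```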
