ctest: all tests run to completion but ~500 report failure

Hello,

I have gotten preCICE to install but am having issues verifying that the package works through ctest. Of the 1100+ tests, most run to completion but give the following error after the attempt to close the communicator. I did a little troubleshooting and found that most tests succeed if I run them with mpiexec, and that after modifying discover_tests.cmake so that if(RANKS STREQUAL "1") → if(RANKS STREQUAL "0"), all but the tests listed below succeed. From the error it appears that the solution is also to run those with mpiexec, but I am not sure how to do that. Is there a better way to resolve this issue?
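For reference, this is how I have been re-running individual tests by name (the test names come from the ctest output below; `-R` selects tests by regex and `--output-on-failure` prints the full log of anything that fails):

```shell
# Re-run a single failing test with its full output
ctest -R precice.solverdummy.run.cpp-cpp --output-on-failure

# Re-run all solverdummy tests at once
ctest -R precice.solverdummy --output-on-failure
```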

Please describe your system, and especially:

  • preCICE version 3.2.0
  • Ubuntu 20.04
  • building with CMake
  • Boost 1.74.0, OpenMPI 4.0.3
Failing tests:

        1138 - precice.solverdummy.run.cpp-cpp (Failed)
        1139 - precice.solverdummy.run.c-c (Failed)
        1140 - precice.solverdummy.run.fortran-fortran (Failed)
        1141 - precice.solverdummy.run.cpp-c (Failed)
        1142 - precice.solverdummy.run.cpp-fortran (Failed)
        1143 - precice.solverdummy.run.c-fortran (Failed)
        1151 - precice.tools.check.file (SEGFAULT)
        1152 - precice.tools.check.file+name (SEGFAULT)
        1153 - precice.tools.check.file+name+size (SEGFAULT)

preCICE: Close communication channels
[aero-7x2nx8:3520878] *** Process received signal ***
[aero-7x2nx8:3520878] Signal: Segmentation fault (11)
[aero-7x2nx8:3520878] Signal code: Address not mapped (1)
[aero-7x2nx8:3520878] Failing at address: 0xe8
[aero-7x2nx8:3520878] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f96e3314090]
[aero-7x2nx8:3520878] [ 1] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_hwloc_base_free_topology+0x27)[0x7f96e2a85037]
[aero-7x2nx8:3520878] [ 2] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x6fd17)[0x7f96e2a82d17]
[aero-7x2nx8:3520878] [ 3] /lib/x86_64-linux-gnu/libopen-pal.so.40(mca_base_framework_close+0x7c)[0x7f96e2a680fc]
[aero-7x2nx8:3520878] [ 4] [aero-7x2nx8:3520879] *** Process received signal ***
/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_finalize+0x8b)[0x7f96e2a3c19b]
[aero-7x2nx8:3520878] [aero-7x2nx8:3520879] Signal: Segmentation fault (11)
[aero-7x2nx8:3520879] Signal code: Address not mapped (1)
[aero-7x2nx8:3520879] Failing at address: 0xe8
[ 5] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_finalize+0x8bc)[0x7f96e2e4542c]
[aero-7x2nx8:3520878] [ 6] [aero-7x2nx8:3520879] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7fa35ebf7090]
[aero-7x2nx8:3520879] [ 1] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_hwloc_base_free_topology+0x27)[0x7fa35e368037]
[aero-7x2nx8:3520879] [ 2] /precice-3.2.0/libprecice.so.3(+0x8f9b84)[0x7f96e3fdcb84]
[aero-7x2nx8:3520878] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x6fd17)[0x7fa35e365d17]
[aero-7x2nx8:3520879] [ 3] /lib/x86_64-linux-gnu/libopen-pal.so.40(mca_base_framework_close+0x7c)[0x7fa35e34b0fc]
[aero-7x2nx8:3520879] [ 4] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_finalize+0x8b)[0x7fa35e31f19b]
[aero-7x2nx8:3520879] [ 5] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_finalize+0x8bc)[0x7fa35e72842c]
[aero-7x2nx8:3520879] [ 6] /precice-3.2.0/libprecice.so.3(+0x7103ad)[0x7f96e3df33ad]
[aero-7x2nx8:3520878] [ 8] /precice-3.2.0/libprecice.so.3(_ZN7precice11Participant8finalizeEv+0x24)[0x7f96e3ddb308]
[aero-7x2nx8:3520878] /precice-3.2.0/libprecice.so.3(+0x8f9b84)[0x7fa35f8bfb84]
[ 9] [aero-7x2nx8:3520879] /precice-3.2.0/Solverdummies/cpp/solverdummy(+0x4e56)[0x560bf2140e56]
[aero-7x2nx8:3520878] [ 7] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f96e32f5083]
[aero-7x2nx8:3520878] [11] /precice-3.2.0/Solverdummies/cpp/solverdummy(+0x44ee)[0x560bf21404ee]
[aero-7x2nx8:3520878] *** End of error message ***
/precice-3.2.0/libprecice.so.3(+0x7103ad)[0x7fa35f6d63ad]
[aero-7x2nx8:3520879] [ 8] /precice-3.2.0/libprecice.so.3(_ZN7precice11Participant8finalizeEv+0x24)[0x7fa35f6be308]
[aero-7x2nx8:3520879] [ 9] /precice-3.2.0/Solverdummies/cpp/solverdummy(+0x4e56)[0x56037c0b7e56]
[aero-7x2nx8:3520879] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fa35ebd8083]
[aero-7x2nx8:3520879] [11] /precice-3.2.0/Solverdummies/cpp/solverdummy(+0x44ee)[0x56037c0b74ee]
[aero-7x2nx8:3520879] *** End of error message ***
Segmentation fault (core dumped)
CMake Error at /precice-3.2.0/cmake/runsolverdummies.cmake:34 (message):
  An error occurred running the solverdummies! Return code : "139"

To extend the explanation of this error slightly: from my digging, the segfault seems to be caused either by an MPI environment issue when the tests are not run under mpiexec, or by an internal preCICE bug that does not handle running outside of an MPI environment.

Hi,
The solverdummy tests and the file checks fail, which is a very good indication that the MPI library loaded at runtime does not match the one preCICE was compiled against. This matters even if you are not running inside an MPI environment.

Compare the libraries detected at runtime against the libraries compiled against:

  • Runtime
    ldd libprecice.so | grep mpi
    
  • Compilation
    grep MPI CMakeCache.txt
    
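If the chains of symlinks make this hard to eyeball, `readlink -f` resolves both sides to canonical paths. A minimal self-contained sketch of the idea (the directory layout mimics the Ubuntu OpenMPI packaging, but the paths below are stand-ins created in a temp directory, not the real system libraries):

```shell
# Compare a "runtime" library path against a "compile-time" one by
# resolving the full symlink chains with readlink -f.
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/openmpi/lib"
touch "$tmp/openmpi/lib/libmpi_cxx.so.40.20.1"   # the actual library file

# Chain seen at runtime: libmpi_cxx.so.40 -> libmpi_cxx.so.40.20.1 -> openmpi/lib/...
ln -s openmpi/lib/libmpi_cxx.so.40.20.1 "$tmp/libmpi_cxx.so.40.20.1"
ln -s libmpi_cxx.so.40.20.1 "$tmp/libmpi_cxx.so.40"

# Path recorded at compile time: openmpi/lib/libmpi_cxx.so -> libmpi_cxx.so.40.20.1
ln -s libmpi_cxx.so.40.20.1 "$tmp/openmpi/lib/libmpi_cxx.so"

a=$(readlink -f "$tmp/libmpi_cxx.so.40")
b=$(readlink -f "$tmp/openmpi/lib/libmpi_cxx.so")
echo "runtime:      $a"
echo "compile-time: $b"
[ "$a" = "$b" ] && echo "same library"
```

If the two resolved paths differ, the runtime loader is picking up a different MPI than the one in the CMake cache, which would explain the crash in MPI_Finalize.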

Here are the results of running those commands:

>$ ldd libprecice.so | grep mpi 
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007fce8b60e000)     
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007fce8b4e9000)
>$ grep MPI CMakeCache.txt

CMAKE_CXX_COMPILER:UNINITIALIZED=g++
CMAKE_CXX_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar-9
CMAKE_CXX_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib-9
CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc
CMAKE_EXPORT_COMPILE_COMMANDS:BOOL=
CMAKE_Fortran_COMPILER:FILEPATH=/usr/bin/f95
//Executable for running MPI programs.
MPIEXEC_EXECUTABLE:FILEPATH=/usr/bin/mpiexec
//Maximum number of processors available to run MPI applications.
MPIEXEC_MAX_NUMPROCS:STRING=8
//Flag used by MPI to specify the number of processes for mpiexec;
MPIEXEC_NUMPROC_FLAG:STRING=-n
MPIEXEC_POSTFLAGS:STRING=
MPIEXEC_PREFLAGS:STRING=
//MPI CXX additional include directories
MPI_CXX_ADDITIONAL_INCLUDE_DIRS:STRING=
MPI_CXX_COMPILER:UNINITIALIZED=mpic++
//MPI CXX compiler wrapper include directories
MPI_CXX_COMPILER_INCLUDE_DIRS:STRING=/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi;/usr/lib/x86_64-linux-gnu/openmpi/include
//MPI CXX compilation definitions
MPI_CXX_COMPILE_DEFINITIONS:STRING=
//MPI CXX compilation options
MPI_CXX_COMPILE_OPTIONS:STRING=-pthread
MPI_CXX_HEADER_DIR:PATH=/usr/lib/x86_64-linux-gnu/openmpi/include
//MPI CXX libraries to link against
MPI_CXX_LIB_NAMES:STRING=mpi_cxx;mpi
//MPI CXX linker flags
MPI_CXX_LINK_FLAGS:STRING=-pthread
//If true, the MPI-2 C++ bindings are disabled using definitions.
MPI_CXX_SKIP_MPICXX:BOOL=OFF
//Location of the mpi library for MPI
MPI_mpi_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
//Location of the mpi_cxx library for MPI
MPI_mpi_cxx_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so
PRECICE_CTEST_MPI_FLAGS:STRING=
//Enables MPI-based communication and running coupling tests.
PRECICE_FEATURE_MPI_COMMUNICATION:BOOL=ON
//ADVANCED property for variable: CMAKE_CXX_COMPILER
CMAKE_CXX_COMPILER-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_COMPILER_AR
CMAKE_CXX_COMPILER_AR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_COMPILER_RANLIB
CMAKE_CXX_COMPILER_RANLIB-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_COMPILER
CMAKE_C_COMPILER-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXPORT_COMPILE_COMMANDS
CMAKE_EXPORT_COMPILE_COMMANDS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_Fortran_COMPILER
CMAKE_Fortran_COMPILER-ADVANCED:INTERNAL=1
COMPILER_HAS_DEPRECATED:INTERNAL=1
//Test COMPILER_HAS_DEPRECATED_ATTR
COMPILER_HAS_DEPRECATED_ATTR:INTERNAL=1
//Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
COMPILER_HAS_HIDDEN_INLINE_VISIBILITY:INTERNAL=1
//Test COMPILER_HAS_HIDDEN_VISIBILITY
COMPILER_HAS_HIDDEN_VISIBILITY:INTERNAL=1
//Details about finding MPI
FIND_PACKAGE_MESSAGE_DETAILS_MPI:INTERNAL=[TRUE][c ][v3.1()]
//Details about finding MPI_CXX
FIND_PACKAGE_MESSAGE_DETAILS_MPI_CXX:INTERNAL=[/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so][/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so][mpi_cxx;mpi][/usr/lib/x86_64-linux-gnu/openmpi/include][TRUE][v3.1()]
//ADVANCED property for variable: MPIEXEC_EXECUTABLE
MPIEXEC_EXECUTABLE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPIEXEC_MAX_NUMPROCS
MPIEXEC_MAX_NUMPROCS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPIEXEC_NUMPROC_FLAG
MPIEXEC_NUMPROC_FLAG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPIEXEC_POSTFLAGS
MPIEXEC_POSTFLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPIEXEC_PREFLAGS
MPIEXEC_PREFLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_ADDITIONAL_INCLUDE_DIRS
MPI_CXX_ADDITIONAL_INCLUDE_DIRS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_COMPILER
MPI_CXX_COMPILER-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_COMPILER_INCLUDE_DIRS
MPI_CXX_COMPILER_INCLUDE_DIRS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_COMPILE_DEFINITIONS
MPI_CXX_COMPILE_DEFINITIONS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_COMPILE_OPTIONS
MPI_CXX_COMPILE_OPTIONS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_HEADER_DIR
MPI_CXX_HEADER_DIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_LIB_NAMES
MPI_CXX_LIB_NAMES-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_LINK_FLAGS
MPI_CXX_LINK_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_CXX_SKIP_MPICXX
MPI_CXX_SKIP_MPICXX-ADVANCED:INTERNAL=1
//Result of TRY_COMPILE
MPI_RESULT_CXX_libver_mpi_normal:INTERNAL=TRUE
//Result of TRY_COMPILE
MPI_RESULT_CXX_test_mpi_MPICXX:INTERNAL=TRUE
//Result of TRY_COMPILE
MPI_RESULT_CXX_test_mpi_normal:INTERNAL=TRUE
MPI_RUN_RESULT_CXX_libver_mpi_normal:INTERNAL=0
//ADVANCED property for variable: MPI_mpi_LIBRARY
MPI_mpi_LIBRARY-ADVANCED:INTERNAL=1
//ADVANCED property for variable: MPI_mpi_cxx_LIBRARY
MPI_mpi_cxx_LIBRARY-ADVANCED:INTERNAL=1
//Result of TRY_COMPILE

preCICE is compiled against OpenMPI, which is obvious in the CMake cache. The ldd output for libprecice.so looks different at first glance, but I followed the symlinks and the results are below.

>$ ldd libprecice.so | grep mpi 
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007fce8b60e000)     
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007fce8b4e9000)

>$ ls -l /lib/x86_64-linux-gnu/libmpi_cxx.so.40
/lib/x86_64-linux-gnu/libmpi_cxx.so.40 -> libmpi_cxx.so.40.20.1

>$ ls -l /lib/x86_64-linux-gnu/libmpi_cxx.so.40.20.1
/lib/x86_64-linux-gnu/libmpi_cxx.so.40.20.1 -> openmpi/lib/libmpi_cxx.so.40.20.1

Now looking at where CMake says MPI is located, /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so:

>$ ls -l /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so
/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so -> libmpi_cxx.so.40.20.1

It appears to me that the two libraries are consistent: both symlink chains end at openmpi/lib/libmpi_cxx.so.40.20.1.
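For completeness, the same check can be applied beyond libprecice.so itself, to the test binaries and the launcher (the solverdummy path below is from my build tree and may differ on other systems):

```shell
# Check which MPI the solverdummy binary itself resolves at runtime
ldd Solverdummies/cpp/solverdummy | grep mpi

# Check which MPI implementation the launcher belongs to
mpiexec --version
```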