Segfault at precicef_create()

I recently ported a Fortran adapter I wrote from preCICE 2.x to 3.x. On our clusters, I consistently run into the segfault below. What could be going wrong here?

==== backtrace (tid:  89832) ====
 0  /usr/lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x15553aa2a9ac]
 1  /usr/lib64/libucs.so.0(+0x2bb8c) [0x15553aa2ab8c]
 2  /usr/lib64/libucs.so.0(+0x2bd5a) [0x15553aa2ad5a]
 3  /opt/toss/openmpi/4.0/intel/lib/libmpi.so.40(MPI_Comm_rank+0x51) [0x15555300ee81]
 4  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(+0xf39c79) [0x15554f6ecc79]
 5  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(+0xe33b40) [0x15554f5e6b40]
 6  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(+0xe16e8b) [0x15554f5c9e8b]
 7  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(_ZN7precice11ParticipantC1ENS_4spanIKcLm18446744073709551615EEES3_ii+0x17c) [0x15554f5842ac]
 8  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(precicef_create_+0x14b) [0x15554f77252f]

How are you calling precicef_create(), exactly?

I think I took this from the Fortran bindings example:

call precicef_create(participant_name,precice_xml,id,commsize,50,50)

Here, participant_name and precice_xml are character variables of length 50, and id and commsize are taken from an MPI_Comm_size() call.
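
For context, the surrounding setup is roughly this (a trimmed sketch with placeholder names; I am showing the usual MPI_Comm_rank/MPI_Comm_size pair for id and commsize):

```
program adapter_sketch
  use mpi
  implicit none
  character(len=50) :: participant_name, precice_xml
  integer           :: id, commsize, ierr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, id, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, commsize, ierr)

  participant_name = 'SolverOne'           ! placeholder
  precice_xml      = 'precice-config.xml'  ! placeholder

  ! the trailing 50, 50 are the lengths of the two character arguments
  call precicef_create(participant_name, precice_xml, id, commsize, 50, 50)

  ! ... coupling setup, loop, and finalization follow in the real adapter ...

  call MPI_Finalize(ierr)
end program adapter_sketch
```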

I understand you are using the Fortran module, following this example:

Does the example actually run on your system?

And did you also try the intrinsic Fortran bindings (with this example)?

Ah, yes - running that solver dummy (the Fortran module example) gives me the same issue:

[ec36:210612:0:210612] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e8)
==== backtrace (tid: 210612) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x15553cb239ac]
 1  /lib64/libucs.so.0(+0x2bb8c) [0x15553cb23b8c]
 2  /lib64/libucs.so.0(+0x2bd5a) [0x15553cb23d5a]
 3  /opt/toss/openmpi/4.1/intel/lib/libmpi.so.40(MPI_Comm_rank+0x51) [0x155552ba5581]
 4  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(+0xf39c79) [0x155554c8cc79]
 5  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(+0xe33b40) [0x155554b86b40]
 6  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(+0xe16e8b) [0x155554b69e8b]
 7  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(_ZN7precice11ParticipantC1ENS_4spanIKcLm18446744073709551615EEES3_ii+0x17c) [0x155554b242ac]
 8  /gpfs/edrobe/Tools/precice/precice-3.1.2-build/lib64/libprecice.so.3(precicef_create_+0x14b) [0x155554d1252f]
 9  ./solverdummy() [0x404f05]
10  ./solverdummy() [0x404aa2]
11  /lib64/libc.so.6(__libc_start_main+0xe5) [0x155553426d85]
12  ./solverdummy() [0x4049ae]
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
solverdummy        00000000004079BA  Unknown               Unknown  Unknown
libpthread-2.28.s  00001555537C3CF0  Unknown               Unknown  Unknown
libmpi.so.40.30.6  0000155552BA5581  MPI_Comm_rank         Unknown  Unknown
libprecice.so.3.1  0000155554C8CC79  Unknown               Unknown  Unknown
libprecice.so.3.1  0000155554B86B40  Unknown               Unknown  Unknown
libprecice.so.3.1  0000155554B69E8B  Unknown               Unknown  Unknown
libprecice.so.3.1  0000155554B242AC  _ZN7precice11Part     Unknown  Unknown
libprecice.so.3.1  0000155554D1252F  precicef_create_      Unknown  Unknown
solverdummy        0000000000404F05  Unknown               Unknown  Unknown
solverdummy        0000000000404AA2  Unknown               Unknown  Unknown
libc-2.28.so       0000155553426D85  __libc_start_main     Unknown  Unknown
solverdummy        00000000004049AE  Unknown               Unknown  Unknown

I believe something similar happens when trying the intrinsic bindings example.

Can you try that as well, please?
And I assume that it worked for v2, so it is really specific to the upgrade from v2->v3. Nothing else (e.g., dependency versions) changed in the meantime, correct?

Yes, sorry - I meant to say that I did try the non-module approach, and I get a similar trace.

The only dependency that I believe changed is Boost. I’m using the Boost installed by the sysadmins (1.73) instead of the 1.80 I was using with preCICE 2.x. Not sure if that matters here? I assumed that 1.73 was okay according to CMake.

fortran_participant.log (2.9 KB)
cpp_participant.log (2.7 KB)

Relevant log files from attempting to run the C++ and Fortran (intrinsic version) solver dummies together, in case it helps.

I have no clue… I want to say that this is specific to the MPI+system setup, but it does bug me that you only observed this with v3 and not with v2 on the exact same system. Just in case: could you please try again with preCICE v2, to confirm that this is the only difference?

Hi, according to the stack traces, preCICE crashes in the Participant constructor. Further up the stack is MPI_Comm_rank.

This could mean that the MPI initialization didn’t succeed without crashing.
It could also mean that preCICE triggers an assertion before MPI is initialized; the assertion then tries to display the rank, which may lead to the observed failure.
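
To rule out the first scenario from the adapter side, one thing you could try, purely as a diagnostic, is querying MPI_Initialized right before the precicef_create call. A standalone sketch of that check:

```
program mpi_guard
  use mpi
  implicit none
  logical :: mpi_is_up
  integer :: ierr, rank

  call MPI_Init(ierr)

  ! this is the check that would go directly in front of precicef_create
  call MPI_Initialized(mpi_is_up, ierr)
  if (.not. mpi_is_up) then
    print *, 'MPI is not initialized - any MPI_Comm_rank call would be invalid'
    stop 1
  end if

  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  print *, 'MPI is initialized, rank = ', rank

  call MPI_Finalize(ierr)
end program mpi_guard
```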

There is a lot of guesswork in this, though. For useful stack traces, we need a debug build, potentially even with the backtrace library enabled in the CMake configuration.

Before that, could you please check whether precice-tools version and precice-tools check run without crashing?

Also, to make sure the Open MPI installation in /opt didn’t break in some way due to a system update, could you please try to run an MPI example code in parallel?
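
Something as small as the following would do (a generic sketch, not tied to preCICE); compile it with your MPI Fortran wrapper (e.g., mpif90) and launch it with mpirun on a few ranks:

```
program mpi_parallel_test
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, ranksum

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! a tiny collective, so that actual communication is exercised
  call MPI_Allreduce(rank, ranksum, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)

  print *, 'rank ', rank, ' of ', nprocs, ': sum of ranks = ', ranksum

  call MPI_Finalize(ierr)
end program mpi_parallel_test
```

If every rank prints the same sum without crashing, the MPI installation itself is probably fine and we can focus on the preCICE side.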