Error “SIGABRT” from libc.so.6

I am not sure if you have any suggestions on the following error: “SIGABRT” from libc.so.6 (std::bad_alloc when calling precicef_initialize(deltatCosim)), as seen in gdb in emacs.

  1. This program worked well when we compiled using gcc version 4.

  2. I used spack to install precice for gcc version 4.

  3. Because our solver’s default compiler is gcc version 4.
    (I know the minimum version required for precice is gcc version 5, but version 4 did work for us for development purposes, apart from some compilation warnings.)

  4. The coupling between our solver and precice worked out of the box with spack.

  5. Now that our solver is changing to gcc version 11 as the default, this error shows up.

  6. I had a difficult time compiling/linking against the precice installed by spack this time, using gcc version 11.

  7. The following installation and LD_LIBRARY_PATH additions finally worked for me (I didn’t need them for gcc version 4):
    spack install precice ^boost@1.74.0 ^petsc@3.17
    export LD_LIBRARY_PATH=boost-1.74.0/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=openmpi-4.1.5/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=petsc-3.17.5/lib:$LD_LIBRARY_PATH

  8. This time, when I run the model (which worked before with gcc4), I get this error. (I attached a picture to show what the model looks like at the end.)

A little thing: the default petsc is 3.19. Linking petsc 3.19 with our solver causes no issue, but at runtime the executable errors out because it cannot find petsc 3.17. That is why 3.17 is forced here.

Hi!

std::bad_alloc means that your executable cannot allocate memory due to a failing call to the underlying allocator function. A common cause is that the system is out of memory.
Given that this happens in initialize, my best guess is that you are using a radial basis function mapping in conjunction with a large mesh. The system matrices can quickly explode in size.
Another common memory hog is QN-based acceleration, but this only shows after some time, not during initialization.
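
As a first sanity check, assuming you have shell access to the machine running the simulation, you can verify whether memory is really the bottleneck with standard tools (a rough sketch; yourSolverExecutable is a placeholder):

free -h                     # available RAM and swap on the machine
ulimit -v                   # per-process virtual memory limit of this shell, if any
top -p $(pgrep -n -f yourSolverExecutable)   # watch the solver while it initializes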

Another common problem is a violation of the one definition rule (ODR), which boils down to mixing different versions of the same dependencies providing the same symbols. If you are working with shared libraries, then this can be highly confusing to debug, as the failing code parts may change depending on which version is loaded first. We often see this issue with differing Eigen versions. In case of spack, this applies to the entire dependency graph of preCICE.
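
If you suspect this, one way to see which copies of a shared library the dynamic linker actually picks at runtime is glibc’s LD_DEBUG facility (a sketch; yourSolverExecutable is a placeholder):

LD_DEBUG=libs ./yourSolverExecutable 2> ld-debug.log   # log library search and load decisions
grep -iE 'boost|petsc|mpi' ld-debug.log                # check which paths each dependency resolves to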

Based on what you showed us, my gut feeling tells me that we are dealing with the latter.
The provided package specification doesn’t fix the compiler, hence I suspect that spack uses another compiler for preCICE (and its dependencies).
You can specify the compiler using the following syntax:

spack install precice%gcc@11 ^boost@1.74.0 ^petsc@3.17

Please have a look at the spack documentation on how to find your compiler in case it is missing.
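
For reference, these are the spack commands to inspect and register compilers (I assume gcc 11 is already in your PATH or loaded via a module):

spack compilers        # list the compilers spack already knows about
spack compiler find    # search the current PATH and register newly found compilers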

Also, loading dependencies from spack is usually done like this:

spack load precice

This ensures that the correct dependencies of the precice package are loaded too.
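
To double-check what ended up in your environment, something like the following should work:

spack load precice
spack find --loaded    # list the packages currently loaded into this shell
which mpirun           # confirm that you pick up spack’s MPI rather than a system one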


This is what I understand:

  • Your Fortran-based code COSIM (cosim.F, shown on the right terminal pane) used to compile with GCC 4, but now it compiles with GCC 11. I assume that the uncoupled code also runs successfully.
  • Now, when you run a simulation, you get an unclear abort signal from libc.
  • You also get an std::bad_alloc from the other participant while setting up the preCICE connection.
  • You built preCICE with Spack.

Questions:

  1. Do the preCICE solver dummies run? Or do you get the same error?
  2. Did you remove the old preCICE before rebuilding it?

Could you provide more details?

This sounds interesting: Maybe the solver is linking to the wrong PETSc at runtime.

Do you actually need PETSc? If not, I would first try turning it off by running spack with -petsc: spack install precice ^boost@1.74.0 -petsc. This could already simplify the situation. We only need PETSc for an optional feature that is mainly needed in massively parallel simulations.


Hi, @fsimonis ,
According to your descriptions, I feel it might be due to the ODR (I don’t know anything about it), because the model has only hundreds of nodes and runs on a server (memory shouldn’t be a problem).

  1. Both our solver and precice (via spack) are compiled with gcc11:
    spack/opt/spack/linux-centos8-skylake_avx512/gcc-11.2.1/

  2. When I load precice, I now have two matches (the 1st one uses petsc@3.17; the 2nd one uses petsc@3.19):
    Matching packages:
    vcohyfs precice@2.5.0%gcc@=11.2.1 arch=linux-centos8-skylake_avx512
    jt2derp precice@2.5.0%gcc@=11.2.1 arch=linux-centos8-skylake_avx512

The 1st one is the one I use. And “spack load /vcohyfs” still gave me the same error.

What I did to the old gcc4 spack is:
rm -rf spack
rm -rf ~/.spack
Then I reinstalled with “git clone -b develop https://github.com/spack/spack.git”
and “spack install precice ^boost@1.74.0 ^petsc@3.17”.

I am not sure if the gcc4 installation left anything hidden behind.

Is there a safe way to redo everything from scratch (because the gcc4 precice once worked for me perfectly, right out of the box)?

Thanks

Hi, @Makis ,
Yes, the uncoupled gcc11 codes work fine. (Our own solver works fine. Precice solverdummy.f90 works fine.)

What I did to the old gcc4 spack is:
rm -rf spack
rm -rf ~/.spack
Then I reinstalled with “git clone -b develop https://github.com/spack/spack.git”
and “spack install precice ^boost@1.74.0 ^petsc@3.17”

That hard time was solved by what I mentioned above:
spack install precice ^boost@1.74.0 ^petsc@3.17
export LD_LIBRARY_PATH=boost-1.74.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=openmpi-4.1.5/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=petsc-3.17.5/lib:$LD_LIBRARY_PATH

For gcc4, once I did “spack install precice ^boost@1.74.0”, I didn’t need to choose another petsc and I didn’t need to manually add LD_LIBRARY_PATH. I guess that “. spack/share/spack/setup-env.sh” or “spack load precice” sets up the correct environment, so that when building our solver it could find headers/libs and link them correctly.

Our solver doesn’t need petsc. I don’t know if precice needs it or not.

This is the first time I am working with spack and precice. If I could remove them and restart everything, I would be happy to do so, since I don’t have root privileges and everything is developed under my own Linux account.

Thanks

If preCICE is compiled with PETSc and you configure RBF mapping, then it uses a GMRES solver, which can make a performance difference in very large simulations.

If it is compiled without PETSc and you configure RBF mapping, then it uses an Eigen-based QR decomposition, which can only be parallelized across a single compute node.

In most cases, preCICE users don’t need PETSc.

And how about this question?

If they are run separately: Our own solver works fine. Precice solverdummy.f90 works fine.
gfortran -I./header -L./libs -lprecice solverdummy.f90 -o solverdummy


Ideally, we try to support any interface to preCICE and leave the options to users. So if preCICE offers an option, we won’t block it from our solver; and if preCICE needs petsc in some cases, we try to comply with that.

By using strace, the problem appears to come from MPI. Please see the following picture.
Can we install precice linked with Intel MPI?
Thanks

According to your description, you are still mixing dependencies. The above error occurs due to mixing Intel MPI and OpenMPI.
Extending LD_LIBRARY_PATH does not change includes and compiler wrappers, so you are most likely compiling against one version and running against another one.

I recommend disabling PETSc and sticking to the default MPI (likely OpenMPI) until your basic setup works. Afterwards, you can enable PETSc and use your Intel MPI step by step.
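
To check which MPI your build and runtime actually pick up, you can do something along these lines (a sketch; wrapper names depend on your MPI, and yourSolverExecutable is a placeholder):

which mpirun mpicc mpifort    # which compiler wrappers and launcher are first in PATH
mpirun --version              # both OpenMPI and Intel MPI identify themselves here
ldd yourSolverExecutable | grep -i mpi   # which MPI library the solver resolves at runtime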

1) Basic setup

  1. Build precice using spack
    spack install precice ^boost@1.74.0 -petsc
  2. Load preCICE and its dependencies. This will also set up the MPI version used by spack to build precice.
    spack load precice
  3. In the same shell, move to the directory of your solver sources
    cd path/to/your/solver
  4. Remove all build files and configurations of your Solver
  5. Configure your build using ./configure or cmake if applicable.
    Check whether the build files contain references to the spack packages.
  6. Build your solver
  7. Run some tests

Very important: You need to set up the environment of a shell using spack load precice before you run or rebuild the solver. This includes job scripts in case you run it on a cluster.
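
Putting steps 1–7 together, a minimal shell session could look roughly like this (the solver-side commands are placeholders, since I don’t know your build system):

# 1. build preCICE without PETSc
spack install precice ^boost@1.74.0 -petsc

# 2. load preCICE and its dependencies into this shell
. spack/share/spack/setup-env.sh
spack load precice

# 3.–6. rebuild the solver from a clean state (placeholder commands)
cd path/to/your/solver
make clean       # or remove the build directory entirely
./configure      # or cmake, if your build system uses one
make

# 7. run a small test from the same shell
./yourSolverExecutable your-test-case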

2) Enabling PETSc

Once this setup works, I recommend building preCICE with PETSc by dropping the -petsc from the installation command. Then you need to load the environment again and rebuild your solver from scratch.
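
For completeness, the install command for this step would then simply be:

spack install precice ^boost@1.74.0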

3) Enabling Intel MPI

To get Intel MPI to work, I recommend using the intel-mpi package first:

spack install precice ^boost@1.74.0 ^intel-mpi

Recent versions of spack advise using the intel-oneapi-mpi package instead.
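
With a recent spack, that would presumably look like:

spack install precice ^boost@1.74.0 ^intel-oneapi-mpi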

Then again load and rebuild the solver from scratch.

4) Using local packages

If all of this works and only if required, then I would attempt to integrate spack with the locally provided packages by manually searching and configuring packages. For this last step, your local system admin and the spack documentation will be the right contacts.

Hope that helps!


@fsimonis , thanks a lot for your reply.
I used “spack install precice ~~mpi ~~petsc ^boost@1.74.0” to install precice last week.

As you can see, I have 2 versions of precice in the following picture, and the /3vhpcl4 version without mpi and petsc is the one I tried to use. After “spack load precice” and building with our solver, it shows the error “(Cannot allocate memory)” in the highlighted region in the 2nd picture. Do you have any clue about the reason?

I will try to clean the build and then rebuild as you suggested in “4. Remove all build files and configurations of your Solver”. I will let you know the result.

BTW, what is meant by “configurations of your Solver” or “Configure your build using ./configure”?

After I run “make clean”, the same error still happens. Do you know if other software like CalculiX or OpenFOAM has similar runtime issues when compiled using gcc11?

Can I use “spack install precice ++debug ~~mpi ~~petsc ^boost@1.74.0” to install a debug version of precice?

According to these logs, the crash occurs when Setting up primary communication to coupling partners, which is the first time <m2n> connections are established.
I am pretty sure/hope you are using the default <m2n:sockets>, in which case this establishes connections using Boost.asio.

As you are requesting spack to build preCICE with boost version 1.74.0, I assume (correct me if I am wrong) that you are trying to make it compatible with the system boost used by the solver.
If this assumption is correct, then you are again mixing dependencies, which would explain strange behaviour like this.

This part is key. You need to make sure that the solver prefers using spack packages over system packages. This essentially boils down to checking whether paths are hardcoded in the build system you are using for the solver.

As mentioned above, depending on your build system, this will not change dependencies.
Do you need to run some configuration steps prior to running make, such as ./configure or cmake?

You can, but I don’t think it will help a lot in this case.

You can do the following to detect conflicting versions:

  • Run spack load precice to load the dependencies
  • Find the installation location spack location --install-dir precice
  • In that directory, run ldd lib/libprecice.so, these are all resolved libraries for precice and its dependencies.
  • Then run ldd yourSolverExecutable to see what the dependency resolution of the solver looks like
  • Finally compare the two to detect conflicts
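
For example (yourSolverExecutable is a placeholder for your actual binary):

spack load precice
cd $(spack location --install-dir precice)
ldd lib/libprecice.so | sort > precice-libs.txt
ldd /path/to/yourSolverExecutable | sort > solver-libs.txt
diff precice-libs.txt solver-libs.txt    # differing resolved paths indicate mixed dependencies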

@fsimonis , first, thanks a lot for your help.

Yes, I am using <m2n:sockets from="OptiStruct" to="Fluid" />

By comparing ldd results on libprecice.so.2.5.0 and our solver executable, I found that there is only one difference on one library:

libprecice.so.2.5.0: libz.so.1 => /home/hongwu/spack/opt/spack/linux-centos8-skylake_avx512/gcc-11.2.1/zlib-1.2.13-oo3whxvef2zhbc7w2f5p4j6lbiqr5lae/lib/libz.so.1 (0x00007f8aff5d4000)

our solver executable: libz.so.1 => /lib64/libz.so.1 (0x00007f27dcfa0000)

Other than that, the libraries of libprecice.so.2.5.0 are a subset of our solver executable’s libraries, and they resolve to the same paths.

It seems that “spack install precice ++debug ~~mpi ~~petsc ^boost@1.74.0” doesn’t work.

Hi,

Sorry, please use build_type=Debug instead of +debug.
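
So, keeping the rest of your spec, the command should presumably read:

spack install precice build_type=Debug ~~mpi ~~petsc ^boost@1.74.0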

While this doesn’t look tragic, it still means that something is going wrong.
At this point, I am out of ideas and would have to inspect your solver code to continue giving support. I strongly recommend getting help from your cluster admins and the code owners of your solver.

Assistance under NDA is available if your company holds an extended support licence.

I manually built all dependency libraries and preCICE according to your website. It was successful; I include my cmake command below. Now I can debug into the precice code, and I saw that the error was thrown at SolverInterfaceImpl.cpp line 269: PRECICE_ASSERT(not _couplingScheme->isInitialized()); It seems to be related to the couplingScheme. Do you have any suggestions? Thanks again.

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=~/software/precice -DPRECICE_PETScMapping=OFF -DPRECICE_PythonActions=OFF -DLIBXML2_LIBRARY="/home/hongwu/libxml2-v2.9.12/lib/libxml2.so" -DLIBXML2_INCLUDE_DIR="/home/hongwu/libxml2-v2.9.12/include" -DPRECICE_MPICommunication=OFF …

This issue is solved; it was caused by inconsistent boost libs used by both precice and us.
