I have compiled and installed preCICE 3.2.0 on the head node of our university cluster (Rocky Linux 8) from source (I don’t want to use Spack). The following dependencies were installed:
boost 1.85
eigen 3.4.0
petsc 3.23.2 with Intel MPI (Version 2021.12 Build 20240410)
gcc 13.2.0 is set as the compiler (the Intel Compiler 2024 combined with the distribution’s default gcc 8.5 does not support C++17).
intel-oneapi-mpi 2021.12.1 (Version 2021.12 Build 20240410) is used for MPI.
The compilation and installation of preCICE 3.2.0 work fine, and ‘make test_install’ also passes.
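For context, the configuration was roughly along these lines (install prefix, module names and dependency paths below are placeholders, not my exact command; the dependency locations are picked up via CMAKE_PREFIX_PATH):

# GCC 13.2.0 and intel-oneapi-mpi 2021.12.1 modules loaded beforehand
CC=mpicc CXX=mpicxx cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=$HOME/software/precice-3.2.0 \
  -DCMAKE_PREFIX_PATH="$BOOST_ROOT;$EIGEN_ROOT;$PETSC_DIR" \
  /path/to/precice-3.2.0
make -j 8
make install
make test_install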
However, the ‘ctest’ test suite is generating a lot of the following MPI init errors:
Abort(1090319) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class
MPIR_Init_thread(192):
MPID_Init(1538)......:
MPIR_pmi_init(169)...: PMI2_Job_GetId returned 14
This kind of error occurs, for example, with: testprecice --run_test=XML/AttributeConcatenation --log_level=message
The error disappears when I add “mpirun” or “mpiexec” in front of the test. It seems that Intel MPI 2021.12 (Build 20240410) needs its “mpirun”/“mpiexec” wrapper to correctly initialize the MPI environment.
When I look into the CTest log (an excerpt: LastTest_cut.log (529.8 KB)), I see that all “parallel MPI” tests configured to use “mpiexec” pass. However, some (serial?) tests do not invoke “mpiexec” in front of “testprecice”; these fail with the “Fatal error in PMPI_Init”. After adding “mpiexec -np 1” or “mpirun -np 1”, they pass (see the example below).
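Concretely, from the build directory:

./testprecice --run_test=XML/AttributeConcatenation --log_level=message                # fails with the PMPI_Init error
mpiexec -np 1 ./testprecice --run_test=XML/AttributeConcatenation --log_level=message  # passes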
On another cluster, preCICE is compiled with an older Intel MPI (Version 2021.6 Build 20220227) and this issue does not occur: I can start the “serial” tests without the wrapper.
Is this typical behaviour of the “new” Intel MPI? Is there a way to solve this in CTest? (It is difficult to spot the real problems in the CTest log.)
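Standard CTest flags at least make the failures easier to isolate, for example:

ctest --output-on-failure                 # print the output of failing tests only
ctest --rerun-failed --output-on-failure  # re-run only the tests that failed in the previous run
ctest -R AttributeConcatenation -V        # run tests matching a name pattern, verbose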
I am not aware of this being a new behavior of MPI, but the tests might have been relying on an assumption that happened to hold so far.
Just to clarify the severity of the issue: do your parallel preCICE-based simulations run fine, despite the failing tests?
For comparison, the current CI reports:
-- MPI Version: Intel(R) MPI Library 2021.14 for Linux* OS
Your version sits between the two working ones: 2021.6 (your other cluster) and 2021.14 (the current CI). So either there was a temporary problem with that release, or there is some other issue.
Do you have additional environment variables set that influence Intel MPI? These could be an issue.
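You could list anything relevant quickly, e.g.:

env | grep -E '^(I_MPI|PMI)'   # show Intel MPI and PMI related variables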
I have not tested my FSI cases with preCICE on the university cluster yet; I installed preCICE there in order to start with them. But before starting coupled FSI simulations, I wanted to check with the ctest suite that the preCICE installation is OK.
Unfortunately, Intel MPI is only available through Spack on our university cluster:
spack info intel-oneapi-mpi:
2021.12.1
2021.12.0
2021.11.0
2021.10.0
2021.9.0
2021.8.0
2021.7.1
2021.7.0
2021.6.0
2021.5.1
2021.5.0
2021.4.0
2021.3.0
2021.2.0
2021.1.1
As suggested by the admin of our university cluster, I will try the different Intel MPI versions between 2021.6.0 and 2021.12.1 to find out whether the issue appears in only one version or in several, or whether it comes from one of the Spack configuration options (see the sketch below).
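Roughly along these lines for each candidate version (the compiler spec is just what I expect to need, not verified yet):

spack install intel-oneapi-mpi@2021.6.0 %gcc@13.2.0
spack load intel-oneapi-mpi@2021.6.0
# then reconfigure and rebuild preCICE against the loaded MPI and re-run ctest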
It seems that the MPI init error I get when starting “testprecice” without the MPI wrapper comes from the environment and not from the MPI version:
On our university cluster, the variable “I_MPI_PMI_LIBRARY” is set to “/usr/lib64/libpmi2.so” by default.
On a second cluster, the variable “I_MPI_PMI_LIBRARY” is not set, and I do not get this MPI init error.
When I unset “I_MPI_PMI_LIBRARY” on our university cluster, I can start “testprecice” without the MPI wrapper and the MPI init error is gone:
This test suite runs on rank 0 of 1
Running 1 test case...
Setup up logging
Test context of XML/AttributeConcatenation represents "Unnamed" and runs on rank 0 out of 1.
Test case XML/AttributeConcatenation did not check any assertions
*** No errors detected
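So, for the test suite, unsetting the variable before invoking CTest is enough to work around the problem on our cluster:

unset I_MPI_PMI_LIBRARY
ctest --output-on-failure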