Case runs normally on the local machine but gets stuck on the cluster during the preCICE initialization phase

Dear all,

I am simulating elastic structures entering water. There is a similar thread here, but I am still very confused about this issue: https://precice.discourse.group/t/running-the-case-on-the-cluster-without-errors-but-hanging-for-a-long-time-in-the-first-step/2150

The case runs smoothly on my local machine, but when I submit it on the cluster it gets stuck in the preCICE initialization phase for a very long time. I am using OpenFOAM v1912, CalculiX 2.20, and preCICE 2.3.0. The software configuration is identical on both machines; the only differences are that the local machine runs Ubuntu 20.04 while the cluster runs CentOS 7, and that on the cluster I submit the job with Slurm.
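For reference, the job layout is roughly the following (a minimal sketch only; the actual scripts are in the attached slurm.txt, runFluid.txt, and runSolid.txt, and the job name and directory names here are placeholders):

#!/bin/bash
#SBATCH --job-name=fsi-case   # placeholder name
#SBATCH --nodes=1             # a single 128-core node, so no cross-node communication
#SBATCH --ntasks=128

# start both solvers on the same node and wait for them to finish
(cd fluid-openfoam && ./runFluid > FluidLog.txt 2>&1) &
(cd solid-calculix && ./runSolid > SolidLog.txt 2>&1) &
wait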

Note that I have already run some small test cases on the cluster and they start simulating normally, so the cluster's software configuration should be fine. One node of the cluster has 128 cores and I only use a single node at a time, so I do not need to deal with cross-node communication or a double allocation of MPI ranks, and I do not use a hostfile to allocate compute resources. In addition, since my case does not involve cross-node communication, I still use the lo loopback interface in the network settings of the preCICE configuration. If anything I am doing here is wrong, please point it out.
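The m2n line in my precice-config.xml currently looks roughly like this (participant names and the exchange directory are simplified here; the point is only the network attribute):

<m2n:sockets from="Fluid" to="Solid" exchange-directory=".." network="lo" />
<!-- network="lo" restricts the socket connection to the loopback interface -->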

The log files show that CalculiX establishes communication very quickly, but OpenFOAM is always stuck at

---[precice] Compute "write" mapping from mesh "Fluid-Mesh-Centers" to mesh "Solid-Mesh".

The coupling interface meshes in this case are quite large, on the order of tens of thousands of vertices. But preCICE should be able to handle massively parallel cases, shouldn't it? I am using an RBF mapping with a support radius.
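The write mapping in question is configured roughly as follows (the concrete RBF basis function and the constraint are simplified for illustration; the support radius in my actual config is 0.05):

<mapping:rbf-compact-tps-c2 direction="write"
                            from="Fluid-Mesh-Centers" to="Solid-Mesh"
                            constraint="conservative" support-radius="0.05" />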

Thank you for reading my wall of text. The relevant files are attached below, and any thoughts or comments are appreciated. :wink:
runFluid.txt (258 Bytes)
runSolid.txt (100 Bytes)
slurm.txt (267 Bytes)
FluidLog.txt (3.8 KB)
SolidLog.txt (7.1 KB)

If I change the mapping to nearest-neighbor, the case also starts very quickly on the cluster. So I think the hang is caused by the large number of coupling mesh vertices, but I am puzzled as to why I do not see this problem locally.
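Concretely, the only change for that test was swapping the mapping tag, roughly (again with the constraint simplified):

<mapping:nearest-neighbor direction="write"
                          from="Fluid-Mesh-Centers" to="Solid-Mesh"
                          constraint="conservative" />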

I would start with:

  1. Use a different network interface anyway. Presumably InfiniBand is available, since you have a cluster.
  2. Double-check that the solver executables on the cluster link to the libraries you expect (see the commands sketched after this list).
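For example, something along these lines on a compute node (the binary and library names are only the typical ones from the CalculiX and OpenFOAM adapters; adjust the paths to your installation):

# list the network interfaces available on the node (look for an InfiniBand one such as ib0)
ip addr

# check which libprecice (and which MPI/PETSc) the solvers actually pick up
ldd $(which ccx_preCICE) | grep -iE 'precice|petsc|mpi'
ldd $FOAM_USER_LIBBIN/libpreciceAdapterFunctionObject.so | grep -iE 'precice|petsc|mpi'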

Thanks very much for the reminder.
While checking the installation, I realized that I may have accidentally turned off a feature of preCICE, because I used the following command when building the release version of preCICE:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=~/software/precice -DPRECICE_PETScMapping=OFF -DPRECICE_PythonActions=OFF …

This turns off the PETSc-based RBF mappings, i.e. the MPI-parallel RBF implementation. As a result the RBF mapping runs serially and appears to hang when the coupling meshes are this large. I will remove the extra options and rebuild preCICE, and I think things will get better. :grinning:
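After rebuilding, I can check whether the PETSc-based mapping is really enabled, for example by looking at what libprecice links against (the path follows from the install prefix above; on some systems the library ends up in lib64 instead of lib):

ldd ~/software/precice/lib/libprecice.so | grep -iE 'petsc|mpi'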

Dear all,
I recompiled preCICE and configured it with

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=~/software/precice …

I have also set network="ib0" in precice-config.xml. But when I run the case, CalculiX runs fine while OpenFOAM still gets stuck at

---[precice] Using tree-based preallocation for matrix C
---[precice] Using tree-based preallocation for matrix A

I am using the PETSc-based RBF mapping with a support radius of 0.05. With nearest-neighbor mapping the simulation starts right away, but it tends to diverge.

I am completely lost. My coupling interface has tens of thousands of nodes, yet the case runs without problems on my local Ubuntu 20.04 machine. Can anyone give me some advice? I would greatly appreciate it.

Dear all,
I used ctest to check my preCICE installation and some of the tests failed. Could this affect the initialization speed?

83% tests passed, 5 tests failed out of 29
Label Time Summary:
Solverdummy = 4.01 sec
mpiports = 40.12 sec
Total Test time (real) = 192.56 sec
The following tests FAILED:
1 - precice.acceleration (Timeout)
4 - precice.com.mpiports (Timeout)
8 - precice.m2n.mpiports (Timeout)
15 - precice.serial (Timeout)
16 - precice.parallel (Timeout)
Errors while running CTest
I do not have much experience with preCICE, especially when running on a cluster. Please help me. :upside_down_face:
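One thing I can still try is re-running only the failed tests with their output shown (run from the preCICE build directory; the regex simply matches the five test names listed above):

ctest --output-on-failure -R 'mpiports|acceleration|serial|parallel'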