I am using preCICE to couple two fluid solvers with a serial explicit coupling scheme: while one solver performs its iterations, the other one is idle. Eventually, I want to run massively parallel simulations using MPI, where both solvers share all of the allocated ranks. With a serial explicit scheme, all of the available computational resources would then be allocated to the active solver (provided there is enough memory to store the data of both solvers), so that the ranks are not idle 50% of the time.
My question is the following: when a solver completes its time step, calls advance(dt), and then waits for the other solver to complete its own time step, what is the load on the procs allocated to the now-waiting solver? Can it be close to zero, allowing an almost complete transfer of computational resources to the active solver? If so, how can I achieve this?
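For context, the coupling scheme in my `precice-config.xml` looks roughly like the sketch below (participant names as in this thread; mesh and data names, as well as the values, are placeholders, and the exact tags may differ between preCICE versions):

```xml
<coupling-scheme:serial-explicit>
  <!-- Solver1 advances a time window first, then Solver2 -->
  <participants first="Solver1" second="Solver2" />
  <max-time value="1.0" />
  <time-window-size value="1e-3" />
  <!-- exchange tags for the coupled data go here -->
</coupling-scheme:serial-explicit>
```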
When I run parallel simulations on my local machine with shared ranks (e.g., 8 procs allocated in common to both solvers), each time step takes considerably longer than when the procs are allocated to one solver only.
Have you tried a parallel coupling scheme? It runs both solvers simultaneously, so only the faster solver has to wait for the slower one, and you can then redistribute resources to minimize the idle time. Whether you can use one depends on many factors, though.
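For illustration, switching to a parallel scheme would roughly amount to replacing the coupling-scheme block with something like the following sketch (participant names from this thread; data and mesh names are placeholders):

```xml
<coupling-scheme:parallel-explicit>
  <!-- both participants compute the same time window at the same time -->
  <participants first="Solver1" second="Solver2" />
  <max-time value="1.0" />
  <time-window-size value="1e-3" />
  <exchange data="BoundaryData" mesh="Interface-Mesh" from="Solver1" to="Solver2" />
  <exchange data="FlowData" mesh="Interface-Mesh" from="Solver2" to="Solver1" />
</coupling-scheme:parallel-explicit>
```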
If you use `m2n:sockets`, then preCICE uses a mix of blocking and non-blocking communication methods. The blocking communication uses a pretty efficient event loop.
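That is the communication backend selected in the configuration, e.g. (v2-style syntax; the attribute names differ in other versions):

```xml
<!-- v2-style m2n definition between the two participants -->
<m2n:sockets from="Solver1" to="Solver2" />
```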
On the Linux workstations and clusters I have used so far, CPU usage normally doesn’t reach the 1% mark while waiting.
At the moment, a parallel coupling scheme is not an option. When performing the iteration advancing the solution from t^n to t^{n+1}, Solver2 needs boundary conditions at time t^{n+1}, which are provided by Solver1 at the end of its own time step. And before beginning its next time step, Solver1 needs flow information at time t^{n+1} from Solver2.
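In the configuration, this dependency corresponds to exchanging data in both directions within the serial scheme, roughly like this (data and mesh names are placeholders):

```xml
<!-- Solver1 finishes its step and provides the boundary conditions at t^{n+1} to Solver2 -->
<exchange data="BoundaryConditions" mesh="Interface-Mesh" from="Solver1" to="Solver2" />
<!-- Solver2 then returns the flow information at t^{n+1} to Solver1 -->
<exchange data="FlowInformation" mesh="Interface-Mesh" from="Solver2" to="Solver1" />
```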
Thank you for the additional information about the CPU usage while waiting. I guess the observed poor performance is due to a hardware setting on my side, combined with the fact that everything is built in debug mode.