MPI runs: Sharing all ranks between two solvers

Hello everyone,

I am using preCICE to couple two fluid solvers with a serial explicit coupling scheme: while one solver performs its iterations, the other one is idle. Eventually, I want to run massively parallel simulations with MPI where both solvers share all of the allocated ranks. With a serial explicit scheme, all the available computational resources would then be allocated to the active solver (provided there is enough memory to store the data of each solver), so that the ranks are not idle 50% of the time.

My question is the following: when a solver completes its time step, calls advance(dt), and then waits for the other solver to complete its own time step, what is the load of the procs allocated to the now-waiting solver? Can it be close to zero, allowing an almost complete transfer of computational resources to the active solver? If so, how can I achieve this?

When I run parallel simulations on my local machine with shared ranks (e.g., 8 procs allocated in common to both solvers), each time step takes considerably longer than when the procs are allocated to one solver only.

Thank you for your help,

Guillaume

Hi Guillaume,

Have you tried a parallel coupling scheme? It runs both solvers simultaneously, so only the faster one has to wait for the slower one, and you can then redistribute resources to minimize the idle time.
Whether you can use it depends on many factors, though.
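
To illustrate, switching the scheme is mostly a matter of changing the coupling-scheme tag in the configuration. A rough sketch (participant, mesh, and data names are placeholders, and the exact syntax may differ between preCICE versions):

```xml
<!-- Sketch of a parallel explicit coupling scheme; all names are placeholders. -->
<coupling-scheme:parallel-explicit>
  <participants first="Solver1" second="Solver2" />
  <max-time value="1.0" />
  <time-window-size value="0.01" />
  <!-- Both solvers compute the same time window concurrently and exchange data at its end. -->
  <exchange data="BoundaryConditions" mesh="InterfaceMesh" from="Solver1" to="Solver2" />
  <exchange data="FlowData" mesh="InterfaceMesh" from="Solver2" to="Solver1" />
</coupling-scheme:parallel-explicit>
```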

If you use `m2n:sockets`, preCICE uses a mix of blocking and non-blocking communication methods. The blocking communication uses a fairly efficient event loop.
On the Linux workstations and clusters I have used so far, the CPU usage normally doesn't even reach 1% while waiting.
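
For reference, this is the sockets-based m2n tag in the configuration. A minimal sketch, assuming preCICE v2 syntax and placeholder participant names (attribute names may differ in other versions):

```xml
<!-- Sockets-based communication between the two participants (preCICE v2 syntax, placeholder names). -->
<!-- exchange-directory points to where the connection information files are written. -->
<m2n:sockets from="Solver1" to="Solver2" exchange-directory=".." />
```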

So, you should be good to go.

Best,
Frédéric

Hi Frédéric,

Thank you for this quick answer.

At the moment, a parallel coupling scheme is not an option. When performing the iteration that advances from t^n to t^{n+1}, Solver2 needs boundary conditions at time t^{n+1}, which Solver1 provides at the end of its own time step. And before starting its next time step, Solver1 needs flow information at time t^{n+1} from Solver2.
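
Schematically, the data flow within one time window of my serial explicit scheme looks like this (a simplified sketch with placeholder names, not my actual configuration):

```xml
<!-- Simplified serial explicit scheme; all names are placeholders. -->
<coupling-scheme:serial-explicit>
  <participants first="Solver1" second="Solver2" />
  <max-time value="1.0" />
  <time-window-size value="0.01" />
  <!-- Solver1 finishes its step and sends the boundary conditions at t^{n+1} to Solver2 ... -->
  <exchange data="BoundaryConditions" mesh="InterfaceMesh" from="Solver1" to="Solver2" />
  <!-- ... then Solver2 runs its step and sends the flow information at t^{n+1} back to Solver1. -->
  <exchange data="FlowData" mesh="InterfaceMesh" from="Solver2" to="Solver1" />
</coupling-scheme:serial-explicit>
```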

Thank you also for the additional information about the CPU usage while waiting. I guess the poor performance I observed comes from a hardware setting on my side, combined with the fact that everything is in debug mode.

Best,

Guillaume

Hi Guillaume,

Debug mode is really slow; a Release build will make a massive difference. You can use `cmake --preset=production` to simplify the configuration.

Best
Frédéric

