Deadlock Using Three-Solver Explicit Coupling Scheme

Hello all, I have a quick question about setting up a preCICE config file with three solvers. The goal is to use serial explicit coupling such that Solver A (using some initial values) runs and passes data to Solver B, which then runs and passes data to Solver C. Solver C then passes data back to Solver A, which runs again to begin the next timestep, and so on. The issue I’m having is with defining the participant order for each individual bi-coupling scheme. If I define them as first A second B, then first B second C, then first C second A, preCICE hangs on the “initialize: Slaves are connected” message. I assume this is because there is no place to start, since each solver requires a different one to be run first. Since the parallel explicit approach would not be appropriate (the solvers must run in series), how do I tell preCICE to begin by running Solver A?

xml config file attached.

preciceConfig.xml (3.5 KB)

To cite from Bernhard’s thesis, page 140

A careless setup of coupling schemes and participants can lead to a deadlock in the coupled simulation

I am not so sure about the “careless”: these compositional coupling schemes are really complicated (but also quite powerful), and deadlocks are part of the game.

What you currently have:

      <coupling-scheme:serial-explicit>
         <participants first="SolverA" second="SolverB"/> ...
         <exchange data="Data1" mesh="MeshB" from="SolverA" to="SolverB"/>
      </coupling-scheme:serial-explicit>
    
      <coupling-scheme:serial-explicit>
         <participants first="SolverB" second="SolverC"/> ...
         <exchange data="Data2" mesh="MeshC" from="SolverB" to="SolverC" />
      </coupling-scheme:serial-explicit>
    
      <coupling-scheme:serial-explicit>
         <participants first="SolverC" second="SolverA"/> ...
         <exchange data="Data3" mesh="MeshA" from="SolverC" to="SolverA"/>
      </coupling-scheme:serial-explicit>

Your explanation of why this leads to a deadlock is correct. To get the behavior that you want, though, i.e. a sequential execution (one solver after the other) and an explicit coupling (directly going to the next timestep after one cycle), the fix is easy: simply swap the participants in the third coupling scheme, since SolverA needs to run before SolverC in every cycle and that exchange carries the data over to the next cycle.

      <coupling-scheme:serial-explicit>
         <participants first="SolverA" second="SolverC"/> ...
         <exchange data="Data3" mesh="MeshA" from="SolverC" to="SolverA"/>
      </coupling-scheme:serial-explicit>

Two things to consider when you move forward:

  • For three coupled participants, a sequential coupling might be inefficient. Simply changing serial to parallel lets all three participants run simultaneously. Then, the order of first vs. second can no longer lead to deadlocks.
  • You only use unidirectional coupling schemes, yet the combination of all three forms a closed cycle. Thus, an explicit scheme might lead to instabilities. If you want to change to an implicit scheme, simply replacing every explicit with implicit will not do the job; a fully implicit (multi) coupling scheme is needed instead.
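To illustrate the first point, the serial-to-parallel swap is only a matter of the tag names. This is just a sketch using the generic names from above; the `...` stands for the omitted time settings, as in the earlier snippets:

```xml
<!-- Sketch only: same participants and exchanges, but parallel-explicit,
     so the first/second order cannot cause a deadlock anymore. -->
<coupling-scheme:parallel-explicit>
   <participants first="SolverA" second="SolverB"/> ...
   <exchange data="Data1" mesh="MeshB" from="SolverA" to="SolverB"/>
</coupling-scheme:parallel-explicit>

<coupling-scheme:parallel-explicit>
   <participants first="SolverB" second="SolverC"/> ...
   <exchange data="Data2" mesh="MeshC" from="SolverB" to="SolverC"/>
</coupling-scheme:parallel-explicit>

<coupling-scheme:parallel-explicit>
   <participants first="SolverA" second="SolverC"/> ...
   <exchange data="Data3" mesh="MeshA" from="SolverC" to="SolverA"/>
</coupling-scheme:parallel-explicit>
```

The trade-off is that in a parallel scheme every participant works with data from the previous timestep, so you gain concurrency at the cost of a one-timestep lag.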

Thank you for your response.

I tried the fix you suggested and the solution still reaches a deadlock. ‘SolverA’ works as expected, then maps to ‘SolverB’, which also works as expected. However, after ‘SolverB’ runs and the mapping to ‘SolverC’ is attempted, the solution hangs.
In the terminal window for ‘SolverB’ the following message is displayed:
[impl::SolverInterfaceImpl]:1418 in mapWrittenData: Compute write mapping from mesh “MeshB” to mesh “MeshC”.
In the terminal window for ‘SolverC’ the following message is displayed:
[impl::SolverInterfaceImpl]:240 in initialize: Slaves are connected

At this point the solution is deadlocked and does not progress. It is worth noting that parallel coupling was attempted and ran successfully. However, the results were erroneous and exhibited a ‘lag’ between the solvers, as each solver needs data for the current timestep from the previous solver. For example, to accurately solve for time t = 1, SolverB needs data from SolverA at t = 1, but with parallel coupling SolverB only has access to SolverA’s data from the previous timestep, say t = 0.9, when attempting to solve for t = 1.

Thanks, Rohan

Mmh, I guess I need more insight then. Could you please enable some debug output and upload/paste the output of all three solvers?

To do so, you need to build preCICE in Debug mode and use the following in your config:

<log>
    <sink type="stream" output="stdout" filter="(%Severity% > debug) or (%Severity% >= debug and %Module% contains SolverInterfaceImpl) or (%Severity% >= debug and %Module% contains partition) or (%Severity% >= debug and %Module% contains cplscheme)" enabled="true" />
</log> 

More information on logging in the wiki.

I’ve attached the debug output to this message. Since the debug run uses non-generic solver names/data, I’ve also included the corresponding preCICE config file to help translate the log files.

In addition, I noticed that the behavior of preCICE changes depending on the order of the bi-coupling schemes; for example, moving the heat --> damage bi-coupling scheme above the light --> heat bi-coupling scheme enabled preCICE to complete one full timestep before becoming deadlocked. How does preCICE interpret the order of bi-coupling schemes?
Thank you very much!

preciceConfig.xml (4.3 KB) debugDamageSolver.log (3.7 KB) debugHEAT.log (7.3 KB) debugLIGHT.log (7.5 KB)

This case is really tricky. Yes, the order of bi-coupling schemes plays a crucial role. Each participant “initializes” and “advances” the individual coupling schemes in this order. And this should also do the trick here.
Can you please try:

      <coupling-scheme:serial-explicit>
         <participants first="LIGHT" second="DamageSolver"/> ...
         <exchange data="Damage" mesh="mcxyz_mesh" from="DamageSolver" to="LIGHT"/>
      </coupling-scheme:serial-explicit>

      <coupling-scheme:serial-explicit>
         <participants first="HEAT" second="DamageSolver"/> ...
         <exchange data="Temperatures" mesh="damageMesh" from="HEAT" to="DamageSolver" />
      </coupling-scheme:serial-explicit>

      <coupling-scheme:serial-explicit>
         <participants first="LIGHT" second="HEAT"/> ...
         <exchange data="VolumetricHeatSources" mesh="heat_mesh" from="LIGHT" to="HEAT"/>
      </coupling-scheme:serial-explicit>

Furthermore, please note that in a serial coupling scheme the first participant already receives data in initialize: in fact, the data that the second participant sends after its first advance. This is how you get the staggered behavior.
This picture could help. A proper documentation of this behavior is on our list.

Thank you for the link and the help with the order of bi-coupling schemes, it was very informative. I tried this configuration of bi-coupling schemes and the solution hung after ‘LIGHT’ solved. I then tried all possible orders of the bi-coupling schemes, and the furthest the solution would go was one full iteration before hanging. This occurred with the bi-coupling schemes in the following order:
Heat->Damage, Light->Heat, Light->Damage
Interestingly, the solution now hangs while both LIGHT and DamageSolver are waiting to receive data; I’m not sure what HEAT is attempting to do while the solution is deadlocked. I’m including the debug files for the aforementioned ordering in the hope that they yield some insight.
Thank you for your time.
debugDamageSolver2.log (7.4 KB) debugHEAT2.log (1.7 KB) debugLIGHT2.log (7.5 KB)

After further thinking and drawing many pictures, I am confident that your problem currently has indeed no solution :confused:. We would need to sort the coupling schemes appropriately. I opened an issue:

But let’s step back a bit.

Such a “lag” is a feature of any explicit coupling scheme. Even with three serial-explicit coupling schemes, LIGHT would lag behind DamageSolver by one timestep.

Is your problem time dependent? Or are you only interested in a steady-state solution?
In both cases, a fully-implicit multi coupling scheme could be beneficial.
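For reference, such a fully-implicit multi coupling scheme could look roughly like the sketch below. All numeric values are placeholders you would need to tune for your case, and some element names differ between preCICE versions (e.g. timestep-length vs. time-window-size), so please check against the reference for the version you use. The mesh and data names are the ones from your config:

```xml
<!-- Sketch of a fully-implicit multi coupling scheme.
     Time, iteration, and convergence values are placeholders. -->
<coupling-scheme:multi>
   <participant name="LIGHT" control="yes"/>
   <participant name="HEAT"/>
   <participant name="DamageSolver"/>
   <max-time value="1.0"/>
   <timestep-length value="0.1"/>
   <max-iterations value="30"/>
   <exchange data="VolumetricHeatSources" mesh="heat_mesh" from="LIGHT" to="HEAT"/>
   <exchange data="Temperatures" mesh="damageMesh" from="HEAT" to="DamageSolver"/>
   <exchange data="Damage" mesh="mcxyz_mesh" from="DamageSolver" to="LIGHT"/>
   <relative-convergence-measure limit="1e-4" data="Temperatures" mesh="damageMesh"/>
</coupling-scheme:multi>
```

In practice you would also add an acceleration (called post-processing in older preCICE versions), e.g. a quasi-Newton variant on the exchanged data, to get reasonable convergence of the implicit iterations.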

Okay, thank you for your continued help on this issue!
I will absolutely investigate and consider implementation of a fully-implicit multi coupling scheme.

Thanks!