Performance issues with the Just-In-Time mapping

Consider the use case below, where a ship hull is navigating within a large domain.

This simulation couples two solvers, A and B, with preCICE. Solver A resolves the detailed flow dynamics in the near field and the motion of the hull; its mesh (MeshA) moves dynamically together with the hull. Solver B, on the other hand, handles the wave propagation in the far field, and its mesh (MeshB) is static. The coupling is achieved by mapping data to and from the overlapping zone, as shown in the picture. I have the following three questions regarding the data mapping in this scenario:

  1. In my implementation, SolverA always initiates the Just-In-Time mapping, as MeshA is dynamic and MeshB is static (if my understanding is correct, a JIT mapping should always be done by the participant with the dynamic mesh). In the example below, var2 is written to MeshB in a conservative manner. This is fine for a variable like pressure or force, but what if var2 is velocity?
  <participant name="SolverA">
    <receive-mesh name="MeshB" from="SolverB" api-access="true" />
    <read-data name="var1" mesh="MeshB" />
    <write-data name="var2" mesh="MeshB" />
    <mapping:nearest-neighbor direction="read" from="MeshB" constraint="consistent" />
    <mapping:nearest-neighbor direction="write" to="MeshB" constraint="conservative" />
  </participant>
  2. setMeshAccessRegion() must be called during the initialization stage for the JIT mapping. For the time being, a large bounding box (the yellow box in the picture) containing the entire course of the hull navigation is used; see the sketch after this list. This makes the simulation extremely slow, which is expected, considering that this is a volume coupling with a large access region. My second question is: in this particular use case, is there any way to improve the performance? In my earlier implementations without preCICE, a function similar to setMeshAccessRegion() is called every time step by the participant with the moving to-mesh, and the data structures for a B-spline interpolation are reconstructed on the from-mesh. This way I can keep the access region as small as possible, and the actual overhead of the reconstruction is very minor. I am not sure whether the same holds for RBF, though.

  3. My third question: in the case where both participants rely on dynamic meshes, how can the mapping be performed?
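
For context, this is roughly how the access region and the just-in-time calls are currently set up on the SolverA side. This is only a minimal sketch with placeholder names and bounding-box values, assuming the just-in-time calls mapAndReadData() and writeAndMapData() together with setMeshAccessRegion():

    #include <precice/precice.hpp>
    #include <vector>

    int main() {
      // Minimal sketch of the current SolverA setup (serial run, placeholder values).
      precice::Participant participant("SolverA", "precice-config.xml", 0, 1);

      // Large axis-aligned access region covering the entire course of the hull
      // (the yellow box): {xmin, xmax, ymin, ymax, zmin, zmax}, set before initialize().
      std::vector<double> accessRegion = {-500.0, 500.0, -200.0, 200.0, -50.0, 20.0};
      participant.setMeshAccessRegion("MeshB", accessRegion);
      participant.initialize();

      while (participant.isCouplingOngoing()) {
        double dt = participant.getMaxTimeStepSize();

        // Current positions of MeshA's vertices (3 doubles per vertex),
        // gathered anew every time step because the hull moves.
        std::vector<double> coords; // = positions of the overlap vertices
        const std::size_t n = coords.size() / 3;
        std::vector<double> var1(n);     // read buffer (var1 assumed scalar here)
        std::vector<double> var2(3 * n); // write buffer (var2 assumed 3D vector here)

        // Just-in-time consistent read: var1 is interpolated from MeshB
        // at MeshA's current vertex positions.
        participant.mapAndReadData("MeshB", "var1", coords, dt, var1);

        // ... advance the near-field solver, move the hull, fill var2 ...

        // Just-in-time conservative write: var2 is mapped onto MeshB.
        participant.writeAndMapData("MeshB", "var2", coords, var2);

        participant.advance(dt);
      }
      participant.finalize();
      return 0;
    }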

Thanks in advance.

Correct. This is how we expected the feature to be used.

JIT mappings don't support write-consistent, because the from-mesh needs to be known in its entirety to compute a correct mapping. We haven't decided how to implement this yet.
The current workaround is to read the vertex IDs and coordinates, perform your own mapping, and write the data yourself.
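
Something along these lines, as a rough sketch (placeholder names and region values; the interpolation between the moving MeshA and the accessed part of MeshB is then entirely up to your solver, and no mapping is configured in preCICE anymore):

    #include <precice/precice.hpp>
    #include <vector>

    int main() {
      // Sketch of the direct-access workaround on the SolverA side
      // (serial run, placeholder names and values).
      precice::Participant participant("SolverA", "precice-config.xml", 0, 1);

      std::vector<double> accessRegion = {-500.0, 500.0, -200.0, 200.0, -50.0, 20.0};
      participant.setMeshAccessRegion("MeshB", accessRegion);
      participant.initialize();

      // Vertex IDs and coordinates of the part of MeshB inside the access region.
      const int n = participant.getMeshVertexSize("MeshB");
      std::vector<precice::VertexID> ids(n);
      std::vector<double> coords(3 * n);
      participant.getMeshVertexIDsAndCoordinates("MeshB", ids, coords);

      while (participant.isCouplingOngoing()) {
        double dt = participant.getMaxTimeStepSize();

        // Read var1 at MeshB's own vertices ...
        std::vector<double> var1(n);
        participant.readData("MeshB", "var1", ids, dt, var1);
        // ... and interpolate it yourself from 'coords' onto the moving MeshA.

        // Compute var2 on MeshA, interpolate it yourself onto MeshB's vertices
        // (e.g. consistently, if var2 is a velocity), then write it directly.
        std::vector<double> var2(3 * n); // var2 assumed to be a 3D vector
        participant.writeData("MeshB", "var2", ids, var2);

        participant.advance(dt);
      }
      participant.finalize();
      return 0;
    }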

As you use nearest-neighbor mapping, the cost should be close to negligible.
Try using the profiling tools (precice-profiling) to see the true cost of calling the just-in-time functions.

This could be possible once we combine it with remeshing. This will be costly though.

This is only possible in preCICE with pseudo meshes, as preCICE needs some structured information to exchange.

@fsimonis Thanks for the reply. I've profiled my case, and the sizes of the two coupling meshes are as follows:

   HOS-Coupling: HOS-Relaxation-Mesh1 has 39773800 vertices.
   HOS-Coupling: HOS-Coupling-Mesh1   has   119642 vertices.

JIT mapping is performed on both meshes. Here is the summary of my run:

Total time in adapter + preCICE: "15:42.833" (format: day-hh:mm:ss.ms)
  For setting up (S):            "02:19.034" (read() function)
  For all iterations (I):        "13:23.798" (execute() and adjustTimeStep() functions)

Time exclusively in the adapter: "01:25.913"
  (S) reading preciceDict:       "00:00"
  (S) constructing preCICE:      "00:00.232"
  (S) setting up the interfaces: "00:00.230"
  (S) setting up checkpointing:  "00:00"
  (I) writing data:              "00:13.460"
  (I) reading data:              "01:12.221"
  (I) writing checkpoints:       "00:00"
  (I) reading checkpoints:       "00:00"
  (I) writing OpenFOAM results:  "00:00" (at the end of converged time windows)

Time exclusively in preCICE:     "14:16.633"
  (S) initialize():              "02:03.553"
  (I) advance():                 "12:07.358"
  (I) finalize():                "00:05.721"
  These times include time waiting for other participants.

A snapshot of the trace makes it obvious that the data exchange costs a huge amount of time in each time step.

I have attached my config.xml and trace.json files for your convenience. Any comments are much appreciated! Thanks.
precice-config.xml (3.9 KB)
trace.json.txt (2.9 MB)

Hi,

HOS-Coupling: HOS-Relaxation-Mesh1 has 39773800 vertices.
HOS-Coupling: HOS-Coupling-Mesh1   has   119642 vertices.
  <data:scalar name="etaHOS" waveform-degree="0" />
  <data:vector name="UHOS" waveform-degree="0" />

  <mesh name="HOS-Relaxation-Mesh1" dimensions="3">
    <use-data name="etaHOS" />
    <use-data name="UHOS" />
  </mesh>

These sizes together with this data setup mean that the relaxation mesh alone carries about 1.2 GB of data (39,773,800 vertices × 4 doubles per vertex for one scalar plus one 3D vector × 8 bytes ≈ 1.27 GB). This data needs to be transferred, which results in the numbers you see. There is not much you can do about this with your current setup, apart from maybe switching to a faster network if one is available.

A possible solution would be to keep the size of the overlapping domain (MeshA) fixed and use vertex displacements to “move” it over MeshB.
This way the mesh is static from the point of view of preCICE. In SolverB you can then map from the reference mesh to the now-internal MeshB, taking the displacements into account; see the sketch below.
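
To sketch the SolverA side of this idea (placeholder names only; MeshA is now provided by SolverA at fixed reference positions, a displacement field such as "Displacement" carries the hull motion, and SolverB receives only this small mesh instead of exposing the huge MeshB):

    #include <precice/precice.hpp>
    #include <vector>

    int main() {
      // Sketch: SolverA provides a *fixed* MeshA for the overlap region and
      // communicates the hull motion as a displacement field (placeholder names).
      precice::Participant participant("SolverA", "precice-config.xml", 0, 1);

      // Reference positions of the overlap region, registered once.
      std::vector<double> refCoords; // = 3 doubles per vertex, filled by the solver
      const std::size_t n = refCoords.size() / 3;
      std::vector<precice::VertexID> ids(n);
      participant.setMeshVertices("MeshA", refCoords, ids);
      participant.initialize();

      while (participant.isCouplingOngoing()) {
        double dt = participant.getMaxTimeStepSize();

        // Read var1, which SolverB writes onto MeshA after its own interpolation.
        std::vector<double> var1(n);
        participant.readData("MeshA", "var1", ids, dt, var1);

        // ... advance the near-field solver, move the hull ...

        // Write the current displacement of every vertex plus the coupling data;
        // SolverB uses the displacements to place MeshA correctly before it
        // interpolates onto its internal MeshB.
        std::vector<double> displacement(3 * n);
        std::vector<double> var2(3 * n); // var2 assumed to be a 3D vector
        participant.writeData("MeshA", "Displacement", ids, displacement);
        participant.writeData("MeshA", "var2", ids, var2);

        participant.advance(dt);
      }
      participant.finalize();
      return 0;
    }

Only the comparatively small MeshA then has to be communicated between the participants.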

Hope that helps!

Best
Frédéric

Please confirm if I understand correctly:
First, in Solver A, we create a preCICE mesh MeshA for the overlapping domain; even though it is moving, we treat it as stationary. Then, in Solver B, we do not create MeshB as a preCICE mesh. Instead, we use a just-in-time read to retrieve the displacement data from MeshA, perform the interpolation internally in Solver B, and finally write the interpolated results back to MeshA using a just-in-time write. This approach avoids transferring large amounts of data.

The idea has been implemented and tested, and the performance has improved a lot.

