FSI: Questions regarding the tasks executed within "advance/map.pet.mapData", "solver" and "solver.advance" entries in solver events summary

Dear preCICE users,

I am currently doing my MSc thesis in the field of FSI. I am using a workflow that combines OpenFOAM for the fluid modelling and deal.II for the structural modelling in order to simulate the Chen and Wambsganss case, which consists of annular axial water flow over a single brass beam that is fixed at both of its ends (for more information, see: Redirecting). I am using preCICE 2.3.0 to couple the solvers. My preCICE config file is attached below. It is important to note that I am using a parallel coupling scheme with subiterations, together with a compactly supported, consistent RBF mapping between the domains.

At the moment I am particularly interested in how the computational time of a given FSI simulation is spent, that is, how long different operations such as mapping, coupling, and calling the solvers take. For this, I looked into the events summary log files for the two domains (attached below). I am hence interested in the main contributors to the computational time for each of the two domains. These are the “advance” and “solver.advance” entries, and the different entries associated with the mapping between the two domains (particularly advance/map.pet.mapData and advance/map.pet.solveConsistent for the two domains). I tried looking into the source code of preCICE (more specifically, into PetRadialBasisFctMapping.hpp). While I do understand what map.pet.solveConsistent accounts for (that is, finding the unknown weights of the RBF mapping at a given subiteration), it was difficult for me to determine what the map.pet.mapData entry stands for. I see that the event starts at the beginning of the mapConservative method in PetRadialBasisFctMapping.hpp, in the line:

precice::profiling::Event e("map.pet.mapData.From" + this->input()->getName() + "To" + this->output()->getName(), profiling::Synchronize);

However, I do not see where the event is stopped. Hence, I do not know what actions are actually executed within the timeframe that map.pet.mapData.From accounts for.

My second question has to do with which operations are executed within the “solver” and “solver.advance” entries. Again, by checking ParticipantImpl.cpp, I could understand that for a given solver, the “advance” entry within the events summary file stands for the exchange of information with the other domain (writing the data to be sent to the other solver, and reading the input from the other solver). I then assume that the “solver.advance” entry corresponds to actually calling the Fluid or the Solid solver, though I cannot find proof of this in the source code. If this were indeed the case, “solver.advance” should require a larger computational time for the fluid solver, since the fluid grid I am using is two orders of magnitude larger than the structural one. However, it appears that the opposite is true: “solver.advance” requires little time for the Fluid solver and a long time for the Structural one.

In conclusion, as the title says, I would like to know exactly what operations preCICE does within the “advance/map.pet.mapData”, “solver” and “solver.advance” time events. Thank you!

precice-Fluid-events-summary.log (12.6 KB)
precice-Solid-events-summary.log (11.6 KB)
precice-config.xml (2.9 KB)

Welcome, @georgescovici!

The documentation page of the events is currently indeed a bit poor: Performance analysis | preCICE - The Coupling Library

We greatly improved the page for the upcoming preCICE v3, and the page is already on GitHub, but not yet rendered on the website: https://github.com/precice/precice.github.io/blob/precice-v3/pages/docs/tooling/tooling-performance-analysis.md

Copying from there:

  • solver.advance: time spent in the solver between advance() calls, including the time between initialize() and the first advance() call.
  • advance: time spent inside preCICE's advance(). This includes data mapping, data transfer, and acceleration.

map.pet.mapData is the mapping operation of the PETSc-based RBF mapping. There are many options you can tune to make that faster: https://open-research-europe.ec.europa.eu/articles/2-51/v2

This indeed sounds strange. I could still imagine this happening if the fluid code is built in release mode and optimized, while the structure code is built in debug, unoptimized mode, with a lot of output.

@fsimonis recently invested a lot of work in the profiling/events and would definitely appreciate some feedback.

It is stopped when it goes out of scope, which happens at the end of the function. Hence, it measures the entire map() function.
There are also nested events (such as map.pet.solveConsistent) that help you dissect this time.
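To illustrate the mechanism with a toy example (this is only a stand-in for the scoped event, not the actual precice::profiling::Event implementation, and mapData() below is just a placeholder for the mapping's map() function):

#include <chrono>
#include <iostream>
#include <string>

// Toy scoped event: timing starts in the constructor and stops in the destructor.
class ScopedEvent {
public:
  explicit ScopedEvent(std::string name)
      : _name(std::move(name)), _start(std::chrono::steady_clock::now()) {}
  ~ScopedEvent() {
    const auto elapsed = std::chrono::steady_clock::now() - _start;
    std::cout << _name << ": "
              << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
              << " us\n";
  }
private:
  std::string _name;
  std::chrono::steady_clock::time_point _start;
};

void mapData() {
  ScopedEvent e("map.pet.mapData"); // the event starts here
  // ... assemble the right-hand side, solve the RBF system, write the mapped values ...
} // e goes out of scope here: its destructor stops the event, so the measurement
  // covers the whole function, including everything measured by the nested events

int main() {
  mapData();
}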

The event solver.advance measures the time between API calls of advance() and stops in finalize().

The event is stopped (or created, in the first call) at the beginning of advance(), and then started again at the end of advance(), just before control returns to the solver.
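Conceptually, the pattern is the following (a toy sketch with made-up bookkeeping, not the actual preCICE implementation):

#include <iostream>
#include <string>

// Toy events with explicit start/stop, only to illustrate the pattern.
struct ToyEvent {
  std::string name;
  void start() { std::cout << "start " << name << "\n"; }
  void stop()  { std::cout << "stop  " << name << "\n"; }
};

ToyEvent solverAdvance{"solver.advance"};
ToyEvent advanceEvent{"advance"};

void advance(double /*dt*/) {
  solverAdvance.stop();  // close the chunk of time the solver spent since the last call
  advanceEvent.start();  // time the work done inside preCICE

  // ... data mapping, data transfer, acceleration ...

  advanceEvent.stop();
  solverAdvance.start(); // resume timing the solver until the next advance() or finalize()
}

int main() {
  solverAdvance.start(); // corresponds to initialize() handing control back to the solver
  advance(0.01);
  advance(0.01);
  solverAdvance.stop();  // corresponds to finalize()
}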

Note that there was a bug in this version of preCICE which flips the order of solver.advance and advance. This doesn’t affect measurements, but can break the rendering of exported trace files.

solver.initialize is tricky, as it measures the time spent in the solver between configure(), the configuration-dependent initializeData(), initialize(), and the first call to advance(). This can be pretty difficult to work with. Also, as it is paused and resumed, it isn’t correct to visualize it as a single timed block. This is the reason we cleaned this event up in version 3 (currently in development).

If you want to analyse performance inside preCICE, and you build preCICE from source, I recommend defining your own events.
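For example, something along these lines inside the preCICE sources (a rough sketch only; it assumes the same Event class as in the snippet you quoted, and the event name is made up):

{
  precice::profiling::Event myEvent("advance.myCustomSection", profiling::Synchronize);
  // ... the code section you want to time ...
} // the event stops when myEvent goes out of scope and then appears in the profiling output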

Hope that helps :smile:

Thank you for your answers! I now understand that the solver.advance and advance entries are flipped, as well as where the map.pet.mapData computations stop.

Thank you also for the suggestion regarding creating my own events. I am not sure whether that is really necessary for my purpose, but people with more experience in preCICE are probably better placed to judge. For this reason, I will explain my purpose, and perhaps others can contribute.

In my work, I am building an FSI methodology myself that is contained entirely within OpenFOAM (OF). This is supposed to replace the current workflow, which, as mentioned in the original message, uses preCICE to couple OF and deal.II. For both workflows, the fluid solver remains unchanged and is essentially treated as a black box.

Due to how the new structural solver is implemented in OF, the FSI coupling scheme works in serial. The current methodology, on the other hand, uses a parallel scheme with preCICE. My objective is to compare the computational cost of the new workflow with that of the current one. My thinking is that I can do this simply by comparing how long the two workflows spend outside of the fluid solver, that is, how much the FSI computations “slow down” the solution of the hypothetically uncoupled fluid domain. I am more interested in the computational differences while the subiterations are executed than in initializing and finalizing the methods, since the latter represent a negligible share of the total computational time for the application I am considering. The way I see it, measuring the time outside of the fluid solver should also be enough to account for the difference in the type of coupling scheme (serial vs. parallel) used by the new and the current workflow, respectively.

Regarding how I could actually quantify the difference: for the new workflow, I can add time events in my code wherever required (see the sketch below). For the preCICE-based methodology, I was thinking that I could perhaps use only the precice-Fluid-events-summary.log, rather than having to create my own events within preCICE. Based on my understanding, this should roughly correspond to the solver.advance entry in the precice-Fluid-events-summary.log file (if one neglects the bug that mislabels the entries). Although this entry also contains the time between initialize() and the first advance() call, as well as between the last call to advance() and finalize(), those are relatively insignificant compared to the time spent actually computing the transient fluid flow.
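For the new OpenFOAM-only workflow, by adding time events wherever required I mean something as simple as the following (a rough sketch with made-up function names, not actual OpenFOAM or preCICE code):

#include <chrono>
#include <iostream>

int main() {
  using Clock = std::chrono::steady_clock;
  Clock::duration timeOutsideFluid{0}; // accumulated over all time steps and subiterations

  const int nTimeSteps = 3;      // made-up loop sizes, just for the sketch
  const int nSubIterations = 4;
  for (int step = 0; step < nTimeSteps; ++step) {
    for (int iter = 0; iter < nSubIterations; ++iter) {
      // solveFluid();  // the unchanged, black-box fluid solve

      const auto t0 = Clock::now();
      // solveStructure(); mapDisplacementsToFluid(); mapStressesToSolid();
      timeOutsideFluid += Clock::now() - t0; // everything that is not the fluid solve
    }
  }

  std::cout << "time spent outside the fluid solver: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(timeOutsideFluid).count()
            << " ms\n";
}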

However, even if what I said in the above paragraph is correct, I am currently seeing strange values for the advance and solver.advance entries in the precice-Fluid-events-summary.log file. In the original post, I attached the events for an FSI case in which a laminar fluid model was used for initial testing. For my final application, however, a URANS fluid solver needs to be used. When doing that, I obtain the two new events files attached to this message. There, one can see that for the Fluid participant, 33.4% of the time is spent advancing the coupling scheme! This is significantly higher than the 3.15% of the laminar case. To me, this result seems unreasonable: the URANS case uses a fluid grid that is 6 times larger than the laminar one, and I would expect the computational time of the fluid solver to grow faster with the cell count than that of the mapping routine and the other steps of advancing the coupling. Even if my intuition is wrong, I would not expect the coupling cost to increase 10 times faster than that of the fluid solver, nor the coupling to take a third of the total computational time.

If one looks at the time spent solely on the mapping between the two domains for the laminar and the URANS case, one can see that the two are comparable, and much lower than the total time of the advance() method. For example, mapping the solid displacements to the fluid domain takes 4.48% of the computational time in the URANS simulation and 2.94% in the laminar one. This makes me think that the mapData entries in the events log are more reliable than the advance ones. Do you see any reason why this discrepancy between the cost of the coupling and that of the mapping exists? Is the execution time of advance() realistic for this case?

Since the advance entry may not be reliable, I could instead estimate how long the current FSI methodology spends outside of the fluid solver by looking exclusively at the mapData entries, which I would expect to be the main contributors to the total time spent in advance, and compare that with the equivalent mapping time in the newly developed workflow. In order to execute a new subiteration with the parallel scheme, preCICE must map the displacements to the fluid nodes, as well as the stresses to the structural mesh. I imagine that the two correspond to advance/map.pet.mapData.FromSolid_meshToFluid-Mesh-Nodes and advance/map.pet.mapData.FromFluid-Mesh-CentersToSolid_mesh, respectively. However, I am not sure whether those operations are executed sequentially or at the same time. If they run sequentially, a rough estimate of the time spent outside the fluid solver is the sum of the two entries; if they run at the same time, it is the maximum of the two. Looking into the ParticipantImpl.cpp file, it appears that the mappings are done sequentially, given the two independent if conditions:

// Inside advance(): first map the data written by this participant ...
if (_couplingScheme->willDataBeExchanged(0.0)) {
  mapWrittenData();
  performDataActions({action::Action::WRITE_MAPPING_POST}, time);
}

// ... then advance the coupling scheme (data exchange, acceleration) ...
advanceCouplingScheme();

// ... and finally map the data received from the other participant.
if (_couplingScheme->hasDataBeenReceived() || _couplingScheme->isTimeWindowComplete()) { // @todo potential to avoid unnecessary mappings here.
  mapReadData();
  performDataActions({action::Action::READ_MAPPING_POST}, time);
}

Is this second approach a good estimation, or am I missing something? How are the mappings between the two domains executed: parallel or sequentially?

Thank you for reading this lengthy message!

precice-Fluid-events-summary.log (12.6 KB)
precice-Solid-events-summary.log (11.6 KB)

Since the advance entry may not be reliable,

The time measurements of advance and solver.advance are reliable.

Note that each participant reports these timings from its own perspective. If you use a serial coupling scheme, then advance of Fluid contains solver.advance of Solid and vice versa.
If you use a parallel coupling scheme and one participant completes its time steps notably faster than the other one, then you are facing a load imbalance and the faster participant has to wait inside advance.

On top of that, you are running your solvers in parallel, which makes event files even more difficult to reason about.

Some tips that may help you:

  • Visualize the events files. This makes load imbalance and serial coupling way easier to understand.
  • For comparable measurements, enable synchronization (<precice-configuration sync-mode="1">) in the config. This synchronizes all ranks of a participant before certain distributed communication steps, avoiding waiting times between ranks in some events. It will slow down your simulation, but leads to measurements that are easier to interpret.
  • Is the mesh partitioned evenly? The event summary shows only the primary rank 0; if some rank owns a much larger portion of the coupling mesh, then the RBF mapping you are using can easily lead to skewed timings.