Follow-up regarding the provided case:
I ran your case in various configurations on my system, and the runtime is dominated by the generation of “ownership data”. This is what @DavidSCN referred to in his post, and version 3 brings significant gains here, as you can see in the last example.
Setup: preCICE v2.5.1/v3.1.1 (Release + PRECICE_RELEASE_WITH_ASSERTION, no IPO) with clang 17.0.6 and mold linker 2.30.0 on an AMD 5900X with 32 GB of memory, using the loopback interface. The setup runs on a tmpfs (in memory). I tried different geometric filters, two-level initialization (2LI), and different safety factors.
I used the event2trace script to produce the traces of the following cases and visualized them using ui.perfetto.dev.
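If you want a quick ranking of where the time goes without opening the UI, a trace can also be summarized per event name. The sketch below assumes the trace files use the Chrome Trace Event Format that ui.perfetto.dev reads (a list of events, or an object with a `traceEvents` list, with `ts` and `dur` in microseconds). `load_events` and `summarize_events` are hypothetical helpers, not part of preCICE.

```python
import json
from collections import defaultdict


def load_events(path):
    """Load trace events from a JSON file in Chrome Trace Event Format."""
    with open(path) as f:
        data = json.load(f)
    # The format allows a bare list or an object holding "traceEvents".
    return data["traceEvents"] if isinstance(data, dict) else data


def summarize_events(events, top=10):
    """Return the `top` event names by accumulated time, in seconds.

    Handles complete events ("X", with "dur") and begin/end pairs
    ("B"/"E", matched per (pid, tid)). Timestamps are microseconds.
    Nested events are counted in full, so parents include their children.
    """
    totals = defaultdict(float)      # event name -> accumulated microseconds
    open_events = defaultdict(list)  # (pid, tid) -> stack of (name, begin ts)

    for ev in sorted(events, key=lambda e: e.get("ts", 0)):
        phase = ev.get("ph")
        thread = (ev.get("pid"), ev.get("tid"))
        if phase == "X":
            totals[ev.get("name", "?")] += ev.get("dur", 0)
        elif phase == "B":
            open_events[thread].append((ev.get("name", "?"), ev.get("ts", 0)))
        elif phase == "E" and open_events[thread]:
            # An "E" event closes the most recent open "B" on the same thread.
            name, begin = open_events[thread].pop()
            totals[name] += ev.get("ts", 0) - begin

    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, micros / 1e6) for name, micros in ranked[:top]]
```

For example, `summarize_events(load_events("trace-stock.json.txt"))` would list the ten most expensive event names with their total time in seconds, assuming the attachment content is plain JSON despite the `.txt` suffix.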
Provided case
The case as you provided it. The only change is the network interface (loopback).
Default settings
Provided case with the default safety-factor and geometric filter (I removed the custom values):
trace-stock.json.txt (6.8 MB)
No safety-factor
Provided case with a safety-factor of 0 and no geometric filter.
Two-level initialization
Provided case with the default safety-factor and geometric filter, plus two-level initialization:
V3 example
I ported your case to preCICE v3 and the result looks like this.
pypyrecice_v3.zip (301.8 KB)
The entire initialization now takes ~2 s, down from ~35 s in v2.
trace-v3.json.txt (4.1 MB)
Note that the v3 version of the Python bindings trades performance for safety and usability. For example, reading the expensive rhoVW in solver2 takes 50 ms in Python, but only 3 ms in preCICE. This inflates the time spent in the solvers, which in turn inflates the time the solvers need to wait for each other.
In practice, your solve step should dwarf the read/write in terms of computational cost.
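To check whether the binding overhead matters for your own case, one option is to time a call from the Python side and compare it with the solve time of one step. This is a minimal sketch with hypothetical helpers `time_call` and `overhead_fraction`; the commented usage assumes the pyprecice v3 `Participant.read_data(mesh_name, data_name, vertex_ids, relative_read_time)` signature, and the mesh name in it is a placeholder.

```python
import time
from statistics import median


def time_call(fn, *args, repeat=5, **kwargs):
    """Return the median wall-clock time of fn(*args, **kwargs) in seconds."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append(time.perf_counter() - start)
    return median(samples)


def overhead_fraction(io_seconds, solve_seconds):
    """Fraction of one coupling step spent in read/write rather than solving."""
    return io_seconds / (io_seconds + solve_seconds)


# Hypothetical usage (assumes pyprecice v3; "Solver2-Mesh" is a placeholder):
#   t_read = time_call(participant.read_data, "Solver2-Mesh", "rhoVW", vertex_ids, dt)
#   print(overhead_fraction(t_read, t_solve))
```

With the numbers above, a 50 ms read next to a hypothetical 5 s solve step gives `overhead_fraction(0.05, 5.0)` ≈ 0.0099, i.e. about 1% of the step.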