Problems with RBF on large unstructured meshes

I tested RBF Partition of Unity mapping with the ISSM adapter for the first time and ran into a couple of issues. Not sure if they are all related, so I put them all here. Sorry if this gets a bit long. Overall, I found using RBF a lot more work than expected.

The general situation:

ISSM mesh: unstructured triangles, with resolution between ~0.5 km and ~10 km in the interesting areas and up to 100 km in uninteresting areas (uninteresting meaning far away from the ice, so not normally included in any actual computation, but still part of the computational domain, e.g. to get a nice rectangle); somewhere between 1 and 10 million vertices in total

CUAS mesh: homogeneous quad mesh, 0.6 km resolution, ~10 million vertices in total

Run in parallel with 200-1000 tasks per participant on an HPC cluster.

The meshes cover the same computational domain. All mappings are read-consistent. I attach the general preCICE config; specific settings are described below.

precice-config.xml (4.5 KB)

Nearest-neighbor mapping works fine. Linear cell interpolation as well, at least in one direction; the CUAS mesh does not currently include connectivity, so the other direction hasn't been tested, but connectivity can be added quickly and I don't expect any problems there.
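
For reference, the working nearest-neighbor mapping is configured roughly like this (simplified from the attached config; the other direction is set up equivalently):

<mapping:nearest-neighbor direction="read" from="CUASMesh" to="ISSMMesh" constraint="consistent"/>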

Issue 1: RBF mapping from CUAS to ISSM causes crashes

The ISSM adapter crashes when I set up RBF mapping from the CUAS mesh to the ISSM mesh. The crash is seemingly always in the same function (log message [partition::ReceivedPartition]:344 in filterByBoundingBox: Broadcast mesh CUASMesh), but with different errors. If preCICE is a debug build, Slurm reports out of memory; if preCICE is a release build, I get a segfault (or an assertion, if release asserts are enabled in preCICE). Not sure whether that is exactly the same error or whether the debug build runs out of memory for other reasons, but it is inside the same function. Below I attach logs from debug and release builds. The debug log has more details, but I cut off the beginning for size. I can provide full logs with trace output if needed, but they are huge.

issm log relwithdebinfo.txt (339.3 KB)

issm log debug.txt (2.3 MB)

I tried different basis functions (compact polynomial C0 with r = 10 km, thin plate splines) and different vertices-per-cluster values (between 50 and 500); it always crashes. Mapping with NN works fine.
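
One of the configurations I tried looks roughly like this (the vertices-per-cluster value is just one of the ones I tried):

<mapping:rbf-pum-direct direction="read" from="CUASMesh" to="ISSMMesh" constraint="consistent" vertices-per-cluster="100">
	<basis-function:compact-polynomial-c0 support-radius="10e3"/>
</mapping:rbf-pum-direct>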

Debugging has not been successful so far because of the size of the setup.

I have two smaller setups where RBF mapping works OK. One is similar, same solvers and same general meshes, but with a different geometry and about half the size (~5 million vertices), running on the same HPC cluster. The other is tiny (~400 vertices), for testing, coupling ISSM and ASTE, run with a few MPI tasks on my workstation. So it's not a general problem with RBF or the adapters.

Issue 2: Choosing the basis function for RBF mapping from ISSM to CUAS

I struggle to pick the right basis function for mapping from ISSM to CUAS.

Global basis functions are expensive to initialize. I got the initialization down to ~25 minutes by setting vertices-per-cluster to 500; with 1000 vertices per cluster there is a bad_alloc exception, and with fewer than 500 the Slurm job timed out after 2+ hours of computing the mapping, so I don't know how long it would have taken to complete. The long runtime is probably expected for a global method, but is there a way to optimize this?

Local basis functions are tricky because the ISSM mesh resolution varies so much. A radius of 10 km would probably be fine for the high-resolution areas (~20 vertices in each direction), but it covers only 1 or 2 vertices in areas where ISSM has low resolution and 0 vertices in the uninteresting areas, where I get many artifacts in regions that are supposed to be just empty ocean, see the image below.

This is concerning even if the areas aren't interesting initially, because what is "interesting" or not is a dynamic property of the solvers, and artifacts in the wrong place can cause uninteresting areas to unexpectedly become interesting and influence the global solution. I guess the only solution to this would be to carefully mask the continent and not add the uninteresting areas to the coupling mesh?

Assuming the uninteresting areas are correctly masked somehow, the choice of radius or shape parameter is still not obvious: there is still at least an order of magnitude between the fine and coarse resolutions. Too small a radius and the solution suffers; too large and the system becomes badly conditioned or effectively a global basis function. Any recommendations are welcome.
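
For concreteness, these are the kinds of alternatives I am weighing (the radius value is only an example):

<!-- local basis function: the support radius has to work for both ~0.5 km and ~100 km mesh spacing -->
<basis-function:compact-polynomial-c0 support-radius="10e3"/>
<!-- global shape, no support radius, but expensive to initialize (see above) -->
<basis-function:thin-plate-splines/>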

Issue 3: ghost vertices

I guess this is more of a feature request? To enable RBF mapping, I had to modify the ISSM adapter to exclude ghost vertices at the edges of the MPI partitions. Then I also have to manually synchronize values after reading them from preCICE, because the solver requires values at ghost vertices and does not synchronize internally before solving. So now the adapter has two "modes": one with ghosts and mesh connectivity, and another without ghosts and without mesh connectivity (ghosts are required to define connectivity), but with manual synchronization. This feels like unnecessary work. Would it be possible for preCICE to ignore ghosts automatically in RBF mappings, maybe with some additional help from the adapter? Then the adapter could support all mappings with minimal code.

I looked at the documentation page on distributed meshes and considered the two-mesh approach, but unless I'm missing something, it has pretty much the same drawbacks as my current solution with two "modes": the adapter needs to be able to handle both with and without ghosts, and it requires additional configuration by the user to set up two meshes in the preCICE config.

Hi @dabele :waving_hand:

Issue 3 is indeed more of a feature request, and what you write there is correct. But let's put this aside for the moment.

Issues 1 and 2 should not happen. PUM-RBF should not be sensitive to global vs. local basis functions; we actually use a direct solver in both cases. I would recommend using a higher-order compact-polynomial-c6 and a support radius of maybe 200 km. vertices-per-cluster should be more like 50 than 500.
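
As a starting point, something along these lines (the support radius is only a rough suggestion, adapt it to your geometry):

<mapping:rbf-pum-direct direction="read" from="ISSMMesh" to="CUASMesh" constraint="consistent" vertices-per-cluster="50">
	<basis-function:compact-polynomial-c6 support-radius="200e3"/>
</mapping:rbf-pum-direct>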

You are running both codes in parallel, right? The domain decomposition could be interesting, and also on which rank the mapping crashes. It all sounds a bit like "out of memory". Could you please export your meshes (while running with NN) and upload them somewhere? Then we could try to reproduce the issue with ASTE.
Similarly to here: ASTE (artificial solver testing environment) wind turbine blade tutorial | preCICE - The Coupling Library

You still run both codes on the same nodes, right? Worth a try to properly separate things here.

I ran a few more tests with your suggestions.

Exported meshes are uploaded here: exported-meshes.tar.gz - DESY Sync & Share

All the runs were on 512 tasks on 4 exclusive cluster nodes per solver, no shared resources between solvers. These resources are comfortably enough to run the solvers on their own with this setup, so I don’t think RAM should be an issue if (!) everything is working correctly.

Mapping CUAS → ISSM

This is the one that was crashing before (see Issue 1 above), and it still is. I have tried many different combinations by now, although I can't remember them all. This time I tried PUM with compact polynomial C6 and a 10 km radius; 10 km should be sufficient in this mapping direction, since CUAS has a homogeneous mesh with 600 m resolution. But I also tried C0 with 200 km, and with more vertices per cluster, and everything crashes with the same segfault.

Mapping ISSM → CUAS:

I started with PUM - CPC6 - 200 km as you suggested, but the run stopped with two different error messages from ~20 ranks:

ERROR: Support radius for radial-basis-function compact polynomial c2 has to be larger than zero. Please update the "support-radius" attribute.
ERROR: Output vertex 558600,-1.8294e+06 of mesh "CUASMesh" could not be assigned to any cluster in the rbf-pum mapping. This probably means that the meshes do not match well geometry-wise: Visualize the exported preCICE meshes to confirm. If the meshes are fine geometry-wise, you can try to increase the number of "vertices-per-cluster" (default is 50), the "relative-overlap" (default is 0.15), or disable the option "project-to-input". These options are only valid for the <mapping:rbf-pum-direct/> tag.

The first message is just confusing; I'm not using C2.
I guess the second message means that some CUAS vertices are too far away from any ISSM vertices? Although the vertex mentioned in the error message is in an area of high resolution.

So I increased vertices-per-cluster to 200, and the simulation ran successfully. The result inside the ice sheet looks OK, but there are still artifacts in the ocean. Unlike my first try above with a 20 km radius, the artifacts are only close to the land, see the image below. Maybe it's the discontinuity at the edge of the ice sheet that RBF can't handle well, e.g. ice thickness going from hundreds of meters to zero? Maybe the fact that there are so many more vertices inside the ice sheet than in the open ocean makes this even worse?

For comparison, I tried some other mappings. C0 polynomials behave similarly to C6: they produce qualitatively similar artifacts, but around one order of magnitude smaller in scale.

For thin plate splines I needed to increase vertices-per-cluster to 500, otherwise I got error messages like those above for C6, and also this new one:

ERROR: The interpolation matrix of the RBF mapping from mesh "ISSMMesh" to mesh "CUASMesh" is not invertable. This means that the mapping problem is not well-posed. Please check if your coupling meshes are correct (e.g. no vertices are duplicated) or reconfigure your basis-function (e.g. reduce the support-radius).

When I got it to run successfully, thin plate splines produced artifacts only directly at the edge of the ice sheet, within one or two mesh elements. So far, that is the best mapping I have gotten from RBF, and the computational overhead is reasonable: it took around 10 minutes here, maybe a bit more with a different number of MPI tasks, which isn't too bad for a simulation that would usually run for a few hours or days.
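
For reference, the thin plate spline variant that worked best was configured roughly like this:

<mapping:rbf-pum-direct direction="read" from="ISSMMesh" to="CUASMesh" constraint="consistent" vertices-per-cluster="500">
	<basis-function:thin-plate-splines/>
</mapping:rbf-pum-direct>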

I can reproduce the crash with ASTE. The crash happens in the same function; the error message is a bit different, but probably related.

(0) 22:42:10 [partition::ReceivedPartition]:344 in filterByBoundingBox: Broadcast mesh CUASMesh
---[preCICE:ISSM:0] INFO : Broadcast mesh CUASMesh
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[prod-025:861813] *** Process received signal ***
[prod-025:861813] Signal: Aborted (6)
[prod-025:861813] Signal code:  (-6)
[prod-025:861813] [ 0] /usr/lib64/libpthread.so.0(+0x12cf0)[0x15554f025cf0]
[prod-025:861813] [ 1] /usr/lib64/libc.so.6(gsignal+0x10f)[0x15554ec9cacf]
[prod-025:861813] [ 2] /usr/lib64/libc.so.6(abort+0x127)[0x15554ec6fea5]
[prod-025:861813] [ 3] /albedo/soft/sw/spack-sw/gcc/12.1.0-aosheo5/lib64/libstdc++.so.6(+0xa9d89)[0x15554f87dd89]
[prod-025:861813] [ 4] /albedo/soft/sw/spack-sw/gcc/12.1.0-aosheo5/lib64/libstdc++.so.6(+0xb540a)[0x15554f88940a]
[prod-025:861813] [ 5] /albedo/soft/sw/spack-sw/gcc/12.1.0-aosheo5/lib64/libstdc++.so.6(+0xb5475)[0x15554f889475]
[prod-025:861813] [ 6] /albedo/soft/sw/spack-sw/gcc/12.1.0-aosheo5/lib64/libstdc++.so.6(+0xb56c7)[0x15554f8896c7]
[prod-025:861813] [ 7] /albedo/soft/sw/spack-sw/gcc/12.1.0-aosheo5/lib64/libstdc++.so.6(+0xa9994)[0x15554f87d994]
[prod-025:861813] [ 8] /albedo/work/projects/landice/issm-precice/spack-install-0.22.1/seacas/2022-02-16-5qqgnwp/lib/libIoss.so.2(_ZNSt6vectorIiSaIiEE17_M_default_appendEm+0x87)[0x155554d8eb47]
[prod-025:861813] [ 9] /albedo/work/projects/landice/issm-precice/spack-install-0.22.1/precice/3.1.1-5hnjbgq/lib64/libprecice.so.3(+0xe9f78)[0x1555544cdf78]
[prod-025:861813] [10] /albedo/work/projects/landice/issm-precice/spack-install-0.22.1/precice/3.1.1-5hnjbgq/lib64/libprecice.so.3(+0x3768ad)[0x15555475a8ad]
[prod-025:861813] [11] /albedo/work/projects/landice/issm-precice/spack-install-0.22.1/precice/3.1.1-5hnjbgq/lib64/libprecice.so.3(+0x377b9a)[0x15555475bb9a]
[prod-025:861813] [12] /albedo/work/projects/landice/issm-precice/spack-install-0.22.1/precice/3.1.1-5hnjbgq/lib64/libprecice.so.3(+0x39b7f3)[0x15555477f7f3]
[prod-025:861813] [13] /albedo/work/projects/landice/issm-precice/spack-install-0.22.1/precice/3.1.1-5hnjbgq/lib64/libprecice.so.3(+0x39cb61)[0x155554780b61]
[prod-025:861813] [14] /albedo/work/projects/landice/issm-precice/aste/install-precicev3/bin/precice-aste-run[0x44e73a]
[prod-025:861813] [15] /albedo/work/projects/landice/issm-precice/aste/install-precicev3/bin/precice-aste-run[0x410513]
[prod-025:861813] [16] /usr/lib64/libc.so.6(__libc_start_main+0xe5)[0x15554ec88d85]
[prod-025:861813] [17] /albedo/work/projects/landice/issm-precice/aste/install-precicev3/bin/precice-aste-run[0x41023e]
[prod-025:861813] *** End of error message ***

The bad_alloc is weird: usually Slurm's out-of-memory handler gives an error message if tasks actually run out of memory, and Slurm only reports around ~19 GB of memory used in the job.

This is with the mapping

			<basis-function:compact-polynomial-c0 support-radius="10e3"/>
		</mapping:rbf-pum-direct>

Just to make sure it's not a different issue with ASTE: nearest-neighbor mapping works fine.

I traced the crash to ReceivedPartition::createOwnerInformation. With RBF there is no bounding-box filter, so every task sends ~13 million IDs and tags to the primary rank. This works for 256 MPI tasks, but 512 is too many.

Just to test whether there are any more errors, I built a "fix" (probably not correct in general) where the primary rank stores only one set of IDs when there is no bounding-box filter, instead of one set per task (the IDs are all identical), and the tags are stored as boolean instead of int. With that, the simulation runs through, at least in ASTE.

I also tried two-level initialization, hoping it would avoid the entire problem since it is recommended for large meshes, but this also crashes. The error messages for CUAS are:

(508) 01:35:34 [partition::ReceivedPartition]:50 in communicate: Receive mesh partitions for mesh ISSMMesh
(167) 01:35:59 [com::SocketCommunication]:645 in receive: ERROR: Receiving data from another participant (using sockets) failed with a system error: read: End of file [asio.misc:2]. This often means that the other participant exited with an error (look there).

and for ISSM

(261) 01:35:36 [partition::ReceivedPartition]:153 in compute: Create owner information.
[prod-134:1331775] *** An error occurred in MPI_Recv
[prod-134:1331775] *** reported by process [765939408,390]
[prod-134:1331775] *** on communicator MPI_COMM_WORLD
[prod-134:1331775] *** MPI_ERR_TRUNCATE: message truncated
[prod-134:1331775] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[prod-134:1331775] ***    and potentially your MPI job)

I wasn’t able to trace this further yet. Maybe a different memory problem.

I profiled the RBF mapping (at least as far as it currently works). The mapping config is

<mapping:rbf-pum-direct direction="read" from="ISSMMesh" to="CUASMesh" constraint="consistent" vertices-per-cluster="500">
	<basis-function:compact-polynomial-c0 support-radius="200e3"/>
</mapping:rbf-pum-direct>

It seems mapping is not well load balanced in our setup. This is a snapshot from the profile (pink is preCICE advance, blue is solver.advance):


Some tasks take almost no time at all and just wait, while others take more than 3 s (around a third of the entire coupling window). For comparison, nearest-neighbor or linear interpolation takes ~10 ms, basically nothing.

Maybe this is also related to the uneven resolution of the ISSM mesh?

Mapping CUAS → ISSM

So, as you mentioned already, the memory issue for this setting is kind of expected without two-level initialization: you duplicate a 10-million-vertex mesh times the number of ranks on the cluster. Two-level initialization should really resolve this issue, and I have already tested larger meshes in such a setting. Since the ISSM mesh is quite nonuniform, I could imagine that you may want to increase the safety factor a bit. The basis function configuration has no influence on the repartitioning.

<receive-mesh name="CUASMesh" from="CUASMPI" safety-factor="0.5" />

In general, that (issue 1) is an issue of the repartitioning, not of the mapping.

Issue 2: Choosing the basis function for RBF mapping from ISSM to CUAS

So first of all, for PU, the concept of global and local basis functions does not exist. But the overall setup is quite problematic: the PU introduces locality, and now you ask it for a solution in regions where you do not provide any input information.

I would probably recommend switching off the projection (in both directions):

project-to-input="false"

in the PU-RBF configuration. Then I would keep everything else at the default values and only increase the vertices per cluster if really needed.
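
In terms of configuration, something like this (defaults spelled out for clarity; the basis function and radius are placeholders, see below):

<mapping:rbf-pum-direct direction="read" from="ISSMMesh" to="CUASMesh" constraint="consistent" vertices-per-cluster="50" relative-overlap="0.15" project-to-input="false">
	<basis-function:compact-polynomial-c0 support-radius="10e3"/>
</mapping:rbf-pum-direct>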

For the choice of the basis function, you can start with C0 and increase the continuity (up to C8) depending on the smoothness of your data. At some point, matrices might become ill-conditioned if your support-radius is too small.

Issue 3: ghost vertices

You must not put ghost vertices into the coupling mesh, so it is a bit of an open question to me why you need a mode that includes ghost vertices. Performing a ghost update in the solver code should also be doable. From the mathematical perspective, you could define the ghost vertices on the "to" mesh for your consistent mapping, if this helps. Duplicating vertices (that's what you do with ghost vertices) on the input side leads to ill-defined problems, and preCICE has no way of saying "this vertex is relevant, that one is not".

It seems mapping is not well load balanced in our setup

The really important part for your scenario is the partitioning of the ISSM mesh: do you have an equal number of vertices per rank? I would try to have a comparable mesh resolution on each rank. But of course, this will trigger substantially different workloads on the CUAS side.

You are right that it's not really an issue of the mapping. I only encountered it with RBF because RBF disables the bounding-box filter. I feel memory could still be optimized here, but that's a different topic.

I got two-level initialization to work now. There is a bug in ReceivedPartition::createOwnerInformation where a loop-local variable is used as the buffer of an asynchronous send:

std::vector<com::PtrRequest> vertexListRequests;

for (auto &receivingRank : sharedVerticesSendMap) {
  int  sendSize = receivingRank.second.size(); // <- send buffer is not kept alive
  auto request  = utils::IntraComm::getCommunication()->aSend(sendSize, receivingRank.first);
  // ...

Here is how I fixed it, but I did not think too hard about the most elegant way:

std::vector<com::PtrRequest> vertexListRequests;
vertexListRequests.reserve(sharedVerticesSendMap.size());
// to keep all send buffers alive until the async sends are complete
std::vector<int> vertexListSizeSendBuffers;
vertexListSizeSendBuffers.reserve(sharedVerticesSendMap.size()); // make sure there are no reallocations

for (auto &receivingRank : sharedVerticesSendMap) {
  vertexListSizeSendBuffers.push_back(static_cast<int>(receivingRank.second.size()));
  auto request  = utils::IntraComm::getCommunication()->aSend(vertexListSizeSendBuffers.back(), receivingRank.first);
  vertexListRequests.push_back(request);
  // ...

With that fix, two-level initialization seems to work. You are right that I needed to play with the safety factor.

I’ll try that.

  1. It’s one of the methods outlined in the preCICE documentation here: Dealing with distributed meshes | preCICE - The Coupling Library
  2. I don't know if there is another way to get mesh connectivity everywhere. I think preCICE falls back to nearest-neighbor in that case? Maybe not the worst, but not ideal.
  3. Reading/writing ghost values is really easy with ISSM. In fact, it's much less code in the adapter to just add ghost vertices as if they were normal vertices. I write the exact same value for ghost vertices on all tasks (ISSM syncs them internally, so no work in the adapter) and read the ghost values from preCICE on all tasks (ISSM requires ghost values on every task, otherwise the adapter has to sync them manually). I never had a problem with preCICE giving me inconsistent values. Of course, I would have to do it differently for conservative variables, but I'm not sure there even are any in ISSM, so I haven't done anything about that yet.

Maybe users should be able to "tag" vertices from the outside and use the tag instead of vertex IDs to read variables and set connectivity?

Both meshes are partitioned evenly across tasks, CUAS perfectly and ISSM to within a few percent using ParMETIS. I don't think equal resolutions are feasible. Right now the meshes have approximately equally fine resolution where ISSM needs it the most; in flat areas, the ISSM resolution is coarser. Refining ISSM in the flat areas to the same fine resolution as CUAS would mean around 10-100 times the number of vertices, much more expensive than the currently unbalanced mapping, and it would all be wasted resources because we don't need that resolution. CUAS doesn't currently support unstructured meshes, so we can't change the CUAS mesh to look like the ISSM mesh either. And even if we could, the solvers don't generally need fine resolution in the same places, so the best we could do would be a mesh that is fine wherever either solver needs it, which still wastes some of the extra resources.

I think you misread my message: equal resolution per rank ≠ equal resolution of the participants. What I was trying to say is: for preCICE, the load balancing is determined by the vertex count per partition. So, is the vertex count per partition balanced in your case? Ideally, you would have equal vertex counts per partition, with each rank holding a mesh of similar resolution.

Reading/Writing ghost values is really easy with ISSM. In fact it’s much less code in the adapter to just add ghost vertices as if they were normal. I write the exact same value for ghost values on all tasks (ISSM syncs them internally, so no work in the adapter) and read the ghost values from preCICE on all tasks

As mentioned, for your case: reading on duplicate vertices (essentially doing the same operation twice) is fine, but writing on duplicate vertices will lead to singular interpolation problems.

It’s one of the methods outlined in the preCICE documentation here: Dealing with distributed meshes | preCICE - The Coupling Library

That's indeed quite interesting. I would doubt that this is actually correct (i.e. the "Use a single mesh and duplicate copied vertices" approach should be removed, in my opinion; I need to discuss this with the author).

There is a bug where a variable in a loop is used as the buffer in an asynchronous send in ReceivedPartition::createOwnerInformation

So I am looking at lines 690 onwards, and I am really not sure you are on the right track: the data itself is in the std::map, which should remain alive throughout the loop/send and is exchanged in

auto request = utils::IntraComm::getCommunication()->aSend(span<const int>{receivingRank.second}, receivingRank.first);

The receivingRank only holds a reference to this map. What you declare above as "not kept alive" is simply an integer for the size exchange. Maybe it should be receivingRank.second.size() instead of an lvalue. However, it would still be a bit surprising if this fails in your case but succeeds in our own cases. Maybe you do indeed have partitions which are empty, which then causes the failure. Can you try the current develop state of preCICE and increase the safety factor, to see whether it is definitely related to your code version?

I only encountered it with RBF, because RBF disables the Bbox filter. I feel memory could still be optimized here, but that’s a different topic.

We had a related discussion in the past: Clean-up tagging for kernel methods by davidscn · Pull Request #1912 · precice/precice · GitHub

in case you are interested.

I’m pretty sure what I said about the bug is correct, but I’ll make a pull request with more explanation and we can discuss it there.

Is there an easy way to see the partitions?

project-to-input="false" doesn't help; with or without it I get error messages like

Output vertex -261600,-1.1046e+06 of mesh "CUASMesh" could not be assigned to any cluster in the rbf-pum mapping. This probably means that the meshes do not match well geometry-wise: Visualize the exported preCICE meshes to confirm. If the meshes are fine geometry-wise, you can try to increase the number of "vertices-per-cluster" (default is 50), the "relative-overlap" (default is 0.15), or disable the option "project-to-input". These options are only valid for the <mapping:rbf-pum-direct/> tag.

Of the options in the error message, only vertices-per-cluster works.

Apparently, something changed between version 3.1 and version 3.2: I can't get RBF from ISSM to CUAS working at all any more with any version >= 3.2. Not a great loss, since it wasn't working well anyway. I think we need to mask out the "uninteresting" coarse parts somehow.

But at least RBF from CUAS to ISSM works well with the fix.