Regarding the interpolation (mapping), have you tried using the default RBF mapping? Many aspects of it have improved in v3.
How the interpolation occurs in parallel is a rather broad question but, in short: it happens inside preCICE. And no, you don’t need to gather the data onto one MPI rank before passing it to preCICE. Each rank passes only its own part of the interface.
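To illustrate the "no gathering" point, here is a minimal sketch of what each rank does. It assumes the method names of the v3 Python bindings (`set_mesh_vertices`, `write_data`). The mesh name `Fluid-Mesh`, the data name `Force`, the block partitioning, and the `StandInParticipant` class are all made up for illustration, so that the snippet runs without preCICE installed.

```python
class StandInParticipant:
    """Hypothetical stand-in for the preCICE participant (illustration only).

    With the real v3 Python bindings, every rank would instead create
    precice.Participant(name, config_file, rank, size).
    """

    def __init__(self):
        self.written = {}
        self._next_id = 0

    def set_mesh_vertices(self, mesh_name, positions):
        # Hand out one ID per vertex registered by this rank.
        ids = list(range(self._next_id, self._next_id + len(positions)))
        self._next_id += len(positions)
        return ids

    def write_data(self, mesh_name, data_name, vertex_ids, values):
        self.written[(mesh_name, data_name)] = (list(vertex_ids), list(values))


def local_part(items, rank, size):
    """Block partitioning (an assumption): the slice of `items` owned by `rank`."""
    n = len(items)
    return items[rank * n // size:(rank + 1) * n // size]


def write_local_interface_data(participant, coords, values, rank, size):
    """Register and write only this rank's vertices; nothing is gathered."""
    my_coords = local_part(coords, rank, size)
    my_values = local_part(values, rank, size)
    vertex_ids = participant.set_mesh_vertices("Fluid-Mesh", my_coords)
    # In a real run, participant.initialize() is called before writing data.
    participant.write_data("Fluid-Mesh", "Force", vertex_ids, my_values)
    return vertex_ids


# Example: 5 interface vertices in 2D, seen from rank 1 of 2.
coords = [[0.0, 0.0], [0.25, 0.0], [0.5, 0.0], [0.75, 0.0], [1.0, 0.0]]
values = [0.0, 1.0, 2.0, 3.0, 4.0]
participant = StandInParticipant()
ids = write_local_interface_data(participant, coords, values, rank=1, size=2)
```

In an actual adapter, the solver's own domain decomposition decides which vertices a rank owns. The mapping across ranks is then handled by preCICE.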
Specifically regarding the OpenFOAM adapter, this topic might be related: "OpenFOAM interface patch in parallel computation" (post #3 by Makis).
I admit that a long time has passed since you asked the question. Do you maybe have any updates on this?