I was wondering whether it is possible to start a heterogeneous job with preCICE on one node with the PBS job scheduler. I think the only way to avoid double allocation with PBS is to use a single mpirun command and split it, but if I understand correctly, preCICE always needs two mpirun commands:
In preCICE, we always start simulations in separated MPI communicators (remember: we start solvers in different terminals, with their own mpirun commands), a feature that highly improves flexibility (solvers do not need to be in the same MPI communicator at any time).
According to --report-bindings, this is a heterogeneous job, but the simulation gets stuck in the communication channel.
I would assume that you can do something similar to what is described in this section, e.g., by creating a hostfile for each mpirun, but reading the PBS_NODEFILE environment variable instead. This translation between SLURM and PBS environment variables might be useful.
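A minimal sketch of that idea, assuming PBS_NODEFILE contains one line per MPI slot (as is typical when mpiprocs is set) and that Open MPI counts repeated host lines in a hostfile as slots. The helper split_nodefile and the file names hostsA.txt and hostsB.txt are hypothetical, not part of PBS or preCICE. When both participants share one node, splitting the hostfile may not be enough on its own to keep the two runs from binding to the same cores.

```bash
#!/bin/bash
# Hypothetical helper: give the first n_first slots of a node file to one
# participant and the remaining slots to the other.
split_nodefile() {
    local nodefile=$1 n_first=$2 out_a=$3 out_b=$4
    head -n "$n_first" "$nodefile" > "$out_a"            # slots for participant A
    tail -n "+$((n_first + 1))" "$nodefile" > "$out_b"   # remaining slots for participant B
}

# Possible use inside a PBS job (solver script names taken from this thread):
#   split_nodefile "$PBS_NODEFILE" 20 hostsA.txt hostsB.txt
#   mpirun -np 20 --hostfile hostsA.txt ./solverA.sh &
#   mpirun -np 20 --hostfile hostsB.txt ./solverB.sh
#   wait
```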
Thanks for your input, @ajaust. I looked over the SLURM sections, but since I don't have SBATCH on my system, I could not test the same things or check what the corresponding hostfiles look like :D But I think the examples there always used at least one node per participant and therefore never placed both participants on the same host :D
What worked in my case was using additional rankfiles for both participants and changing the socket index in the slot field between them (slot=0:... for one participant, slot=1:... for the other) to prevent the double allocation.
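#PBS -l select=1:node_type=node:ncpus=40:mpiprocs=40
# Name of the single allocated node
NODE=$(sort -u "$PBS_NODEFILE" | head -n 1)
# Participant A: ranks 0..19 pinned to cores 0..19 of socket 0
for i in {0..19}; do
    echo "rank $i=$NODE slot=0:$i"
done > RankfileA.txt
# Participant B: ranks 0..19 pinned to cores 0..19 of socket 1
for i in {0..19}; do
    echo "rank $i=$NODE slot=1:$i"
done > RankfileB.txt
# Start participant A in the background and participant B in the foreground
mpirun --display-map --report-bindings -np 20 --host "$NODE:40" --map-by rankfile:file=RankfileA.txt ./solverA.sh &
mpirun --display-map --report-bindings -np 20 --host "$NODE:40" --map-by rankfile:file=RankfileB.txt ./solverB.sh
# Make sure the background run has finished before the job script exits
wait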
Output of --report-bindings for the first mpirun:
[n060601:411810] Rank 0 bound to package[0][core:0]
...
[n060601:411810] Rank 19 bound to package[0][core:19]
Output of --report-bindings for the second mpirun:
[n060601:411811] Rank 0 bound to package[1][core:20]
...
[n060601:411811] Rank 19 bound to package[1][core:39]
Like I said, since I couldn't inspect the hostfiles from SLURM, I am not sure whether similar things are done there or not :D
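For other core counts, the two rankfile loops above can be folded into one function. This is only a sketch under the same assumptions as the job script (one node, one participant per socket, Open MPI rankfile syntax); write_rankfile is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print an Open MPI rankfile that pins ranks
# 0..nranks-1 to cores 0..nranks-1 of the given socket on one node.
write_rankfile() {
    local node=$1 socket=$2 nranks=$3 i
    for ((i = 0; i < nranks; i++)); do
        echo "rank $i=$node slot=$socket:$i"
    done
}

# Possible use inside the job script:
#   write_rankfile "$NODE" 0 20 > RankfileA.txt
#   write_rankfile "$NODE" 1 20 > RankfileB.txt
```

The output has the same format as RankfileA.txt and RankfileB.txt above, so the mpirun lines would stay unchanged.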