Heterogeneous jobs with MPI and PBS scheduler

Hi,

I was wondering if it is possible to start a heterogeneous job with preCICE on one node with the PBS job scheduler. I think the only way to avoid double allocation with PBS is to use a single mpirun command and split it, but if I understand it correctly, preCICE always needs two mpirun commands:

In preCICE, we always start simulations in separated MPI communicators (remember: we start solvers in different terminals, with their own mpirun commands), a feature that highly improves flexibility (solvers do not need to be in the same MPI communicator at any time).

According to --report-bindings, this is a heterogeneous job, but the simulation gets stuck in the communication channel:

#PBS -l select=1:node_type=node:ncpus=64:mpiprocs=64

mpirun --report-bindings --bind-to core \
  -np 32 ./SolverA.sh > solverA.log 2>&1 : \
  -np 32 ./SolverB.sh > solverB.log 2>&1

The simulation works with two mpiruns, of course, but according to --report-bindings it is a double allocation:

#PBS -l select=1:node_type=node:ncpus=64:mpiprocs=64

mpirun --report-bindings -np 32 ./SolverA.sh > solverA.log 2>&1 &

mpirun --report-bindings -np 32 ./SolverB.sh > solverB.log 2>&1

Maybe somebody has a clue with PBS :D

Best regards,

Steffen

I am not sure if you read this part about SLURM sessions (SLURM sessions | preCICE - The Coupling Library). It is a different scheduler, but a similar problem.

I would assume that you can do something similar to what is described in that section, e.g., by creating a hostfile for each mpirun from the PBS_NODEFILE environment variable. This translation between SLURM and PBS environment variables might be useful.
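To illustrate that idea, here is a hedged sketch (not tested on a real PBS system) of splitting the PBS node file into one hostfile per participant. The 64-slot single node, the host name `n060601`, and the solver script names are assumptions; `--hostfile` is Open MPI's flag for user-supplied host lists.

```shell
#!/bin/sh
# Sketch: split the PBS node file into one hostfile per participant.
# Assumes $PBS_NODEFILE lists one line per allocated MPI slot.

# Stand-in node file so the sketch also runs outside PBS
# (hypothetical 64-slot node named n060601)
if [ -z "$PBS_NODEFILE" ]; then
    PBS_NODEFILE=nodefile.txt
    for i in $(seq 1 64); do echo "n060601"; done > "$PBS_NODEFILE"
fi

NSLOTS=$(wc -l < "$PBS_NODEFILE")
HALF=$((NSLOTS / 2))

# First half of the slots for participant A, second half for participant B
head -n "$HALF" "$PBS_NODEFILE" > hostfileA.txt
tail -n "+$((HALF + 1))" "$PBS_NODEFILE" > hostfileB.txt

# Hypothetical launch, one hostfile per mpirun:
#   mpirun --hostfile hostfileA.txt -np "$HALF" ./solverA.sh &
#   mpirun --hostfile hostfileB.txt -np "$HALF" ./solverB.sh
```

Whether this alone avoids the double allocation likely depends on how the MPI library binds ranks when both launches see the same host, so the bindings would still need checking with --report-bindings.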

Thanks for your input @ajaust. I looked over the SLURM sessions page, but since I don't have SBATCH on my system, I could not test the same things or check what the corresponding hostfiles look like :D But I think in those examples each participant always had at least one node of its own, so the same host was never used for both participants :D

What worked in my case was using additional rankfiles for both participants and changing the slot (socket) number between them to prevent the double allocation.

#PBS -l select=1:node_type=node:ncpus=40:mpiprocs=40


# Pick the (single) allocated node from the PBS node file
NODE=$(sort -u "$PBS_NODEFILE" | head -n 1)

# Participant A: socket 0, cores 0-19
for i in {0..19}; do
    echo "rank $i=$NODE slot=0:$i"
done > RankfileA.txt

# Participant B: socket 1, cores 0-19 (local core indices)
for i in {0..19}; do
    echo "rank $i=$NODE slot=1:$i"
done > RankfileB.txt



mpirun --display-map --report-bindings -np 20 --host $NODE:40 --map-by rankfile:file=RankfileA.txt ./solverA.sh &

mpirun --display-map --report-bindings -np 20 --host $NODE:40 --map-by rankfile:file=RankfileB.txt ./solverB.sh

Output: --report-bindings first mpirun

[n060601:411810] Rank 0 bound to package[0][core:0]
...
[n060601:411810] Rank 19 bound to package[0][core:19]

Output: --report-bindings second mpirun

[n060601:411811] Rank 0 bound to package[1][core:20]
...
[n060601:411811] Rank 19 bound to package[1][core:39]
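As a hedged sanity check (the host name n060601 and the 2x20-core layout are taken from the bindings above), one could also verify that two such rankfiles request disjoint slots before launching. The snippet regenerates the rankfiles so it is self-contained:

```shell
#!/bin/sh
# Sketch: sanity-check that two rankfiles request disjoint slots.
# Assumes rankfile lines of the form "rank N=<host> slot=<socket>:<core>".
NODE=n060601   # hypothetical host name from the output above

for i in $(seq 0 19); do echo "rank $i=$NODE slot=0:$i"; done > RankfileA.txt
for i in $(seq 0 19); do echo "rank $i=$NODE slot=1:$i"; done > RankfileB.txt

# Extract the slot=<socket>:<core> fields and look for slots used by both files
grep -o 'slot=[0-9]*:[0-9]*' RankfileA.txt | sort > slotsA.txt
grep -o 'slot=[0-9]*:[0-9]*' RankfileB.txt | sort > slotsB.txt
overlap=$(comm -12 slotsA.txt slotsB.txt)

if [ -z "$overlap" ]; then
    echo "no overlapping slots"
else
    echo "overlap: $overlap"
fi
```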

Like I said, since I couldn't inspect the hostfiles from SLURM, I am not sure whether similar things are done there or not :D


Great to hear that you found a solution. :slight_smile:

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.