I am running a preCICE coupling solver built on Open MPI on an HPC system that uses PBS.
The same case runs to completion without problems when it is executed on a single node:

```
#PBS -l select=1:ncpus=96:mpiprocs=96:mem=256gb
```
However, as soon as the job spans multiple nodes, for example:

```
#PBS -l select=2:ncpus=48:mpiprocs=48:mem=128gb
```

the run hangs. This is the last output before it gets stuck:
```
---[precice] Setting up primary communication to coupling partner/s
---[precice] Primary ranks are connected
---[precice] Setting up preliminary secondary communication to coupling partner/s
---[precice] Receive global mesh HOS-Coupling-Mesh1
---[precice] Receive global mesh HOS-Init-Mesh
---[precice] Prepare partition for mesh Near-Init-Mesh1
---[precice] Gather mesh Near-Init-Mesh1
---[precice] Send global mesh Near-Init-Mesh1
---[precice] Prepare partition for mesh Near-Pseudo-Mesh1
---[precice] Gather mesh Near-Pseudo-Mesh1
---[precice] Send global mesh Near-Pseudo-Mesh1
---[precice] Prepare partition for mesh Near-Relaxation-Mesh1
---[precice] Gather mesh Near-Relaxation-Mesh1
---[precice] Send global mesh Near-Relaxation-Mesh1
---[precice] Broadcast mesh HOS-Coupling-Mesh1
---[precice] Filter mesh HOS-Coupling-Mesh1 by bounding box on secondary ranks
---[precice] Filter mesh HOS-Coupling-Mesh1 by mappings
---[precice] Feedback distribution for mesh HOS-Coupling-Mesh1
---[precice] Broadcast mesh HOS-Init-Mesh
```
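To confirm how the two-node request actually distributes the 96 ranks, one option is to summarize `$PBS_NODEFILE` inside the job before calling `mpirun`. This is a minimal sketch, assuming PBS Pro behaviour where the node file lists each host once per `mpiprocs` slot; the function name `summarize_nodefile` is hypothetical. For `select=2:ncpus=48:mpiprocs=48` it should print two hosts with 48 slots each.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<host> <slots>" for every host in a PBS node file.
# Assumes PBS Pro semantics, where each host appears once per mpiprocs slot.
summarize_nodefile() {
  local nodefile="${1:?usage: summarize_nodefile <nodefile>}"
  # Group identical host names, count them, and print host first.
  sort "$nodefile" | uniq -c | awk '{ print $2, $1 }'
}

# Inside the job script, e.g. just before mpirun:
# summarize_nodefile "$PBS_NODEFILE"
```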
The command I used in the PBS submission script is:

```
mpirun -np "${NPROCS}" \
  --mca btl self,vader,tcp \
  --mca btl_tcp_if_include bond0 \
  --mca oob_tcp_if_include bond0 \
  apptainer exec \
  --bind "${HOST_CASE_PATH}:${CONTAINER_CASE_PATH}" \
  "${SIF_PATH}" \
  /bin/bash --noprofile --norc -c '
    . /etc/profile.d/openfoam-v2106.sh
    . /etc/profile.d/runtimeEnv.sh
    cd "${CONTAINER_CASE_PATH}/HOS-dynamic-3D-symmetry-stationary_seahive/Near1"
    exec hosCoupleFoam -parallel
  '
```
The precice-config.xml and job.pbs used for this case, as well as the solver log file, are attached:
precice-config.xml (4.6 KB)
hosCoupleFoam.log (14.1 KB)
cpu_job.pbs.log (2.0 KB)
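Since the single-node run works and the two-node run stalls after the primary ranks have connected, it may be worth checking which network interface the socket-based coupling is told to use in precice-config.xml. Assuming the coupling uses `<m2n:sockets>`, preCICE falls back to the loopback interface when the `network` attribute is omitted, and loopback only reaches ranks on the same node. The helper below is a hypothetical sketch (the name `check_m2n_network` is made up) that prints the `network` attribute of every `m2n:sockets` element, joining lines first so tags split over several lines are still found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the network attribute of each <m2n:sockets .../> tag.
check_m2n_network() {
  local config="${1:?usage: check_m2n_network <precice-config.xml>}"
  local re='network="([^"]*)"'
  local tags
  # Join all lines so a tag spread over several lines is matched as one.
  tags=$(tr '\n' ' ' < "$config" | grep -o '<m2n:sockets[^>]*>') || {
    echo "no <m2n:sockets> element found"
    return 0
  }
  while IFS= read -r tag; do
    if [[ $tag =~ $re ]]; then
      echo "network=${BASH_REMATCH[1]}"
    else
      echo "network not set (preCICE defaults to the loopback interface)"
    fi
  done <<< "$tags"
}

# Usage: check_m2n_network precice-config.xml
```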