Running CalculiX and OpenFOAM on HPC

I’m delighted and very grateful for the guidance I’ve received, both here (https://precice.discourse.group/t/does-the-executable-binary-file-of-calculix-support-running-on-a-slurm-cluster-via-mpi/2336/6) and on the CCX forum (https://calculix.discourse.group/t/can-calculix-run-across-multiple-nodes/1316/6). Thank you all for your help. Below, I’ve summarized my situation and outlined my next steps.

Goal

I aim to transition from running my simulation on a 6-core CPU on my PC to an HPC system with 128-core nodes, targeting at least a 10x speed-up (using 20+ times more cores). However, so far, I’ve only achieved a 1x to 4x speed-up.

Case Details

I’m running a steady-state Conjugate Heat Transfer (CHT) case with radiation, involving one fluid and one solid participant. The coupling is handled using parallel-implicit mode with the same preCICE configuration as in the heat-exchanger tutorial.

For radiation modeling, I have two options:

  • fvDOM in OpenFOAM: After 4–5 timesteps, coupling iterations per timestep drop to 1 (almost like explicit coupling).
  • Cavity radiation in CCX: Requires ~10 coupling iterations per timestep but provides more reliable results.

Performance on HPC

  • On my PC (6 cores), the case runs successfully.
  • On HPC (128-core nodes), I expected a 10x speed-up when using 1–2 nodes (20–40x more cores).
  • However, results show:
    • fvDOM in OpenFOAM: ~4x speed-up.
    • Cavity radiation in CCX: <2x speed-up, despite a 20x increase in core count.

From OpenFOAM’s executionTime output, I see that OpenFOAM itself scales well (tested up to 100 cores). However, the overall simulation time does not decrease significantly, which suggests that the bottleneck is either CCX or the coupling itself.

My Assumptions

  • If CCX is correctly configured (with Spooles, Pardiso, or PaStiX and a proper Slurm script), it should scale reasonably well up to ~100 cores on a single node using OpenMP, rather than just 4–8 cores (see the sketch of the thread settings after this list).
  • If this is true, the issue could be:
    1. A bad Slurm script
    2. The need to switch solvers (from Spooles to Pardiso/PaStiX)
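
To check that CCX actually uses the cores it is given, I will first verify the shared-memory run itself, roughly along the lines below. This is only a minimal sketch based on my current understanding: the input deck name, participant name, and thread count are placeholders for my case, and I still need to confirm which environment variables my particular CCX build respects.

```bash
# Sketch of a single-node, shared-memory CalculiX run (all names are placeholders).
# OMP_NUM_THREADS sets the general thread count; CCX_NPROC_EQUATION_SOLVER,
# if supported by the build, controls the threads used by the equation solver
# (Spooles / PARDISO / PaStiX).
export OMP_NUM_THREADS=64
export CCX_NPROC_EQUATION_SOLVER=64

ccx_preCICE -i solid -precice-participant Solid > log.ccx 2>&1
```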

Next Steps

  1. Enable deeper profiling by adding the corresponding lines to precice-config.xml, as @fsimonis suggested, to track communication and CCX execution time; see the post-processing sketch after this list.
  2. Fix the Slurm script: run CCX on one node and OpenFOAM on another, avoiding synchronization issues (hopefully this also fixes the problem where the simulation gets stuck with the participants waiting for each other); see the job-script sketch after this list.
  3. Install PaStiX (Spack installation available).
  4. Install Pardiso.
  5. Test different CPU allocations and solvers (Spooles, Pardiso, PaStiX) on the HPC and compare performance results.
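
For step 1, my understanding (still to be double-checked against the docs) is that with preCICE v3 the profiling data written by both participants can be merged and summarized with the precice-profiling script that ships with preCICE. The directory and participant names below are placeholders for my case layout, and the exact arguments may differ for my installed version:

```bash
# Merge the profiling files written by the two participants, then summarize
# where the time goes (solver advance vs. coupling/communication) for one of them.
precice-profiling merge fluid-openfoam solid-calculix
precice-profiling analyze Solid
```

For step 2, my current plan is a single Slurm job that allocates two nodes and launches the two participants as separate job steps. The script below is only a sketch of what I intend to try, not a verified setup: the fluid solver name, case directories, input deck name, participant names, core counts, and srun options are assumptions for my case and will almost certainly need adjustment on the actual cluster.

```bash
#!/bin/bash
#SBATCH --job-name=cht-coupled
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --time=24:00:00

# Participant 1: OpenFOAM on one node with 128 MPI ranks.
# buoyantSimpleFoam is a placeholder for the actual fluid solver, and the
# case is assumed to be decomposed into 128 subdomains beforehand.
(
  cd fluid-openfoam
  srun --nodes=1 --ntasks=128 --exact \
      buoyantSimpleFoam -parallel > log.openfoam 2>&1
) &

# Participant 2: CalculiX on the other node, shared-memory parallel.
(
  cd solid-calculix
  export OMP_NUM_THREADS=128
  export CCX_NPROC_EQUATION_SOLVER=128
  srun --nodes=1 --ntasks=1 --cpus-per-task=128 --exact \
      ccx_preCICE -i solid -precice-participant Solid > log.ccx 2>&1
) &

# Wait for both job steps; preCICE takes care of the coupling synchronization.
wait
```

If the two job steps still end up on the same node, I will pin each participant explicitly (e.g. with srun --nodelist) instead of relying on --exact.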

I have limited experience with HPC installations, Slurm scripts, and hostfiles, and I also have other responsibilities, so progress might be slow. However, I will share my findings here as I move forward.

Meanwhile, if anyone with experience in CCX on HPC has additional insight to share, I would greatly appreciate it.

Kind regards,
Umut