Coupling MATLAB-MATLAB through preCICE via Slurm

Hello,

I am trying to couple two MATLAB instances with preCICE in a single Slurm run on a server.
I was able to launch both instances, and both successfully set up their primary communication but fail on the secondary communication.
The error message in both instances is:

`Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffff00000009)`

I try to set up the Slurm session with one node and two tasks, so that each instance has one task to run on. I do the following:

# several #SBATCH directives ...
# allocation of nodes and tasks
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
...
cd $basedir && srun --quiet -n1 -c1  ./PDEPE_ser.sh &
cd $basedir && srun --quiet -n1 -c1  ./ODE_ser.sh &
wait
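
A bare `wait` discards the exit codes of the background steps, so a crashed participant can go unnoticed in the job log. The following is a minimal sketch of launching both helper scripts in the background and collecting their exit codes. The function name `launch_participants` is hypothetical and not part of Slurm or preCICE. The launcher string is passed in as an argument, so the same function can be tried outside of Slurm.

```shell
# Hypothetical helper: start each script as a background step via the
# given launcher, then wait for every step and report whether any failed.
launch_participants() {
    launcher=$1
    shift
    pids=""
    for script in "$@"; do
        # $launcher is intentionally unquoted so that its options are
        # split into separate words (e.g. "srun --quiet -n1 -c1").
        $launcher "$script" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        # wait <pid> returns the exit status of that background step.
        wait "$pid" || status=1
    done
    return "$status"
}

# Assumed usage inside the job script from this post:
#   cd "$basedir" && launch_participants "srun --quiet -n1 -c1" ./PDEPE_ser.sh ./ODE_ser.sh
```

With this, the job script can exit with a non-zero status as soon as one of the two MATLAB sessions fails, instead of always returning success.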

The files PDEPE_ser.sh and ODE_ser.sh are helper scripts that launch the individual MATLAB sessions by running PDEPE.m and ODE.m, respectively.
Both MATLAB sessions (ODE.m and PDEPE.m) contain the following lines to set up their participant:

% in file PDEPE.m
interface = precice.Participant("PDEPE", "precice-config.xml", 0, 1);
meshName ="PDEPE-Mesh";
dims = interface.getMeshDimensions(meshName);
vertexIDs = interface.setMeshVertices(meshName, repmat(1:numVertices,2,1));
% Data IDs
dataNameU = "Head";
dataNameW = "Leakage";
dataNameL = "LowerHead";
interface.initialize();
% in file ODE.m
interface = precice.Participant("ODE", "precice-config.xml", 0, 1);
meshName ="ODE-Mesh";
...  % see above
interface.initialize();

I am not quite sure whether the error message `Caught signal 11 (Segmentation fault: ...)` means that the two preCICE participants cannot find each other, or whether it has to do with the MATLAB bindings not being able to run in parallel (see "Support parallel runs (intra-solver parallelism)", Issue #3 in precice/matlab-bindings on GitHub). The bindings contain the following check:

        function obj = Participant(ParticipantName,configFileName,ProcessIndex,ProcessSize)
            %PARTICIPANT Construct an instance of this class
            if (ProcessIndex > 0 || ProcessSize > 1)
                error('Parallel runs are currently not supported with the MATLAB bindings.')
            end

I attached my precice-config.xml for reference.

Best wishes and thank you for your answer,
Jeremie

precice-config.xml (1.8 KB)

I eventually found the mistake.

It had to do with how I set LD_PRELOAD prior to launching the MATLAB instances.
I fixed it by setting the following:

export LD_PRELOAD="/path/to/libstdc++.so.6 \
/path/to/libgcc_s.so.1 \
/path/to/boost/1.85.0-3bnyjad/lib/libboost_log.so.1.85.0 \
/path/to/boost/1.85.0-3bnyjad/lib/libboost_system.so.1.85.0 \
/path/to/libxml2/2.10.3-d64hec2/lib/libxml2.so.2 \
/path/to/openmpi/4.1.3-sikgzla/lib/libmpi.so.40 \
/path/to/petsc/3.14.6-dxzgmfh/lib/libpetsc.so.3.14"

Note that /path/to has to be replaced with the actual locations of the files on your system.
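
Since a wrong entry in LD_PRELOAD is easy to overlook, it may help to verify that every listed library exists before starting MATLAB. The following is a minimal sketch of such a check. The function name `check_preload` is hypothetical, and the check only tests that each file exists, not that it is the right version.

```shell
# Hypothetical helper: check that every library listed in an
# LD_PRELOAD-style string exists on disk.
check_preload() {
    missing=0
    # LD_PRELOAD entries may be separated by spaces or colons.
    for lib in $(printf '%s' "$1" | tr ':' ' '); do
        if [ ! -e "$lib" ]; then
            echo "missing preload library: $lib" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed usage in the helper scripts, before launching MATLAB:
#   check_preload "$LD_PRELOAD" || exit 1
```

The function prints each missing path to standard error and returns a non-zero status if any entry is missing, so the helper script can stop before MATLAB is launched.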
