Excessive Number of Job Processes

Symptom

The mpirun command fails to run because the number of processes requested for the MPI job is greater than the total number of CPU cores on the job execution nodes in the cluster.

The following is an example of the execution failure:

$ mpirun -np 1025 --hostfile hf8 hmpifile_2021/allreduce/AllReduce
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 1025
slots that were requested by the application:
 
  hmpifile_2021/allreduce/AllReduce
 
Either request fewer slots for your application, or make more slots
available for use.
 
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:
 
  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores
 
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
 
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------

In this example, 1025 is the number of processes requested for the MPI job (the value passed to -np).
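As the message explains, slot counts are taken from the hostfile when one is given. For reference, a hostfile such as hf8 might look like the following. This is only a sketch: the hostnames are placeholders, and slots is set to the number of CPU cores on each node.

node01 slots=128
node02 slots=128
# ... one line for each of the eight nodes
node08 slots=128

With slots declared this way, Open MPI refuses any -np value above 1024 unless --oversubscribe is passed.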

Possible Causes

The number of processes specified with -np during mpirun execution is greater than the total number of CPU cores on the job execution nodes in the cluster.

Procedure

  1. Use PuTTY to log in to a job execution node as a common (non-root) Hyper MPI user, for example, hmpi_user.
  2. Run the following command to query the number of CPU cores on each job execution node. In the example output below, CPU(s) is 128 and Thread(s) per core is 1, so each node provides 128 physical cores (the first sketch after this procedure shows how to extract this value in a script):

    lscpu

    Architecture:          aarch64
    Byte Order:            Little Endian
    CPU(s):                128
    On-line CPU(s) list:   0-127
    Thread(s) per core:    1
    Core(s) per socket:    64
    Socket(s):             2
    NUMA node(s):          4
    Model:                 0
    CPU max MHz:           2600.0000
    CPU min MHz:           200.0000
    BogoMIPS:              200.00
    L1d cache:             64K
    L1i cache:             64K
    L2 cache:              512K
    L3 cache:              65536K
    NUMA node0 CPU(s):     0-31
    NUMA node1 CPU(s):     32-63
    NUMA node2 CPU(s):     64-95
    NUMA node3 CPU(s):     96-127
    Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop
  3. Calculate the total number of CPU cores on the job execution nodes in the cluster and ensure that the number of processes passed to -np during mpirun execution is less than or equal to that total. In this example, there are eight nodes with 128 cores each, for a total of 1,024 CPU cores (the second sketch after this procedure automates this calculation). Run the following command to submit an MPI job:
    mpirun -np 1024 --hostfile hf8 hmpifile_2021/allreduce/AllReduce
    All tests are success
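The per-node core count from step 2 can also be read in a script instead of scanning the full lscpu output. This is a minimal sketch; nproc reports the number of online logical CPUs, which equals the number of physical cores here because Thread(s) per core is 1:

$ lscpu | awk '/^CPU\(s\):/ {print $2}'
128
$ nproc
128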
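To automate the calculation in step 3, the core counts can be summed over every node listed in the hostfile. This sketch relies on two assumptions not stated in this document: passwordless SSH access to each node, and a hostfile whose first field on each line is a hostname (any slots=N suffix is ignored by the read):

total=0
while read -r host _; do
    case "$host" in ""|"#"*) continue ;; esac   # skip blank and comment lines
    cores=$(ssh "$host" nproc)                  # online CPUs on that node
    total=$((total + cores))
done < hf8
echo "Total CPU cores: $total"                  # pass at most this value to -np

With eight 128-core nodes this prints 1024, confirming that -np 1024 fits while -np 1025 does not.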