Tuning the Hybrid Running Mode
Principle
When an HPC application runs at large scale and the number of processes reaches a certain level, the communication overhead between MPI processes becomes very large and degrades overall performance. In this scenario, the hybrid mode (MPI processes combined with threads) can be used to reduce the number of MPI processes. This decreases the MPI communication overhead between processes and improves overall running efficiency and performance.
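As a rough illustration of how the hybrid mode reduces the process count, the following sketch compares the number of MPI processes in a pure MPI run with a hybrid run that uses the same cores. The node count, sockets per node, cores per socket, and threads per process are hypothetical values chosen only for this example; substitute the values of your own cluster.

```shell
#!/bin/sh
# Hypothetical cluster shape (assumed values); replace with your system's.
nodes=16
sockets_per_node=2
cores_per_socket=48
threads_per_process=2   # threads (and bound cores) per MPI process in hybrid mode

total_cores=$((nodes * sockets_per_node * cores_per_socket))

# Pure MPI: one single-threaded MPI process per core.
pure_mpi_ranks=$total_cores

# Hybrid: each process occupies threads_per_process cores, so the
# same cores are driven by fewer MPI processes.
hybrid_ranks=$((total_cores / threads_per_process))

echo "total cores:      $total_cores"
echo "pure MPI ranks:   $pure_mpi_ranks"
echo "hybrid MPI ranks: $hybrid_ranks"
```

With two threads per process, the same 1536 cores are used by half as many MPI processes, so fewer ranks take part in each collective operation and each rank has fewer potential communication peers. How much this improves performance depends on the communication pattern of the application.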
Procedure
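- Add --map-by ppr:x:socket:pe=y to the mpirun command (before the program name), where x is the number of MPI processes placed on each socket and y is the number of cores (processing elements) bound to each process. Each process can then run y threads on its own cores.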
An example is as follows:
mpirun --map-by ppr:24:socket:pe=2 ./a.out
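In this example, 24 processes are started on each socket and each process is bound to two cores, so each process can run two threads. Processes are placed on the cores in sequence. The pe=2 modifier is the key setting, because it reserves the cores that each process's threads run on. The socket keyword can also be replaced with other resource levels, such as numa or node.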
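The pe value only reserves cores for each process. For an OpenMP application, the number of threads that each process actually starts is normally controlled by the OMP_NUM_THREADS environment variable, which Open MPI's mpirun can pass to every rank with -x. The script below is a sketch that assumes Open MPI and an OpenMP program named ./a.out. It derives x from a hypothetical core count per domain so that x times y fills each socket, keeps OMP_NUM_THREADS equal to pe, adds --report-bindings so the binding of each rank can be checked, and prints the command instead of running it unless DRY_RUN=0 (DRY_RUN, CORES_PER_DOMAIN, THREADS, and DOMAIN are variables introduced only for this example).

```shell
#!/bin/sh
# Sketch of a hybrid MPI + OpenMP launch, assuming Open MPI's mpirun and an
# OpenMP program ./a.out. All variables below are hypothetical example inputs.
CORES_PER_DOMAIN=${CORES_PER_DOMAIN:-48}  # assumed cores per socket (or NUMA node)
THREADS=${THREADS:-2}                     # threads per process = cores bound (pe)
DOMAIN=${DOMAIN:-socket}                  # may also be numa or node

# Fill each domain exactly: x processes * y cores = cores in the domain.
if [ "$THREADS" -le 0 ] || [ $((CORES_PER_DOMAIN % THREADS)) -ne 0 ]; then
    echo "error: THREADS=$THREADS must evenly divide $CORES_PER_DOMAIN" >&2
    exit 1
fi
PROCS=$((CORES_PER_DOMAIN / THREADS))

# Build the argument list; mpirun options go before the program name.
# OMP_PLACES/OMP_PROC_BIND (optional) keep each thread on its own bound core.
set -- mpirun \
    --map-by "ppr:${PROCS}:${DOMAIN}:pe=${THREADS}" \
    --report-bindings \
    -x "OMP_NUM_THREADS=${THREADS}" \
    -x OMP_PLACES=cores \
    -x OMP_PROC_BIND=close \
    ./a.out

if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"    # default: print the command only
else
    exec "$@"    # DRY_RUN=0: launch the job
fi
```

With the default values, the printed mapping policy is ppr:24:socket:pe=2, matching the example command. Running the script with THREADS=4 would instead produce ppr:12:socket:pe=4. When DOMAIN is set to numa or node, CORES_PER_DOMAIN must be changed to the core count of that resource level.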