HPC Application Exits Unexpectedly
Symptom
While debugging an HPC application, you run a debugging command after the application has called the MPI_Init function. The error message "All ranks has exited" is displayed even though the application has not finished running.
Locate the debugger/logs/debugger/debugger.log file under the tool installation path and search it for "Mpirun temp log file" to find the path to the startup log of the program being debugged.
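The log search described above can be scripted. The following is a minimal sketch, not part of the tool: the function names and the command-line usage are hypothetical, and the installation directory must be supplied by you. It only assumes that debugger.log is a text file located at debugger/logs/debugger/debugger.log under that directory.

```python
import sys
from pathlib import Path

MARKER = "Mpirun temp log file"  # search string given in the Symptom section


def find_marker_lines(lines, marker=MARKER):
    """Return (line_number, text) pairs for every line that contains the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


def search_debugger_log(install_dir, marker=MARKER):
    """Search debugger.log under the given tool installation directory."""
    log_path = Path(install_dir) / "debugger" / "logs" / "debugger" / "debugger.log"
    # errors="replace" keeps the search going if the log contains odd bytes.
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        return find_marker_lines(handle, marker)


if __name__ == "__main__":
    # Hypothetical usage: python find_mpirun_log.py <tool installation path>
    if len(sys.argv) != 2:
        sys.exit("usage: find_mpirun_log.py <tool installation path>")
    for number, text in search_debugger_log(sys.argv[1]):
        print(f"{number}: {text}")
```

Each printed line should contain the path to the startup log, which you can then open to inspect the Open MPI initialization output.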
Possible Causes
As the number of ranks increases, the initialization and inter-rank communication load at the network layer rises sharply. Starting the target program through LLDB-Server further degrades the performance of the MPI module. Under this heavy load, data transmission is interrupted and Open MPI initialization fails.
Troubleshooting Procedure
When starting a debugging task, use the -e parameter to set the PMIX_MCA_gds environment variable. For example:
-e "export PMIX_MCA_gds=^ds21"
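To read the value in the example: under the MCA parameter convention shared by Open MPI and PMIx, a leading caret (^) turns a comma-separated component list into an exclusion list, so ^ds21 should ask PMIx not to use the ds21 component of its gds framework. The sketch below only illustrates that syntax. The parse_mca_selection helper is hypothetical and is not part of Open MPI, PMIx, or the debugging tool.

```python
def parse_mca_selection(value):
    """Split an MCA component selection string into (mode, components).

    Illustrative helper only: a leading "^" marks the whole
    comma-separated list as components to exclude.
    """
    if value.startswith("^"):
        mode, names = "exclude", value[1:]
    else:
        mode, names = "include", value
    components = [name.strip() for name in names.split(",") if name.strip()]
    return mode, components


print(parse_mca_selection("^ds21"))  # prints ('exclude', ['ds21'])
```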