HPC Application Exits Unexpectedly

Symptom

While the HPC application is running under the debugger, the MPI_Init function is executed and a debugging command is then run. The error message "All ranks has exited" is displayed even though the application has not finished running.

Figure 1 Error message
Figure 2 Log information

Find the debugger/logs/debugger/debugger.log file in the tool installation path and search the file for "Mpirun temp log file" to determine the path to the debugging program startup log.
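
The log search described above can also be scripted. The following Python snippet is a minimal sketch, not part of the tool: the function name find_startup_log_lines and the installation path /path/to/tool are assumptions made for illustration. Only the relative log path and the search string come from this document. The snippet prints every line of debugger.log that contains "Mpirun temp log file" so that the path to the startup log can be read from the output.

```python
from pathlib import Path

# Search string given in this document.
MARKER = "Mpirun temp log file"


def find_startup_log_lines(debugger_log):
    """Return every line of the given log file that contains MARKER."""
    matches = []
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(debugger_log, encoding="utf-8", errors="replace") as log:
        for line in log:
            if MARKER in line:
                matches.append(line.rstrip("\n"))
    return matches


if __name__ == "__main__":
    # Hypothetical installation path; replace it with the actual tool installation path.
    install_dir = Path("/path/to/tool")
    log_file = install_dir / "debugger" / "logs" / "debugger" / "debugger.log"
    if log_file.is_file():
        for line in find_startup_log_lines(log_file):
            print(line)
    else:
        print(f"Log file not found: {log_file}")
```

If several lines match, the last one most likely belongs to the most recent debugging task, assuming the log is written in chronological order.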

Possible Causes

As the number of ranks increases, the initialization and communication load between ranks at the network layer rises sharply. When the target program is started through LLDB-Server, the performance of the MPI module deteriorates. Under this heavy load, data transmission is interrupted and Open MPI initialization fails.

Troubleshooting Procedure

When starting a debugging task, use the -e parameter to set the PMIX_MCA_gds environment variable. The leading ^ in the value excludes the ds21 component from the PMIx gds framework. For example:

-e "export PMIX_MCA_gds=^ds21"