Debugging an MPI Application
The Message Passing Interface (MPI) is a standard for exchanging messages between the processes of a parallel program, which may run on multiple computers. Currently, the MPI 1.1 standard is commonly used. Open MPI, an implementation of the MPI standard, must be installed. MPI is built on the following concepts:
- A communicator defines a group of processes that can message each other. Each process in the group is assigned a unique number called a rank, and processes address each other explicitly by rank.
- Point-to-point communication is communication between one specific sender and one specific receiver. A process sends a message to another process by specifying the rank of the receiving process and a tag. The receiving process posts a request to receive a message with that tag, and then processes the received data in sequence.
- Collective communication involves a group or groups of processes. For example, a process may need to broadcast a message to all other processes. MPI provides a dedicated interface to implement collective communication between all processes.
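The three concepts above can be sketched with a toy, single-process model. This is not MPI and does not use any MPI library; the class ToyCommunicator and its methods are hypothetical names chosen for illustration only. The sketch shows that ranks are numbered within a communicator, that a point-to-point message is matched by sender rank, receiver rank, and tag, and that a broadcast delivers the same data to every rank.

```python
from collections import deque


class ToyCommunicator:
    """Single-process toy model of a communicator; it is not real MPI."""

    def __init__(self, size):
        self.size = size  # ranks are numbered 0 .. size - 1
        self._pending = {}  # (source, dest, tag) -> queue of sent data

    def _check(self, rank):
        if not 0 <= rank < self.size:
            raise ValueError(f"rank {rank} is not in this communicator")

    def send(self, source, dest, tag, data):
        """Point-to-point: address one receiver by rank, label with a tag."""
        self._check(source)
        self._check(dest)
        self._pending.setdefault((source, dest, tag), deque()).append(data)

    def recv(self, dest, source, tag):
        """Return the oldest matching message, or None if nothing matches."""
        queue = self._pending.get((source, dest, tag))
        return queue.popleft() if queue else None

    def bcast(self, root, data):
        """Collective: every rank in the communicator gets the root's data."""
        self._check(root)
        return {rank: data for rank in range(self.size)}


comm = ToyCommunicator(size=4)
comm.send(source=0, dest=1, tag=7, data="halo row")
comm.send(source=0, dest=1, tag=9, data="checksum")
# The receiver selects a message by tag, not by arrival order.
print(comm.recv(dest=1, source=0, tag=9))
print(comm.recv(dest=1, source=0, tag=7))
print(comm.bcast(root=0, data="step=1"))
```

In a real MPI program a blocking receive would wait instead of returning None, which is why a wrong send/receive order can leave ranks waiting on each other.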
Prerequisites
Debugging an MPI Application Written in C/C++
- Click the debug icon in the shortcut menu area on the left, or click Development and choose Debug under Compiler and Debugger. On the Debug page that is displayed, select Parallel HPC application for Type and set other parameters as required. See Figure 1.
You do not need to select OpenMP Application when debugging an MPI application.
Table 1 Parallel HPC application debugging parameters

| Parameter | Description |
| --- | --- |
| Configured Remote Server | Target server for debugging a parallel HPC application. |
| Linux User Name | Name of the Linux user who starts the MPI application. NOTE: The root user account has the highest permission. To avoid unnecessary risks to the system, we strongly recommend you use a non-root account for the debugging. |
| Linux User Password | Password of the Linux user. |
| Remember password | If this option is selected, the Linux user password of the current remote server will be remembered. |
| SSH Port | SSH port number of the server where the MPI application is started. |
| Program | MPI application. Associated application paths can be automatically displayed for selection. Grant the Linux user the read permission for the current MPI application and the read, write, and execute permissions for the directory where the application is located. NOTE: (1) The MPI application must be an executable file. (2) If there is no source code information in the MPI application, the debugger performs debugging in assembly mode by default. |
| (Optional) Program Arguments | Arguments transferred to the application. If there are multiple arguments, separate them with spaces. Grant the Linux user the read, write, and execute permissions for the directory where the application is located and the execute permission for the parent directory. |
| Program Source Code Path | Shared path for storing the source code and MPI application. Associated working directory of source code can be automatically displayed for selection. (1) If a shared path has been configured for the MPI application, the source code and MPI application must be stored in the shared path. (2) Grant the Linux user the read and execute permissions for the source code directory of the current MPI application and the execute permission for the parent directory. |
| (Optional) Environment Variables | Enter the environment variables required for running a parallel HPC application in any of the following ways: (1) export PATH=$PATH:/path/to/mpi (2) source /configure/mpi/path/file (3) module load /mpi/modulefiles |
| Launch Type | The options are mpirun command, Donau Scheduler, and Slurm Scheduler. NOTE: mpirun is a utility used to start parallel MPI applications and provide functions such as communication and cleanup between processes. Donau Scheduler is a Huawei-developed HPC cluster scheduler that provides job scheduling with high resource utilization and throughput for large clusters. Slurm is an open-source, highly customizable, scalable, and high-performance cluster management system. It provides resource management and job scheduling functions and is widely used in high-performance computing and cluster computing fields, such as physics, chemistry, biology, and astronomy. |
| MPI Application Command | mpirun command and the corresponding command arguments. The number of ranks ranges from 1 to 2048. |
| Command to Run Donau Scheduler | Command to run Donau Scheduler and corresponding command arguments. |
| Command to Run Slurm Scheduler | srun command and corresponding command arguments. |
| (Optional) OpenMP Application | If this parameter is selected, you need to enter the number of OpenMP threads. |
| (Optional) OpenMP Threads | Number of OpenMP threads, which ranges from 1 to 1024. |
| (Optional) Deadlock Detection | If this parameter is selected, you need to specify the lock wait timeout. |
| (Optional) Lock Wait Timeout (s) | Amount of time a transaction waits to obtain a lock. The default value is 10. The value ranges from 10 to 60. |
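The numeric limits in Table 1 can be checked before you click Start. The following is a minimal sketch of such a pre-check; the function name validate_debug_parameters and its argument names are hypothetical and are not part of the tool, which performs its own configuration check. Only the ranges (1 to 2048 ranks, 1 to 1024 OpenMP threads, 10 to 60 seconds of lock wait timeout) come from the table.

```python
def validate_debug_parameters(ranks, openmp_threads=None, lock_wait_timeout=None):
    """Return a list of problems; an empty list means all values are in range.

    openmp_threads is None when OpenMP Application is not selected.
    lock_wait_timeout is None when Deadlock Detection is not selected.
    """
    problems = []
    if not 1 <= ranks <= 2048:
        problems.append("number of ranks must be between 1 and 2048")
    if openmp_threads is not None and not 1 <= openmp_threads <= 1024:
        problems.append("OpenMP threads must be between 1 and 1024")
    if lock_wait_timeout is not None and not 10 <= lock_wait_timeout <= 60:
        problems.append("lock wait timeout must be between 10 and 60 seconds")
    return problems


# 4 ranks with deadlock detection at the default timeout of 10 seconds.
print(validate_debug_parameters(ranks=4, lock_wait_timeout=10))
# A timeout below the minimum is reported.
print(validate_debug_parameters(ranks=4, lock_wait_timeout=5))
```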
- After the configuration is complete, click Start. A message is displayed in the lower right corner, indicating that parallel HPC application debugging is starting. The tool checks whether the configuration is correct. If the configuration is incorrect, modify the configuration as prompted. If the configuration is correct, a dialog box is displayed in the lower right corner, indicating that the rank status is being read, as shown in Figure 2.
If parallel HPC application debugging fails to start, rectify the fault by following the instructions in Failed to Start a Parallel HPC Application Debugging Task.
- If the rank status fails to be read, download the latest log file as prompted to view the failure details. See Figure 3.
- If the rank status is successfully read, the MPI application debugging page automatically appears. The RUN AND DEBUG window, source code window, and debugging bar are displayed. The RUN AND DEBUG window consists of the debugging information and RANK INFO areas, as shown in Figure 4.
- Select All, Rank, or Communication Groups for debugging. You can also click other buttons on the debugging bar to perform debugging. See Table 2.
Table 2 Description of debugging buttons

| Button | Description |
| --- | --- |
| (icon) | Runs the code until the next breakpoint. |
| (icon) | Suspends the program that is being executed. |
| (icon) | Executes the next line of code. |
| (icon) | Steps into the function. |
| (icon) | Steps out of the function. |
| (icon) | Restarts debugging. |
| (icon) | Stops debugging. |
- Debugging modes
- Debugging in All mode: debugs all ranks. The locating icon is displayed on the right of each communication group in the RANK INFO area.
- Debugging in Rank mode: debugs a single rank. The locating icon is displayed on the right of the target rank in the RANK INFO area.
- Debugging in Communication Groups mode: debugs the communication subgroup where the rank you select resides. The locating icon is displayed on the right of the communication subgroup hosting the target rank in the RANK INFO area.
- Rank status display
- The indicator in front of a rank shows the rank status: green indicates that the rank is stopped, red indicates that the rank is running, and gray indicates that the rank has exited.
- The line of code being debugged is highlighted. You can click the code line number to set a breakpoint. You can right-click the breakpoint to edit, delete, or disable it.
You can add conditional breakpoints (expressions and hit counts). Conditional breakpoints can be modified, enabled, disabled, and deleted. An expression breakpoint indicates that the program is stopped when the expression is true. A hit count breakpoint indicates that the program is stopped when the specified number of hits is reached or exceeded.
Figure 5 Setting a breakpoint
- An expression can contain a maximum of 1024 characters.
- A hit count is a positive integer less than or equal to 2147483647 (2^31 - 1).
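The two conditional breakpoint types described above can be sketched as follows. This is only a model of the stated semantics, not the debugger's implementation; the class HitCountBreakpoint and the function expression_breakpoint_stops are hypothetical names, and the expression is modelled as a Python callable rather than a C or C++ expression.

```python
MAX_EXPRESSION_LENGTH = 1024   # maximum characters in an expression
MAX_HIT_COUNT = 2**31 - 1      # 2147483647


class HitCountBreakpoint:
    """Stops once the line has been hit the specified number of times or more."""

    def __init__(self, hit_count):
        if not 1 <= hit_count <= MAX_HIT_COUNT:
            raise ValueError("hit count must be between 1 and 2147483647")
        self.hit_count = hit_count
        self.hits = 0

    def should_stop(self):
        # Called each time execution reaches the breakpoint line.
        self.hits += 1
        return self.hits >= self.hit_count


def expression_breakpoint_stops(condition, variables):
    """Stops only when the condition evaluates to true for the current variables."""
    return bool(condition(variables))


bp = HitCountBreakpoint(3)
# The first two hits pass through; the third and later hits stop the program.
print([bp.should_stop() for _ in range(5)])
# Stop only in the loop iteration where i equals 10.
print(expression_breakpoint_stops(lambda v: v["i"] == 10, {"i": 10}))
```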
- Click the restart button on the debugging bar to restart MPI application debugging. After the restart, a dialog box is displayed in the lower right corner, indicating that the rank status is being read. After the rank status is read successfully, the MPI application debugging page is displayed.
Figure 6 Restarting a debugging task
- Click the RUN AND DEBUG window on the left to view the variables (Locals and Registers), WATCH, BREAKPOINTS, and CALL STACK information.
- During debugging, you can right-click a variable expression to reset the variable value or add the variable expression to the WATCH area. Register expressions cannot be added to the WATCH area.
- In the WATCH area, you can add, modify, or delete some or all watched expressions. Only C or C++ expressions support this function.
- In the breakpoint area, you can delete a single breakpoint, and delete, enable, and disable all breakpoints.
See Figure 7.
Figure 7 Debugging information
- You can click the CALL STACK area to display the stack information, including the function name, file name, current line number, and address.
- In the debugging information area on the left, click a stack to display the corresponding source code or assembly code in the code area.
- Click the icon in the RANK INFO area. The COMMUNICATION SUBGROUP CHANGE page is displayed on the VS Code panel.
- Click the Change Overview tab. The communication subgroup change data is collected every 100 ms. The changes of communication subgroups are distinguished by diamonds in different colors. Blue indicates that a communication subgroup is created, purple indicates that a communication subgroup is cleared, and yellow indicates that there are communication subgroups created and cleared within 100 ms.
Figure 8 Communication subgroup change overview
Hover the mouse pointer over the diamond to see the detailed information about the communication subgroup change.
Figure 9 Pop-up for a communication subgroup
- Click the Change Details tab to see the change details of communication subgroups. On the page that is displayed, hover the mouse pointer over an entry to view details such as the communication subgroup it belongs to and its rank information.
Figure 10 Communication subgroup change details
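The color coding on the Change Overview tab can be sketched as a grouping of events into 100 ms windows. This is an illustrative model only: the function name classify_windows and the event format (timestamp in milliseconds, "created" or "cleared") are assumptions, and the sketch assumes windows aligned at multiples of 100 ms, which the tool may not do.

```python
def classify_windows(events, window_ms=100):
    """Map each sampling window to the diamond color described for the overview.

    events: iterable of (timestamp_ms, kind), kind is "created" or "cleared".
    Returns {window_index: color}; windows without events are omitted.
    """
    kinds_by_window = {}
    for timestamp_ms, kind in events:
        if kind not in ("created", "cleared"):
            raise ValueError(f"unknown event kind: {kind}")
        kinds_by_window.setdefault(timestamp_ms // window_ms, set()).add(kind)

    colors = {}
    for window, kinds in sorted(kinds_by_window.items()):
        if kinds == {"created"}:
            colors[window] = "blue"      # only creations in this window
        elif kinds == {"cleared"}:
            colors[window] = "purple"    # only clearings in this window
        else:
            colors[window] = "yellow"    # both created and cleared
    return colors


events = [(10, "created"), (130, "cleared"), (210, "created"), (290, "cleared")]
print(classify_windows(events))
```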
- If a deadlock is detected during debugging, a message will be displayed. You can click View Details or the Deadlocks tab to view the deadlock details, including the deadlock status diagram and a table. The table displays the rank, source process (rank), target process (rank), tag, data size (byte), and call stack information.
Figure 11 Communication subgroup deadlock details
If you do not click the icon, a dialog box appears in the lower right corner when a deadlock occurs during debugging. See Figure 12. Click View Details to go to the deadlock page and view details about the communication subgroup deadlock.
In the RANK INFO area, click the icon to enable the collection of data about the creation and clearing of communication subgroups and the display of the change overview on the VS Code panel.
In the Communication Subgroup Change area, you can click Communication subgroup created, Communication subgroups cleared, or Communication subgroups created and cleared to hide the corresponding information.
A deadlock occurs when two or more processes are each waiting for another process to release resources, or when they are waiting for resources in a circular chain. During point-to-point communication in MPI applications, a deadlock might occur if the MPI point-to-point functions are called in an improper order.
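The circular wait described above can be illustrated with a wait-for graph. The sketch below is not how the tool detects deadlocks (the source only states that a lock wait timeout is used); it is a generic cycle search, and the function name find_deadlock_cycle and the input format are hypothetical. A typical cause in MPI code is two ranks that both call a blocking receive before their matching send, so each waits for the other.

```python
def find_deadlock_cycle(waits_for):
    """Return the ranks that form a circular wait, or None if there is none.

    waits_for maps a blocked rank to the single rank it is waiting on.
    Ranks that are not blocked do not appear as keys.
    """
    for start in waits_for:
        path = []
        position = {}  # rank -> index in path
        rank = start
        # Follow the chain of waits until it ends or revisits a rank.
        while rank in waits_for and rank not in position:
            position[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        if rank in position:
            return path[position[rank]:]
    return None


# Rank 0 waits to receive from rank 1 while rank 1 waits to receive from rank 0.
print(find_deadlock_cycle({0: 1, 1: 0}))
# Rank 0 waits on 1 and 1 waits on 2, but rank 2 is not blocked: no deadlock.
print(find_deadlock_cycle({0: 1, 1: 2}))
```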
Debugging an MPI Application Written in Fortran
Fortran is a high-level, widely used language suited to scientific computing. It is a compiled language: Fortran source code must be compiled to generate executable files. For details about how to install the GFortran compiler, visit https://gcc.gnu.org/fortran/.
A Fortran 90 program is block-structured and consists of several program modules, each with a similar statement organization. The main program controls the entire program, and each auxiliary program module implements one algorithm of the problem being solved. Fortran 95 is a supplement to Fortran 90. Programs written in either standard can be debugged.
Due to the syntax characteristics of the BiSheng Compiler and Fortran, when a Fortran application compiled by the BiSheng Compiler is debugged, debugging starts in assembly mode, breakpoints cannot be set, and debugging information cannot be viewed.
- Click the debug icon in the shortcut menu area on the left, or click Development and choose Debug under Compiler and Debugger. On the Debug page that is displayed, select Parallel HPC application for Type and set other parameters as required. See Figure 13.
You do not need to select OpenMP Application when debugging an MPI application.
.f90 is the conventional file name extension for modern Fortran source files. The 90 refers to Fortran 90, the first modern Fortran standard.
Table 3 Parallel HPC application debugging parameters

| Parameter | Description |
| --- | --- |
| Configured Remote Server | Target server for debugging a parallel HPC application. |
| Linux User Name | Name of the Linux user who starts the MPI application. NOTE: The root user account has the highest permission. To avoid unnecessary risks to the system, we strongly recommend you use a non-root account for the debugging. |
| Linux User Password | Password of the Linux user. |
| Remember password | If this option is selected, the Linux user password of the current remote server will be remembered. |
| SSH Port | SSH port number of the server where the MPI application is started. |
| Program | MPI application. Associated application paths can be automatically displayed for selection. Grant the Linux user the read permission for the current MPI application and the read, write, and execute permissions for the directory where the application is located. NOTE: (1) The MPI application must be an executable file. (2) If there is no source code information in the MPI application, the debugger performs debugging in assembly mode by default. |
| (Optional) Program Arguments | Arguments transferred to the application. If there are multiple arguments, separate them with spaces. Grant the Linux user the read, write, and execute permissions for the directory where the application is located and the execute permission for the parent directory. |
| Program Source Code Path | Shared path for storing the source code and MPI application. Associated paths can be automatically displayed for selection. (1) If a shared path has been configured for the MPI application, the source code and MPI application must be stored in the shared path. (2) Grant the Linux user the read and execute permissions for the source code directory of the current MPI application and the execute permission for the parent directory. |
| (Optional) Environment Variables | Enter the environment variables required for running a parallel HPC application in any of the following ways: (1) export PATH=$PATH:/path/to/mpi (2) source /configure/mpi/path/file (3) module load /mpi/modulefiles |
| Launch Type | The options are mpirun command, Donau Scheduler, and Slurm Scheduler. NOTE: mpirun is a utility used to start parallel MPI applications and provide functions such as communication and cleanup between processes. Donau Scheduler is a Huawei-developed HPC cluster scheduler that provides job scheduling with high resource utilization and throughput for large clusters. Slurm is an open-source, highly customizable, scalable, and high-performance cluster management system. It provides resource management and job scheduling functions and is widely used in high-performance computing and cluster computing fields, such as physics, chemistry, biology, and astronomy. |
| MPI Application Command | mpirun command and the corresponding command arguments. The number of ranks ranges from 1 to 2048. |
| Command to Run Donau Scheduler | Command to run Donau Scheduler and corresponding command arguments. |
| Command to Run Slurm Scheduler | Command to run Slurm Scheduler and corresponding command arguments. |
| (Optional) OpenMP Application | If this parameter is selected, you need to enter the number of OpenMP threads. |
| (Optional) OpenMP Threads | Number of OpenMP application threads. |
| (Optional) Deadlock Detection | If this parameter is selected, you need to specify the lock wait timeout. |
| (Optional) Lock Wait Timeout (s) | Amount of time a transaction waits to obtain a lock. The default value is 10. The value ranges from 10 to 60. |
- After the configuration is complete, click Start. A message is displayed in the lower right corner, indicating that parallel HPC application debugging is starting. The tool checks whether the configuration is correct. If the configuration is incorrect, modify the configuration as prompted. If the configuration is correct, a dialog box is displayed in the lower right corner, indicating that the rank status is being read, as shown in Figure 14.
If parallel HPC application debugging fails to start, rectify the fault by following the instructions in Failed to Start a Parallel HPC Application Debugging Task.
- If the rank status fails to be read, download the latest log file as prompted to view the failure details. See Figure 15.
- If the rank status is successfully read, the MPI application debugging page automatically appears. The RUN AND DEBUG window, source code window, and debugging bar are displayed. The RUN AND DEBUG window consists of the debugging information and RANK INFO areas, as shown in Figure 16.
- Select All, Rank, or Communication Groups for debugging. You can also click other buttons on the debugging bar to perform debugging. See Table 4.
Table 4 Description of debugging buttons

| Button | Description |
| --- | --- |
| (icon) | Runs the code until the next breakpoint. |
| (icon) | Suspends the program that is being executed. |
| (icon) | Executes the next line of code. |
| (icon) | Steps into the function. |
| (icon) | Steps out of the function. |
| (icon) | Restarts debugging. |
| (icon) | Stops debugging. |
- Debugging modes
- Debugging in All mode: debugs all ranks. The locating icon is displayed on the right of each communication group in the RANK INFO area.
- Debugging in Rank mode: debugs a single rank. The locating icon is displayed on the right of the target rank in the RANK INFO area.
- Debugging in Communication Groups mode: debugs the communication subgroup where the rank you select resides. The locating icon is displayed on the right of the communication subgroup hosting the target rank in the RANK INFO area.
- Rank status display
- The indicator in front of a rank shows the rank status: green indicates that the rank is stopped, red indicates that the rank is running, and gray indicates that the rank has exited.
- The line of code being debugged is highlighted. You can click the code line number to set a breakpoint. You can right-click the breakpoint to edit, delete, or disable it.
You can add conditional breakpoints (expressions and hit counts). Conditional breakpoints can be modified, enabled, disabled, and deleted. An expression breakpoint indicates that the program is stopped when the expression is true. A hit count breakpoint indicates that the program is stopped when the specified number of hits is reached or exceeded.
Figure 17 Setting a breakpoint
- An expression can contain a maximum of 1024 characters.
- A hit count is a positive integer less than or equal to 2147483647 (2^31 - 1).
- Click the restart button on the debugging bar to restart the debugging task. In the dialog box that is displayed, click Restart. The page for reading the rank status is displayed. After the rank status is successfully read, the MPI application debugging page is displayed.
Figure 18 Restarting a task
- Click the RUN AND DEBUG window on the left to view the variables (Locals and Registers), WATCH, BREAKPOINTS, and CALL STACK information.
- During debugging, you can right-click a variable expression to reset the variable value or add the variable expression to the WATCH area. Register expressions cannot be added to the WATCH area.
- In the WATCH area, you can add, modify, or delete some or all watched expressions. Only C or C++ expressions support this function.
- In the breakpoint area, you can delete a single breakpoint, and delete, enable, and disable all breakpoints.
See Figure 19.
Figure 19 Debugging information
- You can click the CALL STACK area to display the stack information, including the function name, file name, current line number, and address.
- In the debugging information area on the left, click a stack to display the corresponding source code or assembly code in the code area.
- Click the icon in the RANK INFO area. The COMMUNICATION SUBGROUP CHANGE page is displayed on the VS Code panel.
- Click the Change Overview tab. The communication subgroup change data is collected every 100 ms. The changes of communication subgroups are distinguished by diamonds in different colors. Blue indicates that a communication subgroup is created, purple indicates that a communication subgroup is cleared, and yellow indicates that there are communication subgroups created and cleared within 100 ms.
Figure 20 Communication subgroup change overview
Hover the mouse pointer over the diamond to see the detailed information about the communication subgroup change.
Figure 21 Pop-up for a communication subgroup
- Click the Change Details tab. On the page that is displayed, hover the mouse pointer over an entry to view details such as the communication subgroup it belongs to and its rank information.
Figure 22 Communication subgroup change details
Click the icon in the RANK INFO area. The information about creating and clearing communication subgroups is displayed in the lower part of the page.
In the Communication Subgroup Change area, you can click Communication subgroup created, Communication subgroups cleared, or Communication subgroups created and cleared to hide the corresponding information.








