
Sample 1: Parallel Debugging of MPI Applications

This sample demonstrates how to use the Compiler and Debugger to debug MPI applications.

  1. Obtain the MPI program source file bcast_demo.c from GitHub.

    The downloaded source package is devkitdemo-devkitdemo-23.0.1.zip. After decompressing it, use bcast_demo.c in the Compiler_and_Debugger/mpi_demo/ directory as the MPI program source file. During compilation, add the -g option to the mpicc command to generate an executable file with debugging information.

    mpicc -g bcast_demo.c -o bcast_demo
    

    In 3, set the application to the path of the bcast_demo executable file, and set the application source code path to the directory containing bcast_demo.c.

  2. In the VS Code Explorer, open the decompressed local folder (devkitdemo-devkitdemo-23.0.1/Compiler_and_Debugger/mpi_demo). Open the Kunpeng DevKit, click the Development tab, and then click Debug in the Compiler and Debugger area to open the debugging page.
    Figure 1 Selecting a debugging type
  3. Select Parallel HPC application for Type and set parameters for debugging an MPI application as required. Table 1 describes the parameters.
    Figure 2 Setting parameters for MPI application debugging
    Table 1 Parallel HPC application debugging parameters

    • Configured Remote Server: Target server for debugging the parallel HPC application.

    • Linux User Name: Name of the Linux user who starts the MPI application.
      NOTE: The root user has the highest permissions. To avoid unnecessary risks to the system, we strongly recommend that you use a non-root account for debugging.

    • Linux User Password: Password of the Linux user.

    • SSH Port: SSH port number of the server where the MPI application is started.

    • Application: MPI application to debug. Matching application paths are displayed automatically for selection.
      Grant the Linux user the read permission on the MPI application, and the read, write, and execute permissions on the directory where the application is located.
      NOTE:
      - The MPI application must be an executable file.
      - If the MPI application contains no source code information, the debugger debugs it in assembly mode by default.

    • (Optional) Application Arguments: Arguments passed to the application. Separate multiple arguments with spaces.
      Grant the Linux user the read, write, and execute permissions on the directory where the application is located, and the execute permission on its parent directory.

    • Application Source Code Path: Shared path that stores the source code and the MPI application. Matching source code working directories are displayed automatically for selection.
      1. If a shared path has been configured for the MPI application, the source code and the MPI application must be stored in that shared path.
      2. Grant the Linux user the read and execute permissions on the source code directory of the MPI application, and the execute permission on its parent directory.

    • (Optional) Environment Variables: Environment variables required for running the parallel HPC application, entered in any of the following ways:
      - export PATH=$PATH:/path/to/mpi
      - source /configure/mpi/path/file
      - module load /mpi/modulefiles

    • Launch Type: Debugging launch type, which can be:
      - mpirun command
      - Donau Scheduler
      - Slurm Scheduler

    • MPI Application Command: mpirun command and its arguments. The number of ranks ranges from 1 to 2,048.

    • Command to Run Donau Scheduler: Command to run the Donau Scheduler and its arguments.

    • Command to Run Slurm Scheduler: srun command and its arguments.

    • (Optional) OpenMP Application: If this option is selected, enter the number of OpenMP threads.

    • (Optional) OpenMP Threads: Number of OpenMP threads, ranging from 1 to 1,024.

    • (Optional) Deadlock Detection: If this option is selected, specify the lock wait timeout.

    • (Optional) Lock Wait Timeout (s): Amount of time to wait to obtain a lock. The default value is 10. The value ranges from 10 to 60.

  4. Click Debug. If a message about permission issues is displayed in the lower right corner, as shown in Figure 3, run the following command:
    Figure 3 Permission message
    chmod -R 700 directory_name/
  5. Click Debug again to start debugging the parallel HPC application and read the rank status.
    Figure 4 Starting parallel HPC application debugging
    Figure 5 Reading the rank status
  6. If the rank status is successfully read, the MPI application debugging page automatically appears. The RUN AND DEBUG window, source code window, and debugging bar are displayed. The RUN AND DEBUG window consists of the debugging information and RANK INFO areas, as shown in Figure 6.
    Figure 6 Rank status read successfully
  7. Select the debug granularity.
    Figure 7 Selecting the debug granularity
    Table 2 Debug granularity

    • All: Debugs all ranks.

    • rank: Debugs a single rank.

    • Communication Groups: Debugs the communication group that contains the selected rank.

  8. Click any button in the debugging bar to debug the MPI application.
    Table 3 Description of debugging buttons

    • Resume: Runs the code until the next breakpoint.

    • Suspend: Suspends the program that is being executed.

    • Skip a single step: Executes the next line of code.

    • Step in: Steps into the function.

    • Step out: Steps out of the function.

    • Restart: Restarts debugging.

    • Stop: Stops debugging.

  9. Select All in the RANK INFO area, and add breakpoints at lines 89, 47, and 93.

    You can add conditional breakpoints (expressions and hit counts) and modify, enable, disable, or delete them. An expression breakpoint stops the program when the expression evaluates to true. A hit count breakpoint stops the program when the number of times the breakpoint is hit reaches or exceeds the specified count.

  10. Select All and click Resume. When the code execution reaches line 89, click Skip a single step to execute the MPI_Comm_split(MPI_COMM_WORLD, color, rankNum, &row_comm) function, which groups ranks. In this example, four ranks are grouped into two communication subgroups: rank0 and rank1 are in communication subgroup 1, and rank2 and rank3 are in communication subgroup 2.
    Figure 8 Two communication subgroups generated

    The MPI_Comm_split(MPI_COMM_WORLD, color, rankNum, &row_comm) function is used to create new communication subgroups.

    1. MPI_COMM_WORLD indicates the original communication group. The original communication group does not go away, but a new communication group is created on each process.
    2. color specifies the new communication subgroup to which a rank belongs.
    3. rankNum specifies the ordering (rank) in each new communication group. The process which passes in the smallest value for rankNum will be rank0, the next smallest will be rank1, and so on.
    4. row_comm is the output argument through which MPI returns the new communication subgroup to the caller.
  11. After the two communication subgroups are generated, the source code of all ranks is executed to line 90. Select Communication Groups, select rank0 in communication subgroup 1, and click Resume. The rank code in communication subgroup 1 is executed to line 92, while the rank code in communication subgroup 2 remains at line 90.
    Figure 9 Debugging communication subgroup 1
    Figure 10 Communication subgroup 2 not debugged
  12. Select the Communication Groups debug granularity, select rank0 in communication subgroup 1 on the left, and click Resume to continue. When the code execution reaches line 47, click Skip a single step to execute the MPI_Barrier(MPI_COMM_WORLD) function. After the function is executed, communication subgroup 1 remains in the waiting state. Select rank2 in communication subgroup 2 on the left and click Resume. The code is executed to line 47. Click Skip a single step to execute the MPI_Barrier(MPI_COMM_WORLD) function. After the function is executed, communication subgroup 1 exits the waiting state and the code is executed to line 49.
    Figure 11 Executing the barrier function
    • MPI_Barrier(MPI_COMM_WORLD) is a barrier function. MPI_COMM_WORLD specifies the communication group in which the barrier takes place. The function synchronizes all ranks in the communication group: a rank that calls the function enters the waiting state, and the barrier completes only after every rank in the communication group has called it.
    • When communication subgroup 1 executes the MPI_Barrier function, communication subgroup 2 does not synchronously execute the function because the Communication Groups debugging mode is used. After executing the MPI_Barrier function, communication subgroup 1 remains in the waiting state until communication subgroup 2 executes the MPI_Barrier function.
  13. Click Resume. View the communication subgroup changes in the VS Code panel. Communication subgroup change statistics are collected every 100 ms, and the changes are distinguished by diamonds in different colors: blue indicates that a communication subgroup was created, purple indicates that a communication subgroup was cleared, and yellow indicates that communication subgroups were both created and cleared within the 100 ms interval.
    Figure 12 Communication subgroup change overview

    If a deadlock is detected during debugging, a message will be displayed on the COMMUNICATION SUBGROUP CHANGE panel. You can click View Details or the Deadlocks tab to view the deadlock details, including the deadlock status diagram and a table. The table displays the rank, source process (rank), target process (rank), tag, data size (byte), and call stack information.

  14. Delete the breakpoint at line 47. Select All in the RANK INFO area and click Resume. The code execution reaches line 93. Click Skip a single step to execute the MPI_Comm_free(&row_comm) function, which releases the created communication subgroups (if required).
    Figure 13 Releasing a communication subgroup
  15. View the debugging information in the debugging area on the left.
    Figure 14 Viewing debugging information