Sample 2: Parallel Debugging of Hybrid MPI/OpenMP Applications
This sample demonstrates how to use the Compiler and Debugger to debug hybrid MPI/OpenMP applications.
- Obtain the MPI program source file mpi_openmp_demo.c from GitHub.
The downloaded source package is devkitdemo-devkitdemo-23.0.1.zip. After decompression, mpi_openmp_demo.c in the Compiler_and_Debugger/mpi_demo/ directory serves as the hybrid MPI/OpenMP program source file. When compiling, add the -g and -fopenmp options to the mpicc command to generate an executable file with debugging information.
mpicc -g -fopenmp mpi_openmp_demo.c -o mpi_openmp_demo
In step 3, set the path of the mpi_openmp_demo executable file as the application path, and the path of the source code as the application source code path.
- In the Explorer of VS Code, open the local decompressed folder (devkitdemo-devkitdemo-23.0.1/Compiler_and_Debugger/mpi_demo). Access the Kunpeng DevKit, click the Development tab, and then click Debug in the Compiler and Debugger area to open the debugging page.
Figure 1 Selecting a debugging type
- Select Parallel HPC application for Type and set the parameters for debugging the hybrid MPI/OpenMP application as required. Table 1 describes the parameters.
Figure 2 Setting parameters for hybrid MPI/OpenMP application debugging
Table 1 Parallel HPC application debugging parameters
- Configured Remote Server: Target server for debugging a parallel HPC application.
- Linux User Name: Name of the Linux user who starts the hybrid MPI/OpenMP application.
  NOTE: The root user account has the highest permissions. To avoid unnecessary risks to the system, we strongly recommend that you use a non-root account for debugging.
- Linux User Password: Password of the Linux user.
- SSH Port: SSH port number of the server where the hybrid MPI/OpenMP application is started.
- Application: Hybrid MPI/OpenMP application. Associated application paths can be automatically displayed for selection. Grant the Linux user the read permission on the application and the read, write, and execute permissions on the directory where the application is located.
  NOTE:
  - The hybrid MPI/OpenMP application must be an executable file.
  - If the hybrid MPI/OpenMP application contains no source code information, the debugger debugs in assembly mode by default.
- (Optional) Application Arguments: Arguments passed to the application. If there are multiple arguments, separate them with spaces. Grant the Linux user the read, write, and execute permissions on the directory where the application is located and the execute permission on the parent directory.
- Application Source Code Path: Shared path storing the source code and the hybrid MPI/OpenMP application. The associated working directory of the source code can be automatically displayed for selection.
  - If a shared path has been configured for the hybrid MPI/OpenMP application, the source code and the application must be stored in the shared path.
  - Grant the Linux user the read and execute permissions on the source code directory of the current MPI application and the execute permission on its parent directory.
- (Optional) Environment Variables: Environment variables required for running a parallel HPC application, entered in any of the following ways:
  - export PATH=$PATH:/path/to/mpi
  - source /configure/mpi/path/file
  - module load /mpi/modulefiles
- Launch Type: Debugging launch type, which can be mpirun command, Donau Scheduler, or Slurm Scheduler.
- MPI Application Command: mpirun command and its arguments. The number of ranks ranges from 1 to 2,048.
- Command to Run Donau Scheduler: Donau Scheduler command and its arguments.
- Command to Run Slurm Scheduler: srun command and its arguments.
- OpenMP Application: If this option is selected, you must enter the number of OpenMP threads.
- OpenMP Threads: Number of OpenMP threads, ranging from 1 to 1,024.
- (Optional) Deadlock Detection: If this option is selected, you must specify the lock wait timeout.
- (Optional) Lock Wait Timeout (s): Amount of time to wait for a lock before a deadlock is reported. The default value is 10, and the value ranges from 10 to 60.
- Click Debug. If a message about permission issues is displayed in the lower right corner, as shown in Figure 3, run the following command:
chmod -R 700 directory_name/
- Click Debug again to start debugging the parallel HPC application and read the rank status.
Figure 4 Starting parallel HPC application debugging
Figure 5 Reading the rank status
- If the rank status is successfully read, the hybrid MPI/OpenMP application debugging page automatically appears. The RUN AND DEBUG window, source code window, and debugging bar are displayed. The RUN AND DEBUG window consists of the debugging information and RANK INFO areas, as shown in Figure 6.
- Select the debug granularity.
Figure 7 Selecting the debug granularity
Table 2 Debug granularity
- All: Debugs all ranks.
- Rank: Debugs a single rank.
- Communication Groups: Debugs the communication group hosting the rank you select.
- Click any button in the debugging bar to debug the hybrid MPI/OpenMP application.
Table 3 Description of debugging buttons
- Resume: Runs the code until the next breakpoint.
- Suspend: Suspends the program that is being executed.
- Step over: Executes the next line of code without entering functions.
- Step in: Steps into the function.
- Step out: Steps out of the function.
- Restart: Restarts debugging.
- Stop: Stops debugging.
If you select a rank and click a button that has "Process-level/Thread-level" in its description, the debugging operation applies to the process or rank. If you select a thread and click such a button, the debugging operation applies only to that thread.
- Select All in the RANK INFO area, and add breakpoints at lines 56, 32, and 36.
You can add conditional breakpoints (expression and hit-count breakpoints). Conditional breakpoints can be modified, enabled, disabled, and deleted. An expression breakpoint stops the program when the expression evaluates to true. A hit count breakpoint stops the program when the specified number of hits is reached or exceeded.
- Select All and click the resume button. When the code execution reaches line 56, click the step-over button to execute the MPI_Comm_split(MPI_COMM_WORLD, color, rankNum, &row_comm) function, which groups ranks. In this example, four ranks are grouped into four communication subgroups.
Figure 8 Four communication subgroups generated
The MPI_Comm_split(MPI_COMM_WORLD, color, rankNum, &row_comm) function is used to create new communication subgroups.
- MPI_COMM_WORLD indicates the original communication group. The original communication group does not go away, but a new communication group is created on each process.
- color specifies the new communication subgroup to which a rank belongs.
- rankNum specifies the ordering (rank) in each new communication group. The process which passes in the smallest value for rankNum will be rank0, the next smallest will be rank1, and so on.
- row_comm is the output parameter through which MPI returns the new communication subgroup.
- After four communication subgroups are generated, the source code of all ranks is executed to line 59. Click the rank in communication subgroup 1; three threads are displayed: thread1, thread2, and thread3.
Figure 9 Viewing threads
thread1 is the main thread of the MPI application. thread2 and thread3 are auxiliary threads, for which only the assembly code is displayed instead of the source code.
- Select a rank and click the resume button. The code is executed to line 32. Click the step-over button to execute the code to the line of OpenMP directives. Set a breakpoint on the first line after the OpenMP directives and click the resume button to execute the OpenMP statements and generate subthreads. See Figure 10. If you instead click the step-over button on the OpenMP directive line, the whole code block of OpenMP directives is executed in a single step.
- The number of threads defined previously is 4. After the for statement is executed, subthreads are generated and there are six threads in total in the rank. thread1 is the main thread, thread2 and thread3 are MPI auxiliary threads, and thread4, thread5, and thread6 are OpenMP auxiliary threads. thread1, thread4, thread5, and thread6 are the four threads defined. The auxiliary threads of OpenMP execute only the code block of OpenMP directives, and the stack stays at the location of main._omp_fn().
- The main thread continues to the code following the block of OpenMP directives, that is, line 40 in this example. After the block of OpenMP directives is executed, the auxiliary threads enter a residual (idle) state.
- Currently, thread1, thread4, thread5, and thread6 have breakpoints set at line 36. Select thread4 and click the step-over button to perform thread-level debugging. In this example, thread4 executes the Test(i) function. Check the variable values in the debugging information of thread4: the value of i changes from 2500 to 2501, while the values of i in the other three threads remain unchanged. Select a rank and click the resume button. All threads in the rank continue to run; when one thread stops, the other threads also stop. Because the threads execute at different positions, the value of the variable i may not increase by 1 in every thread.
- Threads run asynchronously, and thread-level breakpoints set for the threads are independent of each other.
- Debugging at the rank level synchronizes the operation to all threads in the rank, whereas debugging a thread affects only the selected thread. (Which threads an operation is synchronized to depends on the debugging granularity.)
- During thread-level debugging, only one thread is running in a rank.
- If you switch threads during thread-level debugging, the current thread is suspended before control switches to the other thread.
Figure 11 Thread-level debugging
Figure 12 Rank-level debugging
- The breakpoint set when a thread is selected for debugging works only for the thread, and the breakpoint set when a rank is selected for debugging works for the whole rank.
- Thread-level breakpoints are independent from each other.
- A rank-level breakpoint is displayed for all threads in the rank. Thread-level breakpoints are displayed under their own threads.
- If you click a rank, the rank-level breakpoint and the breakpoints of threads under the rank are displayed.
- Add a breakpoint at line 26. In the code, the OpenMP directive "#pragma omp parallel for reduction(+:sum)" indicates that multiple threads execute the for loop concurrently. In this example, four threads equally divide the 10,000 loop iterations, and each thread keeps its own private copy of sum. After the code block of OpenMP directives is executed, the main thread combines the sum values calculated by each thread.
Figure 13 sum += i for thread4
- A reduction-identifier is any of the following operators: +, -, *, &, |, ^, &&, and ||.
- Before debugging thread4, the value of sum is 5001 and the value of i is 2502. Execute the Test(i) function and switch to line 26 to execute the for loop. After the execution is complete, click the step-out button to step out of the Test(i) function and execute sum += i;. The value of sum becomes 7503. The other three threads each repeat the same operations 2,500 times.
- After the loop ends, the four threads accumulate their sums into the final output. The main thread outputs the final result, and the auxiliary threads display assembly code.
Figure 14 Output of the main thread
Figure 15 Assembly code in auxiliary threads
- Add a breakpoint at line 70, click the resume button to run to it, and then step over the MPI_Finalize() function to release the MPI auxiliary threads thread2 and thread3. See Figure 16.
The MPI_Finalize() function terminates the MPI execution environment of a process. Calling it cleans up all state related to the MPI application.
- Click the resume button and view the communication subgroup changes in the VS Code panel. The communication subgroup change statistics are collected every 100 ms, and the changes are distinguished by diamonds of different colors: blue indicates that a communication subgroup was created, purple indicates that a communication subgroup was cleared, and yellow indicates that communication subgroups were both created and cleared within the same 100 ms.
Figure 17 Communication subgroup change overview
- View the debugging information in the debugging area on the left.
Figure 18 Viewing debugging information