
Sample 1: Parallel Debugging of MPI Applications

This sample demonstrates how to use the Compiler and Debugger to debug MPI applications.

  1. Obtain the MPI program source file bcast_demo.c from GitHub.

    The downloaded source package is devkitdemo-devkitdemo-23.0.1.zip. After decompressing it, use bcast_demo.c in the Compiler_and_Debugger/mpi_demo/ directory as the MPI program source file. During compilation, add the -g option to the mpicc command to generate an executable file with debugging information.

    mpicc -g bcast_demo.c -o bcast_demo
    

    In 3, set the application to the path of the bcast_demo executable file, and set the application source code path to the directory containing bcast_demo.c.

  2. In the VS Code Explorer, open the decompressed local folder (devkitdemo-devkitdemo-23.0.1/Compiler_and_Debugger/mpi_demo). Open the Kunpeng DevKit, click the Development tab, and then click Debug in the Compiler and Debugger area to open the debugging page.
    Figure 1 Selecting a debugging type
  3. Select Parallel HPC application for Type and set parameters for debugging an MPI application as required. Table 1 describes the parameters.
    Figure 2 Setting parameters for MPI application debugging
    Table 1 Parallel HPC application debugging parameters

    • Configured Remote Server: Target server for debugging the parallel HPC application.

    • Linux User Name: Name of the Linux user who starts the MPI application.
      NOTE: The root user has the highest permissions. To avoid unnecessary risks to the system, we strongly recommend that you use a non-root account for debugging.

    • Linux User Password: Password of the Linux user.

    • SSH Port: SSH port number of the server where the MPI application is started.

    • Application: MPI application to debug. Matching application paths are displayed automatically for selection.
      Grant the Linux user the read permission on the MPI application, and the read, write, and execute permissions on the directory where the application is located.
      NOTE:
      - The MPI application must be an executable file.
      - If the MPI application contains no source code information, the debugger debugs it in assembly mode by default.

    • (Optional) Application Arguments: Arguments passed to the application. Separate multiple arguments with spaces.
      Grant the Linux user the read, write, and execute permissions on the directory where the application is located, and the execute permission on its parent directory.

    • Application Source Code Path: Shared path that stores the source code and the MPI application. Matching source code working directories are displayed automatically for selection.
      1. If a shared path has been configured for the MPI application, the source code and the MPI application must be stored in that shared path.
      2. Grant the Linux user the read and execute permissions on the source code directory of the MPI application, and the execute permission on its parent directory.

    • (Optional) Environment Variables: Environment variables required for running the parallel HPC application, entered in any of the following ways:
      - export PATH=$PATH:/path/to/mpi
      - source /configure/mpi/path/file
      - module load /mpi/modulefiles

    • Launch Type: Debugging launch type, which can be:
      - mpirun command
      - Donau Scheduler
      - Slurm Scheduler

    • MPI Application Command: mpirun command and its arguments. The number of ranks ranges from 1 to 2,048.

    • Command to Run Donau Scheduler: Command to run the Donau Scheduler and its arguments.

    • Command to Run Slurm Scheduler: srun command and its arguments.

    • (Optional) OpenMP Application: If this option is selected, enter the number of OpenMP threads.

    • (Optional) OpenMP Threads: Number of OpenMP threads, ranging from 1 to 1,024.

    • (Optional) Deadlock Detection: If this option is selected, specify the lock wait timeout.

    • (Optional) Lock Wait Timeout (s): Amount of time to wait to obtain a lock. The default value is 10. The value ranges from 10 to 60.

  4. Click Debug. If a message about permission issues is displayed in the lower right corner, as shown in Figure 3, run the following command:
    Figure 3 Permission message
    chmod -R 700 directory_name/
  5. Click Debug again to start debugging the parallel HPC application and read the rank status.
    Figure 4 Starting parallel HPC application debugging
    Figure 5 Reading the rank status
  6. If the rank status is successfully read, the MPI application debugging page automatically appears. The RUN AND DEBUG window, source code window, and debugging bar are displayed. The RUN AND DEBUG window consists of the debugging information and RANK INFO areas, as shown in Figure 6.
    Figure 6 Rank status read successfully
  7. Select the debug granularity.
    Figure 7 Selecting the debug granularity
    Table 2 Debug granularity

    • All: Debugs all ranks.

    • rank: Debugs a single rank.

    • Communication Groups: Debugs the communication group that contains the selected rank.

  8. Click any button in the debugging bar to debug the MPI application.
    Table 3 Description of debugging buttons

    • Resume: Runs the code until the next breakpoint.

    • Suspend: Suspends the program that is being executed.

    • Skip a single step: Executes the next line of code.

    • Step in: Steps into the function.

    • Step out: Steps out of the function.

    • Restart: Restarts debugging.

    • Stop: Stops debugging.

  9. Select All in the RANK INFO area, and add breakpoints at lines 89, 47, and 93.

    You can add conditional breakpoints (expressions and hit counts) and modify, enable, disable, or delete them. An expression breakpoint stops the program when the expression evaluates to true. A hit count breakpoint stops the program when the number of times the breakpoint is hit reaches or exceeds the specified count.

  10. Select All and click Resume. When the code execution reaches line 89, click Skip a single step to execute the MPI_Comm_split(MPI_COMM_WORLD, color, rankNum, &row_comm) function, which groups ranks. In this example, four ranks are grouped into two communication subgroups: rank0 and rank1 are in communication subgroup 1, and rank2 and rank3 are in communication subgroup 2.
    Figure 8 Two communication subgroups generated

    The MPI_Comm_split(MPI_COMM_WORLD, color, rankNum, &row_comm) function is used to create new communication subgroups.

    1. MPI_COMM_WORLD indicates the original communication group. The original communication group does not go away, but a new communication group is created on each process.
    2. color specifies the new communication subgroup to which a rank belongs.
    3. rankNum specifies the ordering (rank) in each new communication group. The process which passes in the smallest value for rankNum will be rank0, the next smallest will be rank1, and so on.
    4. row_comm is the output argument through which MPI returns the new communication subgroup to the caller.
  11. After the two communication subgroups are generated, the source code of all ranks is executed to line 90. Select Communication Groups, select rank0 in communication subgroup 1, and click Resume. The rank code in communication subgroup 1 is executed to line 92, while the rank code in communication subgroup 2 remains at line 90.
    Figure 9 Debugging communication subgroup 1
    Figure 10 Communication subgroup 2 not debugged
  12. Select the Communication Groups debug granularity, select rank0 in communication subgroup 1 on the left, and click Resume to continue. When the code execution reaches line 47, click Skip a single step to execute the MPI_Barrier(MPI_COMM_WORLD) function. After the function is executed, communication subgroup 1 remains in the waiting state. Select rank2 in communication subgroup 2 on the left and click Resume. The code is executed to line 47. Click Skip a single step to execute the MPI_Barrier(MPI_COMM_WORLD) function. After the function is executed, communication subgroup 1 exits the waiting state and the code is executed to line 49.
    Figure 11 Executing the barrier function
    • MPI_Barrier(MPI_COMM_WORLD) is a barrier function. MPI_COMM_WORLD specifies the communication group in which the barrier takes place. The function synchronizes all ranks in the communication group: a rank that calls the function enters the waiting state, and the barrier completes only after every rank in the communication group has called it.
    • When communication subgroup 1 executes the MPI_Barrier function, communication subgroup 2 does not synchronously execute the function because the Communication Groups debugging mode is used. After executing the MPI_Barrier function, communication subgroup 1 remains in the waiting state until communication subgroup 2 executes the MPI_Barrier function.
  13. Click Resume. View the communication subgroup changes in the VS Code panel. Communication subgroup change statistics are collected every 100 ms, and the changes are distinguished by diamonds in different colors: blue indicates that a communication subgroup was created, purple indicates that a communication subgroup was cleared, and yellow indicates that communication subgroups were both created and cleared within the 100 ms interval.
    Figure 12 Communication subgroup change overview

    If a deadlock is detected during debugging, a message will be displayed on the COMMUNICATION SUBGROUP CHANGE panel. You can click View Details or the Deadlocks tab to view the deadlock details, including the deadlock status diagram and a table. The table displays the rank, source process (rank), target process (rank), tag, data size (byte), and call stack information.

  14. Delete the breakpoint at line 47. Select All in the RANK INFO area and click Resume. The code execution reaches line 93. Click Skip a single step to execute the MPI_Comm_free(&row_comm) function, which releases the created communication subgroups (if required).
    Figure 13 Releasing a communication subgroup
  15. View the debugging information in the debugging area on the left.
    Figure 14 Viewing debugging information