Debugging a Hybrid MPI/OpenMP Application

Open Multi-Processing (OpenMP) is a set of compiler directives, library routines, and environment variables for multithreaded programming on shared-memory parallel systems. It supports C, C++, and Fortran. OpenMP provides a high-level, thread-based abstraction of parallelism. Because it is simple, scalable, and portable, it is well suited to parallel programs on multi-core CPU hosts.

OpenMP is mainly used for fine-grained, loop-level parallelism, in which the iterations of a loop are distributed across threads for execution. MPI is mainly used for coarse-grained parallelism across processes (ranks). You can debug a hybrid MPI/OpenMP application by rank or by thread.
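The two levels can be pictured with a short Python sketch. It is an illustration only, not MPI or OpenMP code: the function names `block_range` and `hybrid_partition` are hypothetical, and the even block split is an assumption (a real OpenMP runtime distributes iterations according to its own schedule). The iteration space is first divided coarsely among ranks, and each rank's share is then divided among its threads.

```python
def block_range(n_items, n_parts, part):
    """Return the half-open range [start, stop) of block `part` when
    n_items are split into n_parts nearly equal contiguous blocks."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def hybrid_partition(n_iterations, n_ranks, n_threads):
    """Map (rank, thread) to the loop iterations it would execute.

    Coarse level (MPI-like): iterations are split across ranks.
    Fine level (OpenMP-like): each rank's block is split across threads.
    """
    mapping = {}
    for rank in range(n_ranks):
        r_start, r_stop = block_range(n_iterations, n_ranks, rank)
        for thread in range(n_threads):
            t_start, t_stop = block_range(r_stop - r_start, n_threads, thread)
            mapping[(rank, thread)] = range(r_start + t_start, r_start + t_stop)
    return mapping


if __name__ == "__main__":
    # Hypothetical sizes: 10 iterations, 2 ranks, 2 threads per rank.
    for (rank, thread), iters in hybrid_partition(10, 2, 2).items():
        print(f"rank {rank} thread {thread}: iterations {list(iters)}")
```

When you debug by rank, you are looking at one of the coarse blocks; when you debug by thread, you are looking at one of the fine blocks inside it.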

Prerequisites

  1. The program has been compiled.
  2. Open MPI has been installed. For details, see Installing Open MPI.
  3. The folder of the local source program has been opened in the VS Code Explorer.

Hybrid MPI/OpenMP applications can also be written in Fortran. For the debugging procedure, see Debugging an MPI Application Written in Fortran.

Debugging a Hybrid MPI/OpenMP Application Written in C/C++

  1. Click in the shortcut menu area on the left, or click Development and choose Debug under Compiler and Debugger. On the Debug page that is displayed, select Parallel HPC application for Type and set other parameters as required. See Figure 1.
    Figure 1 Parallel HPC application debugging
    Table 1 Parallel HPC application debugging parameters

    Parameter
        Description

    Configured Remote Server
        Target server for debugging a parallel HPC application.

    Linux User Name
        Name of the Linux user who starts the MPI application.
        NOTE: The root user has the highest privileges. To avoid unnecessary risks to the system, you are strongly advised to use a non-root account for debugging.

    Linux User Password
        Password of the Linux user.

    Remember password
        If this option is selected, the Linux user password for the current remote server is remembered.

    SSH Port
        SSH port number of the server where the MPI application is started.

    Program
        MPI application to debug. Associated application paths are displayed automatically for selection.
        Grant the Linux user the read permission for the current MPI application and the read, write, and execute permissions for the directory where the application is located.
        NOTE:
        • The MPI application must be an executable file.
        • If the MPI application contains no source code information, the debugger debugs it in assembly mode by default.

    (Optional) Program Arguments
        Arguments passed to the application. Separate multiple arguments with spaces.
        Grant the Linux user the read, write, and execute permissions for the directory where the application is located and the execute permission for the parent directory.

    Program Source Code Path
        Shared path that stores the source code and the MPI application. Associated source code working directories are displayed automatically for selection.
        1. If a shared path has been configured for the MPI application, the source code and the MPI application must both be stored in that shared path.
        2. Grant the Linux user the read and execute permissions for the source code directory of the current MPI application and the execute permission for the parent directory.

    (Optional) Environment Variables
        Environment variables required for running the parallel HPC application. Enter them in any of the following ways:
        • export PATH=$PATH:/path/to/mpi
        • source /configure/mpi/path/file
        • module load /mpi/modulefiles

    Launch Type
        The options are:
        • mpirun command
        • Donau Scheduler
        • Slurm Scheduler
        NOTE:
        • mpirun is a utility that starts parallel MPI applications and provides functions such as communication and cleanup between processes.
        • Donau Scheduler is a Huawei-developed HPC cluster scheduler that provides job scheduling with high resource utilization and throughput for large clusters.
        • Slurm is an open-source, highly customizable, scalable, and high-performance cluster management system. It provides resource management and job scheduling functions and is widely used in high-performance computing and cluster computing fields, such as physics, chemistry, biology, and astronomy.

    MPI Application Command
        mpirun command and its arguments. The number of ranks ranges from 1 to 2048.

    Command to Run Donau Scheduler
        Command to run Donau Scheduler and its arguments.

    Command to Run Slurm Scheduler
        srun command and its arguments.

    OpenMP Application
        If this parameter is selected, you must enter the number of OpenMP threads.

    OpenMP Threads
        Number of OpenMP threads, which ranges from 1 to 1024.

    (Optional) Deadlock Detection
        If this parameter is selected, you must specify the lock wait timeout.

    (Optional) Lock Wait Timeout (s)
        Time, in seconds, that a transaction waits to obtain a lock. The value ranges from 10 to 60. The default value is 10.

  2. After the configuration is complete, click Start. A message is displayed in the lower right corner, indicating that parallel HPC application debugging is starting. The tool checks whether the configuration is correct. If the configuration is incorrect, modify the configuration as prompted. If the configuration is correct, a dialog box is displayed in the lower right corner, indicating that the rank status is being read, as shown in Figure 2.
    Figure 2 Reading the rank status

    If parallel HPC application debugging fails to start, follow the instructions in Failed to Start a Parallel HPC Application Debugging Task to rectify the fault.

  3. If the rank status fails to be read, download the latest log file as prompted to view the failure details. See Figure 3.
    Figure 3 Failed to read the rank status
  4. If the rank status is successfully read, the hybrid MPI/OpenMP application debugging page automatically appears. The RUN AND DEBUG window, source code window, and debugging bar are displayed. The RUN AND DEBUG window consists of the debugging information and RANK INFO areas, as shown in Figure 4.
    Figure 4 Rank status read successfully
  5. Click buttons on the debugging bar to perform debugging. See Table 2.
    Table 2 Description of debugging buttons

    Button

    Description

    Runs the code until the next breakpoint.

    Suspends the program that is being executed.

    Executes the next line of code.

    Steps into the function.

    Steps out of the function.

    Restarts debugging.

    Stops debugging.

    Thread status: The dot before a thread indicates the thread status. A green dot indicates that the thread is stopped; a red dot indicates that the thread is running.

  6. The line of code being debugged is highlighted. You can click the code line number to set a breakpoint. You can right-click the breakpoint to edit, delete, or disable it.

    You can add conditional breakpoints (expressions and hit counts). Conditional breakpoints can be modified, enabled, disabled, and deleted. An expression breakpoint stops the program when the expression evaluates to true. A hit count breakpoint stops the program when the number of hits reaches or exceeds the specified value.

    Figure 5 Setting a breakpoint
    • An expression can contain a maximum of 1024 characters.
    • A hit count is a positive integer less than or equal to 2147483647 (2^31 - 1).
  7. Click the restart button on the debugging bar to restart hybrid MPI/OpenMP application debugging. After the restart, a dialog box is displayed in the lower right corner, indicating that the rank status is being read. After the rank status is read successfully, the hybrid MPI/OpenMP application debugging page is displayed.
    Figure 6 Restarting a debugging task
  8. Click the RUN AND DEBUG window on the left to view the variables (Locals and Registers), WATCH, BREAKPOINTS, and CALL STACK information.
    1. During debugging, you can right-click a variable expression to set a new value for the variable or to add the expression to the WATCH area. Register expressions cannot be added to the WATCH area.
    2. In the WATCH area, you can add and modify watched expressions, and delete some or all of them. Only C and C++ expressions are supported.
    3. In the BREAKPOINTS area, you can delete a single breakpoint, or delete, enable, or disable all breakpoints.
    See Figure 7.
    Figure 7 Debugging information
    4. You can click the CALL STACK area to display the stack information, including the function name, file name, line number, and address.
    5. In the debugging information area on the left, click a stack frame to display the corresponding source code or assembly code in the code area.
  9. Click in the RANK INFO area. The COMMUNICATION SUBGROUP CHANGE page is displayed on the VS Code panel.
    • Click the Change Overview tab. Communication subgroup change data is collected every 100 ms. Changes are marked by diamonds in different colors: blue indicates that a communication subgroup was created, purple indicates that a communication subgroup was cleared, and yellow indicates that communication subgroups were both created and cleared within the same 100 ms interval.
      Figure 8 Communication subgroup change overview

      Hover the mouse pointer over the diamond to see the detailed information about the communication subgroup change.

      Figure 9 Pop-up for a communication subgroup
    • Click Change Details. On the page that is displayed, hover the mouse pointer to view details such as the communication subgroup that a change belongs to and the rank information.
      Figure 10 Communication subgroup change details

      If a deadlock is detected during debugging, a message will be displayed. You can click to see the deadlock details.

      In the RANK INFO area, click to enable the function of collecting data about creating and clearing communication subgroups and of displaying the change overview on the VS Code panel.

      In the Communication Subgroup Change area, you can click Communication subgroup created, Communication subgroups cleared, or Communication subgroups created and cleared to hide the corresponding information.
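The numeric limits stated in Table 1 and in the breakpoint step can be gathered into a quick pre-flight check before the values are entered on the Debug page. The following Python sketch is only an illustration: the function `check_limits` and its parameter names are hypothetical and are not part of the tool, which performs its own validation.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest allowed hit count


def check_limits(ranks, omp_threads, lock_wait_timeout_s=10,
                 expression="", hit_count=1):
    """Return a list of problems; the list is empty if every documented limit holds."""
    problems = []
    if not 1 <= ranks <= 2048:
        problems.append("number of ranks must be between 1 and 2048")
    if not 1 <= omp_threads <= 1024:
        problems.append("number of OpenMP threads must be between 1 and 1024")
    if not 10 <= lock_wait_timeout_s <= 60:
        problems.append("lock wait timeout must be between 10 and 60 seconds")
    if len(expression) > 1024:
        problems.append("breakpoint expression must not exceed 1024 characters")
    if not 1 <= hit_count <= INT32_MAX:
        problems.append("hit count must be a positive integer up to 2147483647")
    return problems


if __name__ == "__main__":
    # Hypothetical values: 4 ranks with 8 OpenMP threads each.
    print(check_limits(ranks=4, omp_threads=8))
    # Out-of-range ranks and timeout are both reported.
    print(check_limits(ranks=4096, omp_threads=8, lock_wait_timeout_s=5))
```

Returning a list rather than raising on the first failure lets all out-of-range values be corrected in one pass.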
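The color coding on the Change Overview tab can be read as a classification of each 100 ms collection interval. The Python sketch below illustrates that reading under stated assumptions: the event format (a timestamp in milliseconds plus the kind "created" or "cleared"), the alignment of intervals to multiples of 100 ms, and the function name `classify_windows` are all hypothetical and do not describe the tool's internal data model.

```python
from collections import defaultdict

WINDOW_MS = 100  # collection interval stated for the Change Overview tab


def classify_windows(events):
    """Group (timestamp_ms, kind) events into 100 ms windows and return
    {window_start_ms: color}, where kind is "created" or "cleared"."""
    kinds_per_window = defaultdict(set)
    for timestamp_ms, kind in events:
        if kind not in ("created", "cleared"):
            raise ValueError(f"unknown event kind: {kind!r}")
        window_start = (timestamp_ms // WINDOW_MS) * WINDOW_MS
        kinds_per_window[window_start].add(kind)

    colors = {}
    for window_start in sorted(kinds_per_window):
        kinds = kinds_per_window[window_start]
        if kinds == {"created"}:
            colors[window_start] = "blue"
        elif kinds == {"cleared"}:
            colors[window_start] = "purple"
        else:  # both kinds occurred within the same window
            colors[window_start] = "yellow"
    return colors


if __name__ == "__main__":
    # Hypothetical events for illustration.
    sample = [(12, "created"), (130, "created"), (170, "cleared"), (250, "cleared")]
    print(classify_windows(sample))
```

With the sample events, the window starting at 100 ms contains both a creation and a clearing, which corresponds to a yellow diamond.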