Sample 4: MPI Application Analysis
Introduction
This sample uses the HPC application analysis function of the Kunpeng DevKit System Profiler to analyze an MPI application, helping you learn about the communication status of the application in each rank.
Setting Up the Environment
- Check that the server CPU model is Kunpeng 920 and the OS kernel version is 4.19 or later, or a patched openEuler kernel 4.14 or later.
- Check that the GCC version on the server is 7.3.0 or later.
- Check that the Kunpeng DevKit System Profiler has been installed on the server.
- Download the code sample ring.c from GitHub and run the following command to grant all users read, write, and execute permissions on it:
chmod 777 ring.c
Refined Analysis
- Prepare the code sample.
Compile ring.c and grant all users read, write, and execute permissions on the executable file:
mpicc ring.c -O3 -o ring -fopenmp -lm && chmod 777 ring
- Create an HPC application analysis task to analyze the current application.
Click the task creation icon next to the System Profiler and select General analysis. On the task creation page that is displayed, select HPC Application, set the required parameters, and click OK to start the HPC application analysis task.
Figure 1 Creating an HPC application analysis task
Table 1 Parameter description
- Analysis Type: Set it to HPC application analysis.
- Analysis Object: Set it to Application.
- Mode: Set it to Launch application.
- Application Path: Enter the absolute path of the application. In this sample, the sample code is stored in /opt/testdemo/mpi/ring on the server. In a multi-node cluster, the application must exist in the same directory on each node.
- Analysis Mode: Set it to Refined analysis.
- Shared Directory: If there is only one node, enter an available directory on the operating system. In a multi-node cluster, enter a directory shared among the nodes. In this sample, the collection is performed on two nodes, and the shared directory /home/share is used.
- mpirun Path: Enter the absolute path of the mpirun command.
- mpirun Parameter: --allow-run-as-root -H node_IP_address:number_of_ranks (for example, --allow-run-as-root -H 192.168.1.10:4)
- Sampling Duration (s): Set it to 60. If the sampling duration is too short, the result data may be incomplete because the application has not finished running or has stopped.
- Collect More Call Stack Statistics: Enable this option.
- View the analysis results.
As shown in Figure 2, click the corresponding icon to view the rank-to-rank heatmap. The data volume of (ranki, rankj) is the data that ranki sent to rankj plus the data that ranki received from rankj. Drag the cursor over an area in the left part of the figure to view its details on the right. You can click the zoom icons or scroll the mouse wheel to zoom in or out of the selected area.
Figure 3 Selecting a communicator
In the displayed dialog box for selecting a communicator, click the search icon to search for a communicator name or communicator member, click the sort icon to sort the communicator members, and click View Details to view the communicator information.
Figure 4 Node-to-node heatmap
When the statistical object is Node To Node, you can view the local percentage, cross-die percentage, and cross-chip percentage of the current rank.
Figure 5 MPI timeline
You can select different color blocks to view a rank's communication mode, communication duration, and communication delay.
Figure 6 MPI timeline-rank
You can click a colored block of a rank in a given period to view the PMU event information for that period.
Statistical Analysis
- Prepare the code sample.
Compile ring.c and grant all users read, write, and execute permissions on the executable file:
mpicc ring.c -O3 -o ring -fopenmp -lm && chmod 777 ring
- Create an HPC application analysis task and start the analysis.
Click the task creation icon next to the System Profiler and select General analysis. On the task creation page that is displayed, select HPC Application, set the required parameters, and click OK to start the HPC application analysis task.
Figure 7 Creating an HPC application analysis task
Table 2 Parameter description
- Analysis Type: Set it to HPC application analysis.
- Analysis Object: Set it to Application.
- Mode: Set it to Launch application.
- Application Path: Enter the absolute path of the application. In this sample, the sample code is stored in /opt/testdemo/mpi/ring on the server. In a multi-node cluster, the application must exist in the same directory on each node.
- Analysis Mode: Set it to Statistical analysis.
- Shared Directory: If there is only one node, enter an available directory on the operating system. In a multi-node cluster, enter a directory shared among the nodes. In this sample, the collection is performed on two nodes, and the shared directory /home/share is used.
- mpirun Path: Enter the absolute path of the mpirun command.
- mpirun Parameter: --allow-run-as-root -H node_IP_address:number_of_ranks (for example, --allow-run-as-root -H 192.168.1.10:4)
- Sampling Mode: Set it to Detail.
- Sampling Duration (s): Set it to 60. If the sampling duration is too short, the result data may be incomplete because the application has not finished running or has stopped.
- View the analysis result.
As shown in Figure 8, the upper part of the Summary tab page displays tuning suggestions, the elapsed time, CPU utilization, ratio of CPU cycles to retired instructions (CPI), number of retired instructions, and MPI wait rate.
As shown in Figure 9, the Hotspots area displays the CPU usage of hotspot functions in the application. The default grouping mode is function; you can change it to module, parallel-region, or barrier-to-barrier-segment.
As shown in Figure 10, the Memory Bandwidth area displays the bandwidth used by the current application and the instruction distribution. The HPC Top-Down area displays the names and proportions of top-down events. In either area, you can move the mouse pointer to the question mark next to a parameter to view details.
As shown in Figure 11, the MPI Runtime Metrics area displays the runtime data of the current MPI application. The Original PMU Events area displays the PMU events and their counts in the current application.