Sample 1: Matrix Analysis
Introduction
This sample uses the Kunpeng DevKit System Profiler to tune a program that implements one-dimensional matrix calculation with a for loop. Hotspot function analysis first identifies multiply, the matrix calculation function, as the hotspot. The program is then tuned with NEON instructions, and the results before and after tuning are compared.
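The contents of multiply.c are not reproduced in this section. As a minimal sketch, assuming the one-dimensional matrix calculation is an element-wise product of two float arrays, a scalar for loop of the kind that hotspot analysis would flag might look like this. The signature and element type are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical scalar kernel: the real multiply() in multiply.c may differ.
 * Computes out[i] = a[i] * b[i], one element per loop iteration. */
void multiply(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Because each iteration handles a single element, almost all of the run time of such a program falls inside this one function, which is why it stands out in a hotspot function analysis.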
Environment Preparations
- Check that a compatible OS is installed on the server and that the GCC version is 7.3.0 or later. Use the Kunpeng DevKit Compatibility Checker to view the details.
- Check that the Kunpeng DevKit System Profiler has been installed on the server.
- Download the code samples from GitHub. The sample code files are multiply.c, multiply_simd.c, multiply_start.sh, and multiplysimd_start.sh. Run the following command to grant read, write, and execute permissions to all users.
chmod 777 multiply.c multiply_simd.c multiply_start.sh multiplysimd_start.sh
Detection of the Hotspot Function multiply for One-Dimensional Matrix Calculation
- Prepare the multiply program. Compile multiply.c and grant read, write, and execute permissions on the executable to all users.
gcc -g multiply.c -o multiply && chmod 777 multiply
- Use the hotspot function analysis to analyze the program and locate hotspot functions and instructions.
Click the icon next to System Profiler and select General analysis. On the task creation page that is displayed, select Hotspot Function, set the parameters listed in Table 1, and click OK to start the hotspot function analysis task.
Figure 1 Creating a hotspot function analysis task
Table 1 Task parameters
| Parameter | Description |
| --- | --- |
| Analysis Type | Set it to Hotspot functions. |
| Analysis Object | Set it to Application. |
| Mode | Set it to Launch application. |
| Application Path | Enter the absolute path of the application. In this sample, the executable is /opt/testdemo/multiply/multiply on the server. In this path, the first multiply is a folder and the second multiply is the executable program. |
| Sampling Duration (s) | Set it to 20. |
| Call Stack | Enable this option. |
| Sampling Range | Set it to User mode. The sampling range can be user mode, kernel mode, or all. In this sample, all CPU resources are consumed in user mode, so select User mode. |
| dwarf | Enable this option. |
| C/C++ Source File Directory | Associates the source code during collection. Example: /opt/testdemo/multiply/ |
| Other Parameters | Retain their default values. |
- View the analysis results.
Figure 2 shows the total execution time of the collection and the clock cycles consumed while the program runs.
Figure 3 Source code association
Click a function name shown in blue to view the corresponding lines of that function in the source code.
Optimizing the Computation
- Prepare the multiply_simd program. The multiply_simd.c file is a version of multiply.c optimized with NEON instructions. Compile multiply_simd.c and grant read, write, and execute permissions on the executable to all users.
gcc -g multiply_simd.c -o multiply_simd && chmod 777 multiply_simd
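The contents of multiply_simd.c are not reproduced in this section. As a rough sketch of what a NEON version of an element-wise float product could look like, the following uses the standard intrinsics from arm_neon.h. The function name multiply_neon matches the name reported in the analysis results, but the signature, the element type, and the four-wide loop structure are assumptions, not the sample's actual code.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical NEON kernel: the real multiply_neon() in multiply_simd.c may differ.
 * Computes out[i] = a[i] * b[i], four floats at a time where NEON is available. */
void multiply_neon(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration in 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);       /* load a[i..i+3] */
        float32x4_t vb = vld1q_f32(b + i);       /* load b[i..i+3] */
        vst1q_f32(out + i, vmulq_f32(va, vb));   /* store the four products */
    }
#endif
    /* Scalar tail: leftover elements, or every element on non-NEON targets. */
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

The preprocessor guard is only there so that the sketch also compiles on machines without NEON. On an AArch64 server, GCC enables NEON by default, so the plain gcc -g command above is enough to build code of this kind.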
- Use the hotspot function analysis to analyze the program and locate hotspot functions and instructions.
Create a hotspot function analysis task again. Click the icon next to System Profiler and select General analysis. On the task creation page that is displayed, select Hotspot Function, set the parameters listed in Table 2, and click OK to start the hotspot function analysis task.
Figure 4 Creating another hotspot function analysis task
Table 2 Task parameters
| Parameter | Description |
| --- | --- |
| Analysis Type | Set it to Hotspot functions. |
| Analysis Object | Set it to Application. |
| Mode | Set it to Launch application. |
| Application Path | Enter the absolute path of the application. In this sample, the executable is /opt/testdemo/multiply/multiply_simd on the server. |
| Sampling Duration (s) | Set it to 20. |
| Call Stack | Enable this option. |
| Sampling Range | Set it to User mode. The sampling range can be user mode, kernel mode, or all. In this sample, all CPU resources are consumed in user mode, so select User mode. |
| dwarf | Enable this option. |
| C/C++ Source File Directory | Associates the source code during collection. Example: /opt/testdemo/multiply/ |
| Other Parameters | Retain their default values. |
- View the analysis results.
Figure 5 shows the total execution time of the collection and the clock cycles consumed while the program runs. The multiply_neon function consumes fewer clock cycles than the multiply function did before tuning.

