Sample 1: Matrix Analysis

Introduction

This sample uses the Kunpeng DevKit System Profiler to tune a program that uses a for loop to perform one-dimensional matrix calculation. Hotspot function analysis is first used to identify multiply as the hotspot function of the matrix calculation. The program is then optimized with NEON instructions, and the results before and after tuning are compared.
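The kind of loop this sample targets can be sketched as follows. This is a minimal illustration only: the actual contents of multiply.c are not reproduced here, so the signature, the float element type, and the element-wise product are assumptions.

```c
#include <stddef.h>

/* Hypothetical scalar kernel: element-wise product of two float arrays.
 * The real multiply() in multiply.c may use a different signature
 * or element type. */
void multiply(const float *a, const float *b, float *out, size_t n)
{
    /* One multiplication per iteration: this is the loop that the
     * hotspot analysis is expected to flag. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}
```

A loop of this shape handles one element per iteration, which is why it is a natural candidate for SIMD optimization.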

Environment Preparations

  1. Check that a compatible OS is installed on the server and that the GCC version is 7.3.0 or later. You can use the Kunpeng DevKit Compatibility Checker to view the details.
  2. Check that the Kunpeng DevKit System Profiler has been installed on the server.
  3. Download the code samples from GitHub. The sample code files are multiply.c, multiply_simd.c, multiply_start.sh, and multiplysimd_start.sh. Run the following command to grant read, write, and execute permissions to all users.

    chmod 777 multiply.c multiply_simd.c multiply_start.sh multiplysimd_start.sh

Detection of the Hotspot Function multiply for One-Dimensional Matrix Calculation

  1. Prepare the multiply program.
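    Compile multiply.c with debugging information (-g), which is needed to associate the analysis results with the source code, and grant read, write, and execute permissions on the executable to all users.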
    gcc -g multiply.c -o multiply && chmod 777 multiply
  2. Use the hotspot function analysis to analyze the program and locate hotspot functions and instructions.

    Click the icon next to System Profiler and select General analysis. On the task creation page that is displayed, select Hotspot Function, set the required parameters, and click OK to start the hotspot function analysis task.

    Figure 1 Creating a hotspot function analysis task
    Table 1 Task parameters

    Parameter | Description
    ----------|------------
    Analysis Type | Set it to Hotspot functions.
    Analysis Object | Set it to Application.
    Mode | Set it to Launch application.
    Application Path | Enter the absolute path of the application. In this sample, it is /opt/testdemo/multiply/multiply, where the first multiply is a directory and the second multiply is the executable program.
    Sampling Duration (s) | Set it to 20.
    Call Stack | Enable this option.
    Sampling Range | Set it to User mode. The sampling range can be user mode, kernel mode, or all. In this sample, all CPU resources are consumed in user mode, so User mode is sufficient.
    dwarf | Enable this option.
    C/C++ Source File Directory | Directory of the source code to associate with the collected data. Example: /opt/testdemo/multiply/
    Other Parameters | Retain their default values.

  3. View the analysis results.
    Figure 2 Summary of the hotspot function analysis result

    Figure 2 shows the total execution time of the collection and the clock cycles consumed while the program ran.

    Figure 3 Source code association

    Click a function name displayed in blue to view the corresponding lines of that function in the source code.
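As a rough cross-check outside the profiler, the CPU time spent in a kernel can also be measured directly in C. The sketch below is not part of the sample code: multiply_ref, kernel_fn, and cpu_seconds are hypothetical names, and the kernel is a stand-in for the real multiply function. It uses the standard clock() function, which reports processor time rather than clock cycles, so its output is only loosely comparable to the figures shown by the System Profiler.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the kernel under test. */
static void multiply_ref(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/* Any kernel with this shape can be timed by cpu_seconds(). */
typedef void (*kernel_fn)(const float *, const float *, float *, size_t);

/* Returns the CPU time, in seconds, spent on `repeats` calls of `fn`. */
double cpu_seconds(kernel_fn fn, const float *a, const float *b,
                   float *out, size_t n, int repeats)
{
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        fn(a, b, out, n);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds(multiply_ref, a, b, out, n, repeats) with large enough n and repeats gives a number that can be compared before and after an optimization, provided both runs use the same inputs and compiler flags.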

Optimizing the Computation

  1. Prepare the multiply_simd.c program.
    The multiply_simd.c file is an optimized version of multiply.c that uses NEON instructions. Compile multiply_simd.c and grant read, write, and execute permissions on the executable to all users.
    gcc -g multiply_simd.c -o multiply_simd && chmod 777 multiply_simd
  2. Use the hotspot function analysis to analyze the program and locate hotspot functions and instructions.

    Create a hotspot function analysis task again. Click the icon next to System Profiler and select General analysis. On the task creation page that is displayed, select Hotspot Function, set the required parameters, and click OK to start the hotspot function analysis task.

    Figure 4 Creating another hotspot function analysis task
    Table 2 Task parameters

    Parameter | Description
    ----------|------------
    Analysis Type | Set it to Hotspot functions.
    Analysis Object | Set it to Application.
    Mode | Set it to Launch application.
    Application Path | Enter the absolute path of the application. In this sample, it is /opt/testdemo/multiply/multiply_simd, where multiply is a directory and multiply_simd is the executable program.
    Sampling Duration (s) | Set it to 20.
    Call Stack | Enable this option.
    Sampling Range | Set it to User mode. The sampling range can be user mode, kernel mode, or all. In this sample, all CPU resources are consumed in user mode, so User mode is sufficient.
    dwarf | Enable this option.
    C/C++ Source File Directory | Directory of the source code to associate with the collected data. Example: /opt/testdemo/multiply/
    Other Parameters | Retain their default values.

  3. View the analysis results.
    Figure 5 Summary of the hotspot function analysis result

    Figure 5 shows the total execution time of the collection and the clock cycles consumed while the program ran. The multiply_neon function consumes fewer clock cycles than the multiply function did before optimization (Figure 2).
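The NEON optimization used in this section can be sketched as follows. This is an illustration under stated assumptions, not the contents of multiply_simd.c: only the function name multiply_neon comes from the analysis result, while the signature, the float element type, and the element-wise product are assumed. The intrinsics vld1q_f32, vmulq_f32, and vst1q_f32 are standard NEON intrinsics from arm_neon.h. The code falls back to a plain loop when it is not compiled for a NEON-capable target.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical NEON kernel: element-wise product of two float arrays.
 * The real multiply_neon() in multiply_simd.c may differ. */
void multiply_neon(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration in 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load a[i..i+3] */
        float32x4_t vb = vld1q_f32(b + i);      /* load b[i..i+3] */
        vst1q_f32(out + i, vmulq_f32(va, vb));  /* store the products */
    }
#endif
    /* Scalar loop: handles the remaining elements on NEON targets
     * and the whole array on other targets. */
    for (; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}
```

Because each NEON iteration handles four elements instead of one, the loop executes roughly a quarter as many multiply instructions, which is consistent with the lower clock cycle count reported for multiply_neon.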