Sample 3: Frequent Lock Preemption

Introduction

Lock preemption and contention occur frequently in multithreaded programs and waste CPU resources. Contention for shared resources can usually be reduced by analyzing and simplifying the service logic. This sample uses the lock and wait analysis function of the Kunpeng DevKit System Profiler to analyze the service logic. Narrowing the scope of each lock and reducing the number of concurrent threads both reduce lock contention.

Setting Up the Environment

  1. Check whether a compatible OS is installed on the server and the GCC version is 7.3.0 or later. Use the Kunpeng DevKit Compatibility Checker to view the details.
  2. Check that the Kunpeng DevKit System Profiler has been installed on the server.
  3. Download the code samples from GitHub. The sample code files are pthread_atomic.c and pthread_mutex.c. Run the following command to grant read, write, and execute permissions on them to all users.

    chmod 777 pthread_mutex.c pthread_atomic.c
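The GCC requirement in step 1 can also be checked from inside a build. The following is a minimal sketch, not part of the downloaded samples: the function names version_at_least, compiler_meets_demo_requirement, lse_inlined_at_compile_time, and print_build_report are hypothetical. It relies on the predefined GCC version macros and on __ARM_FEATURE_ATOMICS, the ACLE macro that is defined when LSE atomics are part of the target architecture (for example, with -march=armv8.1-a). That macro does not cover outline atomics, which choose between LSE and non-LSE sequences at run time, so treat the second check as a partial indicator only.

```c
#include <stdio.h>

/* Returns 1 if version maj.min.patch is at least rmaj.rmin.rpatch, else 0. */
int version_at_least(int maj, int min, int patch,
                     int rmaj, int rmin, int rpatch)
{
    if (maj != rmaj)
        return maj > rmaj;
    if (min != rmin)
        return min > rmin;
    return patch >= rpatch;
}

/* Returns 1 if this file was compiled by GCC 7.3.0 or later, else 0.
 * Other compilers (including clang, which also defines __GNUC__) return 0. */
int compiler_meets_demo_requirement(void)
{
#if defined(__GNUC__) && !defined(__clang__)
    return version_at_least(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__,
                            7, 3, 0);
#else
    return 0;
#endif
}

/* Returns 1 if the compiler was targeting LSE atomics directly, else 0.
 * This says nothing about outline atomics selected at run time. */
int lse_inlined_at_compile_time(void)
{
#ifdef __ARM_FEATURE_ATOMICS
    return 1;
#else
    return 0;
#endif
}

/* Prints a short report about the compiler used for this build. */
void print_build_report(void)
{
    printf("GCC 7.3.0 or later: %s\n",
           compiler_meets_demo_requirement() ? "yes" : "no");
    printf("LSE atomics inlined: %s\n",
           lse_inlined_at_compile_time() ? "yes" : "no");
}
```

Calling print_build_report() from a main function in a file compiled with the same options as the samples shows which of the two conditions hold for that build.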
    

Performance Analysis Process

  1. Prepare the program.
    1. Compile pthread_mutex.c and grant read, write, and execute permissions on the resulting executable to all users.
      gcc -g pthread_mutex.c -o pthread_mutex -lpthread && chmod 777 pthread_mutex
      
      • The demo requires GCC 7.3 or later. To associate the analysis results with the source code, add the -g option during compilation, for example, gcc -g pthread_mutex.c -o pthread_mutex -lpthread.
      • If the GCC version is 10.x.x or later, you are advised to add the -march=armv8-a+nolse -mno-outline-atomics options during compilation to disable the LSE instruction set, because LSE is enabled by default in these versions and impairs the tuning effect.
      • In this sample, the analysis object of every task is the system, except for the lock and wait analysis task, whose analysis object is the application. The process/thread performance and resource scheduling analysis tasks are used to analyze lock preemption, and the lock and wait analysis task is used to resolve the problem.
    2. Run the program in the background. The nohup command keeps the process running even after you log out, which prevents the task from being interrupted, and taskset -c 0-1 binds the process to CPU cores 0 and 1.
      nohup taskset -c 0-1 ./pthread_mutex >>pthread_mutex.out 2>&1 &
      

      Both the standard output (1) and the error output (2) of the program are appended to the pthread_mutex.out file.

      The program generally takes about 20 seconds to run, and its data cannot be collected after it ends. Either start the collection task immediately after the program starts, or modify the count parameter in the pthread_mutex.c source code to increase the running duration.

    3. After tuning, compile the pthread_atomic.c program in the same way as pthread_mutex.c in substep 1.
  2. Use the lock and wait analysis to obtain the lock invoking information.

    Click the icon next to System Profiler and select General analysis. On the task creation page that is displayed, select Locks and Waits, set the parameters as described in Table 1, and click OK to start the lock and wait analysis task.

    Figure 1 Creating a lock and wait analysis task
    Table 1 Task parameters

    Parameter                Description
    Analysis Type            Set it to Locks and Waits.
    Analysis Object          Set it to Application.
    Mode                     Set it to Attach to process.
    PID                      Select the process ID corresponding to the pthread_mutex application.
    Sampling Duration (s)    Set it to 60.
    Other Parameters         Retain their default values.

  3. View the analysis results.
    Figure 2 Overview of lock and wait analysis results
    Figure 3 Lock snapshot information

    On the Lock Instance Analysis tab page, you can see that the __pthread_mutex_lock function is frequently invoked by two threads. On the Lock Snapshot tab page, check the waiting time of the corresponding lock.
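
The contention pattern behind these results can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of pthread_mutex.c: the names run_mutex_demo, locked_worker, MUTEX_THREADS, and MUTEX_ITERATIONS are hypothetical, and the thread and iteration counts are chosen only for the example. Two threads take the same mutex on every loop iteration to perform a very small amount of work, so most of their time goes to acquiring and waiting for the lock.

```c
#include <pthread.h>
#include <stddef.h>

#define MUTEX_THREADS    2          /* assumption: two contending threads */
#define MUTEX_ITERATIONS 1000000L   /* assumption: loop count per thread */

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker locks the shared mutex once per increment. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < MUTEX_ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);    /* contended on every iteration */
        shared_counter++;                     /* the protected work is tiny */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value,
 * or -1 if a thread could not be created. */
long run_mutex_demo(void)
{
    pthread_t threads[MUTEX_THREADS];
    int created = 0;

    shared_counter = 0;
    for (; created < MUTEX_THREADS; created++) {
        if (pthread_create(&threads[created], NULL, locked_worker, NULL) != 0)
            break;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return created == MUTEX_THREADS ? shared_counter : -1;
}
```

The result is correct, because the mutex serializes every increment, but the critical section is so short that the lock and unlock calls dominate the run time.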

Lock Preemption Tuning

The function code itself executes quickly, so in the multithreaded lock-based solution most of the overhead comes from lock contention. To reduce preemption of the shared resource, tune the code by using atomic variables in a lock-free programming style.
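
A lock-free counterpart of a mutex-protected counter might look like the following. This is a minimal sketch under stated assumptions, not the contents of pthread_atomic.c: the names run_atomic_demo, atomic_worker, ATOMIC_THREADS, and ATOMIC_ITERATIONS are hypothetical. It uses the C11 <stdatomic.h> interface, where atomic_fetch_add_explicit performs the increment as a single atomic operation, so no thread ever blocks waiting for a lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ATOMIC_THREADS    2          /* assumption: two concurrent threads */
#define ATOMIC_ITERATIONS 1000000L   /* assumption: loop count per thread */

static atomic_long atomic_counter = 0;

/* Each worker increments the shared counter without taking a lock. */
static void *atomic_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ATOMIC_ITERATIONS; i++) {
        /* Relaxed ordering is enough for a plain counter: only the
         * atomicity of the increment matters, not ordering with other data. */
        atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value,
 * or -1 if a thread could not be created. */
long run_atomic_demo(void)
{
    pthread_t threads[ATOMIC_THREADS];
    int created = 0;

    atomic_store(&atomic_counter, 0);
    for (; created < ATOMIC_THREADS; created++) {
        if (pthread_create(&threads[created], NULL, atomic_worker, NULL) != 0)
            break;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return created == ATOMIC_THREADS ? atomic_load(&atomic_counter) : -1;
}
```

This approach fits cases where the protected state is a single word, such as a counter or flag. Critical sections that update several related variables together generally still need a lock or a more elaborate lock-free design.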

  1. Prepare the program.
    1. Compile pthread_atomic.c and grant read, write, and execute permissions on the resulting executable to all users.
      gcc -g pthread_atomic.c -o pthread_atomic -lpthread && chmod 777 pthread_atomic
      
      • The demo requires GCC 7.3 or later. To associate the analysis results with the source code, add the -g option during compilation, for example, gcc -g pthread_atomic.c -o pthread_atomic -lpthread.
      • If the GCC version is 10.x.x or later, you are advised to add the -march=armv8-a+nolse -mno-outline-atomics options during compilation to disable the LSE instruction set, because LSE is enabled by default in these versions and impairs the tuning effect.
    2. Run the program in the background. The nohup command keeps the process running even after you log out, which prevents the task from being interrupted, and taskset -c 0-1 binds the process to CPU cores 0 and 1.
      nohup taskset -c 0-1 ./pthread_atomic >>pthread_atomic.out 2>&1 &
      

      Both the standard output (1) and the error output (2) of the program are appended to the pthread_atomic.out file.

      The program generally takes about 20 seconds to run, and its data cannot be collected after it ends. Either start the collection task immediately after the program starts, or modify the count parameter in the pthread_atomic.c source code to increase the running duration.

  2. Create a new lock and wait analysis task for the pthread_atomic application.

    Click the icon next to System Profiler and select General analysis. On the task creation page that is displayed, select Locks and Waits, set the parameters as described in Table 2, and click OK to start the lock and wait analysis task.

    Figure 4 Creating another lock and wait analysis task
    Table 2 Task parameters

    Parameter                Description
    Analysis Type            Set it to Locks and Waits.
    Analysis Object          Set it to Application.
    Mode                     Set it to Attach to process.
    PID                      Select the process ID corresponding to the pthread_atomic application.
    Sampling Duration (s)    Set it to 60.
    Other Parameters         Retain their default values.

    Figure 5 Lock and wait task after tuning

    The count and total duration of lock instances are greatly reduced.