Fixing a False Sharing Issue

This section demonstrates the end-to-end workflow of using the Dynamic Code Optimizer to collect and analyze performance data for a program. Using the falsesharing_demo.cpp source file as an example, it shows how to identify a false sharing issue, apply the optimization suggestions to fix it, and verify that the issue has been resolved. Figure 1 shows the overall process.

Figure 1 Overall process

The tool reports issues only for hotspots found in the collected data, so it cannot guarantee that every false sharing issue in the program is detected.

Prerequisites

The Dynamic Code Optimizer has been installed. In this example, the installation directory is /home/DevKit-Optimizer-CLI-x.x.x-Linux-Kunpeng.

Procedure

  1. Locate where the false sharing issue occurs.
    1. Compile the source file to generate an executable binary file. In the example, the source file path is /home/test/falsesharing_demo.cpp. Replace it with the actual path.
      g++ /home/test/falsesharing_demo.cpp -o falsesharing_demo -g -lpthread

      Content of the falsesharing_demo.cpp file:

      #include <sched.h>
      #include <cstring>
      #include <pthread.h>
      #include <stdio.h>
      
      #define EXE_TIME 999999990
      #define NUM_THREADS 2
      
      int arr[32];
      
      void *sum_a(void*)
      {
          int cpu_num = 0;
          cpu_set_t mask;
          cpu_set_t get;
          CPU_ZERO(&mask);
          CPU_SET(cpu_num, &mask);
          if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
              perror("warning: could not set CPU affinity\n");
          }
          CPU_ZERO(&get);
          if (sched_getaffinity(0, sizeof(get), &get) == -1) {
              perror("warning: could not get CPU affinity\n");
          }
      
          if (CPU_ISSET(cpu_num, &get)) {
              printf("sum_a affinity cpu_id: %d, current cpu: %d\n", cpu_num, sched_getcpu());
          }
      
          int s = 0;
          for (int i = 0; i < EXE_TIME; i++) {
              s = arr[0];
              arr[0] += 1;
          }
          return NULL;
      }
      void *inc_b(void*)
      {
          int cpu_num = 1;
          cpu_set_t mask;
          cpu_set_t get;
          CPU_ZERO(&mask);
          CPU_SET(cpu_num, &mask);
          if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
              perror("warning: could not set CPU affinity\n");
          }
          CPU_ZERO(&get);
          if (sched_getaffinity(0, sizeof(get), &get) == -1) {
              perror("warning: could not get CPU affinity\n");
          }
      
          if (CPU_ISSET(cpu_num, &get)) {
              printf("inc_b affinity cpu_id: %d, current cpu: %d\n", cpu_num, sched_getcpu());
          }
          
          int s = 0;
          for (int i = 0; i < EXE_TIME; i++) {
              s = arr[1];
              arr[1] += 1;
          }
          return NULL;
      }
      int main()
      {
          int ret;
          pthread_t tids[NUM_THREADS];
          ret = pthread_create(&tids[0], NULL, sum_a, NULL);
          if (ret != 0) {
              printf("pthread_create error: error code %d\n", ret);
              return -1;
          }
          ret = pthread_create(&tids[1], NULL, inc_b, NULL);
          if (ret != 0) {
              printf("pthread_create error: error code %d\n", ret);
              return -1;
          }
          pthread_join(tids[0], NULL);
          pthread_join(tids[1], NULL);
          return 0;
      }
      
    2. Execute the binary file in the /home/test directory.
      ./falsesharing_demo
      
      Command output:
      inc_b affinity cpu_id: 1, current cpu: 1
      sum_a affinity cpu_id: 0, current cpu: 0
    3. Keep the program running, go to the Dynamic Code Optimizer tool directory, specify the PID of the falsesharing_demo process, and collect basic performance data to generate a data file.
      cd /home/DevKit-Optimizer-CLI-x.x.x-Linux-Kunpeng
      ./devopt.sh record -p 3315674 -d 5 -o /home/test

      The process ID is 3315674 and the collection duration is 5 seconds. A data file is generated in the /home/test directory.

      Command output:

      Saved the record data to /home/test/devopt_3315674_20260513155321.rawdata
    4. Enable the refined memory collection mode to perform refined memory analysis based on the data file.
      ./devopt.sh record -p 3315674 -d 5 -m -i /home/test/devopt_3315674_20260513155321.rawdata

      This command reads the basic performance data from the data file, starts the false sharing detection tool to collect refined memory data, analyzes the data to detect false sharing, and adds the detection result to the raw data file.

      Command output:

      Possible false sharing detected
    5. Use either of the following methods to view the analysis result:
      • Run the script command to view the analysis result.
        ./devopt.sh script -i /home/test/devopt_3315674_20260513155321.rawdata -t memory

        Command output:

        FS 1: 0x4009bc <-> 0x400bcc Type:SL CacheLineAddr:0x420040 Access Info: adjacent;+0x20/+0x24;4B/4B
            A: sum_a(void*)@/home/test/falsesharing_demo:32
            B: inc_b(void*)@/home/test/falsesharing_demo:59
        Table 1 Fields

        Field: FS
        Description: Indicates that the record is a false sharing (FS) event.

        Field: 1
        Description: Sequence number of the false sharing event, which indicates the analysis priority of the false sharing pair.

        Field: 0x4009bc, 0x400bcc
        Description: Program counter addresses of the two instructions that access the cache line.

        Field: Type
        Description: Memory access type.
          • Store-Store (SS): both accesses are write operations.
          • Store-Load (SL): one access is a write operation and the other is a read operation.

        Field: CacheLineAddr
        Description: Start address of the cache line where the false sharing issue occurs. In the example, the start address is 0x420040.

        Field: Access Info
        Description: Memory access information, including whether the accessed addresses are adjacent, the offsets, and the access widths.
          • adjacent: the addresses of the two accesses are next to each other.
          • same-line: the two accesses are in the same cache line, but their addresses are not next to each other.
          In the example, adjacent indicates that the addresses of the two accesses are next to each other, +0x20/+0x24 indicates that the offsets of the two accesses are +0x20 and +0x24, and 4B/4B indicates that both accesses have a width of 4 bytes.

        Field: A, 32
        Description: Function name and source code line number corresponding to the first program counter address. In the example, the first program counter address (0x4009bc) maps to the sum_a(void*) function at line 32 of the source file.

        Field: B, 59
        Description: Function name and source code line number corresponding to the second program counter address. In the example, the second program counter address (0x400bcc) maps to the inc_b(void*) function at line 59 of the source file.


      • Run the report command to view the analysis result.
        ./devopt.sh report -i /home/test/devopt_3315674_20260513155321.rawdata

        Command output:

        In the table area of the summary screen, a value of M in the issue column indicates that a false sharing issue has been detected in the corresponding function. After pressing Enter to go to the details screen, the Code Suggestion area displays source code optimization suggestions. See the following figure:

        Using the sum_a(void*) function as an example, the Code Suggestion area displays the following content:

        Code Suggestions
        Memory:
        FS 1: self=0x4009bc L:32 <-> inc_b(void*) pc=0x400bcc L:59, kind=SL, cacheLine=0x420040, adjacent(+0x20/+0x24, 4B/4B)
        
        Suggestion: isolate hot data on separate cache lines using padding or alignas(64), and prefer thread-local or per-thread state with deferred merge or publish to reduce cache-line contention.

        Using the inc_b(void*) function as an example, the Code Suggestion area displays the following content:

        Code Suggestions
        Memory:
        FS 1: self=0x400bcc L:59 <-> sum_a(void*) pc=0x4009bc L:32, kind=SL, cacheLine=0x420040, adjacent(+0x20/+0x24, 4B/4B)
        
        Suggestion: isolate hot data on separate cache lines using padding or alignas(64), and prefer thread-local or per-thread state with deferred merge or publish to reduce cache-line contention.

        Table 1 describes the fields.
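
The second half of the suggestion, per-thread state with a deferred publish, can be sketched as follows. This is only an illustration under stated assumptions: results, count_locally, and worker are hypothetical names that do not appear in falsesharing_demo.cpp, and the sketch assumes C++14 or later. Each worker counts in a local variable and writes to shared memory once at the end, instead of writing to a shared cache line on every iteration.

```cpp
// Minimal sketch (C++14 or later). The names below are hypothetical and
// are not part of falsesharing_demo.cpp.

// Shared result slots, one per worker thread. They may still sit in the
// same cache line, but each one is written only once, so contention is rare.
int results[2];

// Count in a local variable instead of incrementing shared memory on
// every iteration.
constexpr int count_locally(int iterations) {
    int local = 0;
    for (int i = 0; i < iterations; ++i) {
        local += 1;
    }
    return local;
}

// Body a worker thread could run: do all the work privately, then publish
// the result with a single store.
void worker(int slot, int iterations) {
    results[slot] = count_locally(iterations);
}
```

In a program like the demo, sum_a would call worker(0, EXE_TIME) and inc_b would call worker(1, EXE_TIME). This approach changes when other threads can observe the counters, so it fits only when intermediate values are not needed.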

    6. Draw a conclusion.
      1. The Dynamic Code Optimizer results show that the sum_a function and the inc_b function access two adjacent but different 4-byte regions within the same cache line.
      2. According to the source code, the sum_a function is bound to CPU 0 and the inc_b function is bound to CPU 1. The source lines reported in the false sharing pair repeatedly read and write arr[0] and arr[1], which are neighboring elements of the contiguous integer array defined as int arr[32].
      3. arr[0] and arr[1] share the same cache line, which leads to a typical false sharing issue.
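
The conclusion above can be checked with a little address arithmetic. The sketch below is illustrative only. It assumes a 64-byte cache line (implied by the alignas(64) suggestion but not stated by the tool) and assumes that the +0x20/+0x24 offsets are relative to CacheLineAddr. All function and constant names are hypothetical.

```cpp
#include <cstdint>

// Assumed cache line size in bytes. Confirm the value for the target CPU.
constexpr std::uint64_t kCacheLineSize = 64;

// Start address of the cache line that contains addr.
constexpr std::uint64_t cache_line_base(std::uint64_t addr) {
    return addr & ~(kCacheLineSize - 1);
}

// True when the byte ranges [a, a + a_len) and [b, b + b_len) overlap.
constexpr bool ranges_overlap(std::uint64_t a, std::uint64_t a_len,
                              std::uint64_t b, std::uint64_t b_len) {
    return a < b + b_len && b < a + a_len;
}

// False sharing pattern: both accesses hit the same cache line, but they
// have no byte in common.
constexpr bool is_false_sharing(std::uint64_t a, std::uint64_t a_len,
                                std::uint64_t b, std::uint64_t b_len) {
    return cache_line_base(a) == cache_line_base(b) &&
           !ranges_overlap(a, a_len, b, b_len);
}

// Values from the FS 1 record: cacheLine=0x420040, +0x20/+0x24, 4B/4B.
// Treating the offsets as relative to the cache line start is an assumption.
constexpr std::uint64_t kAccessA = 0x420040 + 0x20;
constexpr std::uint64_t kAccessB = 0x420040 + 0x24;
```

With these values, is_false_sharing(kAccessA, 4, kAccessB, 4) evaluates to true: both addresses map to the cache line at 0x420040, while the two 4-byte ranges do not overlap.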
  2. Fix the false sharing issue.

    Modify the source file so that each hot element occupies its own cache line, using padding or alignas(64). Replace int arr[32]; in line 9 of the source file with the following content:

    struct alignas(64) Item {
        int value;
        char padding[60];
    };

    Item arr[32];

    Change s = arr[0]; in line 32 and arr[0] += 1; in line 33 of the source file to the following content:

    s = arr[0].value;
    
    arr[0].value += 1;

    Change s = arr[1]; in line 58 and arr[1] += 1; in line 59 of the source file to the following content:

    s = arr[1].value;
    
    arr[1].value += 1;
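
As a hedged sanity check of the layout change, the following sketch uses compile-time assertions to confirm that each element fills exactly one cache line. It assumes a 64-byte cache line. kCacheLineSize and line_index are hypothetical names introduced for illustration. Because alignas(64) already rounds sizeof(Item) up to a multiple of 64, this variant omits the explicit padding member.

```cpp
#include <cstddef>

// Assumed cache line size in bytes, matching the alignas(64) suggestion.
constexpr std::size_t kCacheLineSize = 64;

// One counter per cache line. The alignment requirement forces the
// compiler to pad the struct to 64 bytes.
struct alignas(kCacheLineSize) Item {
    int value;
};

Item arr[32];

static_assert(alignof(Item) == kCacheLineSize,
              "Item must start on a cache line boundary");
static_assert(sizeof(Item) == kCacheLineSize,
              "Item must fill exactly one cache line");

// Index of the cache line, counted from arr[0], that holds arr[i].value.
constexpr std::size_t line_index(std::size_t i) {
    return (i * sizeof(Item)) / kCacheLineSize;
}
```

Here line_index(0) is 0 and line_index(1) is 1, so arr[0].value and arr[1].value no longer share a cache line. The trade-off is memory: the array grows from 128 bytes to 2048 bytes.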
  3. Verify whether the false sharing issue has been resolved.
    Save the changes, recompile the source file, run the new binary, and run the data collection command again. Replace the PID and data file name in the following commands with the actual values.
    ./devopt.sh record -p 3315674 -d 5 -o /home/test

    Command output:

    Saved the record data to /home/test/devopt_3315674_20260513155321.rawdata
    Perform refined memory analysis again.
    ./devopt.sh record -p 3315674 -d 5 -m -i /home/test/devopt_3315674_20260513155321.rawdata

    Command output:

    No false sharing detected

    The result information indicates that the analysis is successful and no false sharing issue is detected.

    Run the report command to view the analysis result.
    ./devopt.sh report -i /home/test/devopt_3315674_20260513155321.rawdata

    The summary screen is displayed, as shown in the following figure:

    In the table area of the summary screen, the issue column for the sum_a and inc_b functions no longer shows M, and the mem_bound value has decreased. On the details screen, no source code optimization suggestions are displayed, indicating that the false sharing issue has been resolved.