Fixing a False Sharing Issue
This section briefly demonstrates the end-to-end workflow of using the Dynamic Code Optimizer to collect and analyze performance data for a running program. Using a program built from the falsesharing_demo.cpp source file as an example, it identifies a false sharing issue, applies the optimization suggestions to fix it, and verifies that the issue has been resolved. Figure 1 shows the overall process.
The tool outputs hotspot issues based on the collected data. It cannot ensure that all false sharing issues in the program are detected.
Prerequisites
As an example, the Dynamic Code Optimizer is installed in /home/DevKit-Optimizer-CLI-x.x.x-Linux-Kunpeng.
Procedure
- Locate where the false sharing issue occurs.
- Compile the source file to generate an executable binary file. In the example, the source file path is /home/test/falsesharing_demo.cpp. Replace it with the actual path.
g++ /home/test/falsesharing_demo.cpp -o falsesharing_demo -g -lpthread
Content of the falsesharing_demo.cpp file:
#include <sched.h>
#include <cstring>
#include <pthread.h>
#include <stdio.h>

#define EXE_TIME 999999990
#define NUM_THREADS 2

int arr[32];

void *sum_a(void*)
{
    int cpu_num = 0;
    cpu_set_t mask;
    cpu_set_t get;
    CPU_ZERO(&mask);
    CPU_SET(cpu_num, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("warning: could not set CPU affinity\n");
    }

    CPU_ZERO(&get);
    if (sched_getaffinity(0, sizeof(get), &get) == -1) {
        perror("warning: could not get CPU affinity\n");
    }
    if (CPU_ISSET(cpu_num, &get)) {
        printf("sum_a affinity cpu_id: %d, current cpu: %d\n", cpu_num, sched_getcpu());
    }

    int s = 0;
    for (int i = 0; i < EXE_TIME; i++) {
        s = arr[0];
        arr[0] += 1;
    }
}

void *inc_b(void*)
{
    int cpu_num = 1;
    cpu_set_t mask;
    cpu_set_t get;
    CPU_ZERO(&mask);
    CPU_SET(cpu_num, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("warning: could not set CPU affinity\n");
    }

    CPU_ZERO(&get);
    if (sched_getaffinity(0, sizeof(get), &get) == -1) {
        perror("warning: could not get CPU affinity\n");
    }
    if (CPU_ISSET(cpu_num, &get)) {
        printf("inc_b affinity cpu_id: %d, current cpu: %d\n", cpu_num, sched_getcpu());
    }

    int s = 0;
    for (int i = 0; i < EXE_TIME; i++) {
        s = arr[1];
        arr[1] += 1;
    }
}

int main()
{
    int ret;
    pthread_t tids[NUM_THREADS];
    ret = pthread_create(&tids[0], NULL, sum_a, NULL);
    if (ret != 0) {
        printf("pthread_create error: error code %d\n", ret);
        return -1;
    }
    ret = pthread_create(&tids[1], NULL, inc_b, NULL);
    if (ret != 0) {
        printf("pthread_create error: error code %d\n", ret);
        return -1;
    }
    pthread_join(tids[0], NULL);
    pthread_join(tids[1], NULL);
    return 0;
}
- Execute the binary file in the /home/test directory.
./falsesharing_demo
Command output:
inc_b affinity cpu_id: 1, current cpu: 1
sum_a affinity cpu_id: 0, current cpu: 0
- Keep the program running, go to the Dynamic Code Optimizer tool directory, specify the PID of the falsesharing_demo process, and collect basic performance data to generate a data file.
cd /home/DevKit-Optimizer-CLI-x.x.x-Linux-Kunpeng
./devopt.sh record -p 3315674 -d 5 -o /home/test
The process ID is 3315674 and the collection duration is 5 seconds. A data file is generated in the /home/test directory.
Command output:
Saved the record data to /home/test/devopt_3315674_20260513155321.rawdata
- Enable the refined memory collection mode to perform refined memory analysis based on the data file.
./devopt.sh record -p 3315674 -d 5 -m -i /home/test/devopt_3315674_20260513155321.rawdata
This command reads the basic performance data from the data file, starts the false sharing detection tool to collect refined memory data, analyzes that data to detect false sharing, and adds the detection result to the raw data file.
Command output:
Possible false sharing detected
- Use either of the following methods to view the analysis result:
- Run the script command to view the analysis result.
./devopt.sh script -i /home/test/devopt_3315674_20260513155321.rawdata -t memory
Command output:
FS 1: 0x4009bc <-> 0x400bcc Type:SL CacheLineAddr:0x420040 Access Info: adjacent;+0x20/+0x24;4B/4B A: sum_a(void*)@/home/test/falsesharing_demo:32 B: inc_b(void*)@/home/test/falsesharing_demo:59

Table 1 Fields

Field | Description
FS | Indicates that the event is a false sharing (FS) event.
1 | Sequence number of a false sharing event, indicating the analysis priority of a false sharing pair.
0x4009bc, 0x400bcc | Program counter addresses that access the cache line.
Type | Memory access type. Store-Store (SS): both accesses are write operations. Store-Load (SL): one access is a write operation and the other is a read operation.
CacheLineAddr | Start address of the cache line where the false sharing issue occurs. In the example, the start address is 0x420040.
Access Info | Memory access information, including whether the accessed addresses in the cache line are adjacent, the offsets, and the access widths. adjacent: the addresses of the two cache line accesses are next to each other. same-line: the addresses of the two cache line accesses are not next to each other. In the example, adjacent indicates that the addresses of the two cache line accesses are adjacent; +0x20/+0x24 indicates that the offsets of the two accesses are +0x20 and +0x24; and 4B/4B indicates that both accesses have a width of 4 bytes.
A, 32 | Function name and source code line number corresponding to the first program counter address (0x4009bc in the example).
B, 59 | Function name and source code line number corresponding to the second program counter address (0x400bcc in the example).
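The Access Info values can be turned back into absolute addresses with simple arithmetic. The following is a minimal sketch, not part of the tool: the Access struct and the helper names first_byte, disjoint, and touching are hypothetical, and reading "adjacent" as "one byte range ends exactly where the other begins" is an assumption. The numbers come from the example output above.

```cpp
#include <cassert>
#include <cstdint>

// One access as listed in Access Info: an offset from the cache line start
// plus an access width in bytes. (Hypothetical type, for illustration only.)
struct Access {
    std::uint64_t offset;  // "+0x20" -> 0x20
    std::uint64_t width;   // "4B"    -> 4
};

// Absolute address of the first byte touched by the access.
constexpr std::uint64_t first_byte(std::uint64_t line_addr, Access a) {
    return line_addr + a.offset;
}

// True when the byte ranges [offset, offset + width) share no byte,
// i.e. the threads touch different data in the same line.
constexpr bool disjoint(Access a, Access b) {
    return a.offset + a.width <= b.offset || b.offset + b.width <= a.offset;
}

// Assumed reading of "adjacent": one range ends exactly where the other starts.
constexpr bool touching(Access a, Access b) {
    return a.offset + a.width == b.offset || b.offset + b.width == a.offset;
}

// Values taken from the example output.
constexpr std::uint64_t kLineAddr = 0x420040;  // CacheLineAddr
constexpr Access kA{0x20, 4};                  // access A in sum_a
constexpr Access kB{0x24, 4};                  // access B in inc_b

static_assert(first_byte(kLineAddr, kA) == 0x420060, "A starts at 0x420060");
static_assert(first_byte(kLineAddr, kB) == 0x420064, "B starts at 0x420064");
static_assert(disjoint(kA, kB), "the two accesses touch different bytes");
static_assert(touching(kA, kB), "the two byte ranges sit back to back");
```

Because the two byte ranges do not overlap yet lie in the same cache line, the contention is false sharing rather than true sharing of one variable.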
- Run the report command to view the analysis result.
./devopt.sh report -i /home/test/devopt_3315674_20260513155321.rawdata
Command output:

In the table area of the summary screen, a value of M in the issue column indicates that a false sharing issue has been detected in the corresponding function. After pressing Enter to go to the details screen, the Code Suggestion area displays source code optimization suggestions. See the following figure:

Using the sum_a(void*) function as an example, the Code Suggestion area displays the following content:
Code Suggestions Memory: FS 1: self=0x4009bc L:32 <-> inc_b(void*) pc=0x400bcc L:59, kind=SL, cacheLine=0x420040, adjacent(+0x20/+0x24, 4B/4B) Suggestion: isolate hot data on separate cache lines using padding or alignas(64), and prefer thread-local or per-thread state with deferred merge or publish to reduce cache-line contention.
Using the inc_b(void*) function as an example, the Code Suggestion area displays the following content:
Code Suggestions Memory: FS1: self=0x400bcc L:59 <-> sum_a(void*) pc=0x4009bc L:33, kind=SL, cacheLine=0x420040, adjacent(+0x20/+0x24, 4B/4B) Suggestion: isolate hot data on separate cache lines using padding or alignas(64), and prefer thread-local or per-thread state with deferred merge or publish to reduce cache-line contention.
Table 1 describes the fields.
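The suggestion text also mentions per-thread state with a deferred merge or publish. A minimal sketch of that idea, assuming the hot loop only needs the final count: the function name add_deferred is hypothetical and is not taken from the demo or the tool.

```cpp
#include <cassert>

// Hypothetical rewrite of the hot loop: work on a local copy, which lives in a
// register or on the calling thread's own stack, and publish the result once.
// The shared slot is read once and written once instead of on every iteration.
inline void add_deferred(int* shared_slot, int iterations) {
    int local = *shared_slot;           // single read of the shared location
    for (int i = 0; i < iterations; i++) {
        local += 1;                     // no shared cache line is touched here
    }
    *shared_slot = local;               // single deferred write (the "publish")
}
```

Each thread body could call such a function on its own element, for example add_deferred(&arr[0], EXE_TIME). This reduces traffic on the shared line but changes when other threads can observe intermediate values, so it only fits loops where that is acceptable.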
- Draw a conclusion.
- The Dynamic Code Optimizer results show that the sum_a function and the inc_b function access two adjacent but different 4-byte regions within the same cache line.
- According to the source code, the sum_a function is bound to CPU 0 and the inc_b function is bound to CPU 1. The source lines reported for the conflict (lines 32 and 59) are inside loops that repeatedly read and write arr[0] and arr[1], where arr is a contiguous integer array defined as int arr[32].
- arr[0] and arr[1] share the same cache line, which leads to a typical false sharing issue.
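The conclusion can be checked by mapping array elements to cache lines. The sketch below assumes 64-byte cache lines (the size used in the alignas(64) suggestion) and derives the array base 0x420060 from the reported CacheLineAddr 0x420040 plus the +0x20 offset of the arr[0] access; the helper names line_base and element_addr are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLineSize = 64;        // assumed cache line size in bytes
constexpr std::uint64_t kArrBase  = 0x420060;  // 0x420040 + 0x20, from the report
constexpr std::uint64_t kIntSize  = 4;         // access width from the report (4B)

// Start address of the cache line that contains addr.
constexpr std::uint64_t line_base(std::uint64_t addr) {
    return addr & ~(kLineSize - 1);
}

// Address of arr[index] in the original `int arr[32]` layout.
constexpr std::uint64_t element_addr(std::uint64_t index) {
    return kArrBase + index * kIntSize;
}

// arr[0] and arr[1] map to the reported cache line.
static_assert(line_base(element_addr(0)) == 0x420040, "arr[0] is in line 0x420040");
static_assert(line_base(element_addr(1)) == 0x420040, "arr[1] is in line 0x420040");
// With this base address, arr[0] through arr[7] all share that line,
// and arr[8] is the first element in the next line.
static_assert(line_base(element_addr(7)) == 0x420040, "arr[7] is in line 0x420040");
static_assert(line_base(element_addr(8)) == 0x420080, "arr[8] starts the next line");
```

Under these assumptions, moving one thread to any element from arr[8] onward would also separate the two accesses, but that depends on where the linker happens to place arr, which is why the fix relies on explicit alignment instead.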
- Fix the false sharing issue.
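Modify the source file by using padding or alignas(64) to isolate the data. Change int arr[32]; in line 9 of the source file to the following content, which defines a 64-byte element type and declares arr as an array of that type (kept on one line so that the line numbers below still apply):
struct alignas(64) Item { int value; char padding[60]; }; Item arr[32];
Change s = arr[0]; in line 32 and arr[0] += 1; in line 33 of the source file to the following content: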
s = arr[0].value; arr[0].value += 1;
Change s = arr[1]; in line 58 and arr[1] += 1; in line 59 of the source file to the following content:
s = arr[1].value; arr[1].value += 1;
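The effect of the change can be checked without the profiler. The following sketch repeats the Item definition from the fix and adds compile-time and run-time checks; the helper names line_index and on_separate_lines are hypothetical, and the 64-byte line size and 4-byte int are assumptions that match the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes

// Element type from the fix: one int padded and aligned to a full cache line.
struct alignas(64) Item {
    int value;
    char padding[60];
};

// Each element occupies exactly one cache line and starts on a line boundary.
static_assert(sizeof(Item) == kLineSize, "Item should fill one cache line");
static_assert(alignof(Item) == kLineSize, "Item should start on a line boundary");

Item arr[32];

// Index of the cache line that contains the given address.
inline std::uintptr_t line_index(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kLineSize;
}

// True when the two addresses fall into different cache lines.
inline bool on_separate_lines(const void* p, const void* q) {
    return line_index(p) != line_index(q);
}
```

Calling on_separate_lines(&arr[0].value, &arr[1].value) should return true for this layout. alignas requires C++11 or later, so older compilers may need an option such as -std=c++11.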
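- Verify whether the false sharing issue has been resolved.
Save the changes, recompile the source file, run the new binary, and run the data collection command again. A restarted process normally gets a new PID, so replace 3315674 and the data file name in the following commands with the actual values.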
./devopt.sh record -p 3315674 -d 5 -o /home/test
Command output:
Saved the record data to /home/test/devopt_3315674_20260513155321.rawdata
Perform refined memory analysis again.
./devopt.sh record -p 3315674 -d 5 -m -i /home/test/devopt_3315674_20260513155321.rawdata
Command output:
No false sharing detected
The result information indicates that the analysis is successful and no false sharing issue is detected.
Run the report command to view the analysis result.
./devopt.sh report -i /home/test/devopt_3315674_20260513155321.rawdata
The summary screen is displayed, as shown in the following figure:

In the table area of the summary screen, the issue column for the sum_a and inc_b functions no longer shows M, and the mem_bound value has decreased. On the details screen, no source code optimization suggestions are displayed, indicating that the false sharing issue has been resolved.
