Optimizing Atomic Operations in Multi-core Scenarios
Principles
Load-link/store-conditional (LL/SC) atomic instructions load a shared variable into the L1 cache of the executing core and modify it there. This performs well when lock contention is low. Under heavy lock contention, performance deteriorates severely because the cache line holding the variable must keep moving between cores. The Armv8.1 specification introduces new atomic instructions, the Large System Extensions (LSE), which allow the operation to be performed in the L3 cache. This expands the data sharing scope, reduces the time required to establish cache coherence, and improves lock performance under heavy contention.
On multi-core systems with heavy contention on atomic locks, you are advised to enable LSE in the GCC compilation options to ease the contention.
LL/SC instruction (ldaxr and stlxr):

LSE instruction (ldaddal):

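To make the difference concrete, here is a minimal sketch of an atomic increment in portable C11. The names shared_counter and counter_fetch_inc are hypothetical and chosen only for illustration. When built with GCC for AArch64 using -march=armv8-a, the increment is typically compiled to an ldaxr/stlxr retry loop. With -march=armv8-a+lse, it is typically compiled to a single ldaddal instruction. Newer GCC releases may instead call out-of-line helper routines when LSE is not enabled at compile time, so treat the exact output as dependent on the compiler version. One way to inspect the result is to generate assembly with, for example, gcc -O2 -march=armv8-a+lse -S counter.c (the file name is an assumption).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter, zero-initialized at program start. */
static _Atomic uint64_t shared_counter;

/*
 * Atomically add 1 to the counter and return the previous value.
 * atomic_fetch_add uses sequentially consistent ordering by default,
 * which is the ordering that corresponds to the acquire/release
 * instruction forms (ldaxr/stlxr, or ldaddal with LSE) on AArch64.
 */
uint64_t counter_fetch_inc(void) {
    return atomic_fetch_add(&shared_counter, 1);
}
```

The source code is identical in both cases. Only the -march option decides which instruction sequence the compiler may emit.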
Modification Method
Use GCC 6.0 or later (GCC 7.3.0 or later is recommended).
Add one of the following options to the GCC compilation options:
-march=armv8-a+lse
Or
-march=armv8.1-a
Or
-march=armv8.2-a
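As a rough way to confirm that the option took effect, the sketch below checks the __ARM_FEATURE_ATOMICS macro, which the Arm C Language Extensions define when the compiler is allowed to emit LSE atomics. The function name lse_enabled_at_compile_time is hypothetical. This reports only what the compiler was told at build time. It does not verify that the CPU running the binary actually implements LSE.

```c
/*
 * Returns 1 if this translation unit was compiled with LSE atomics
 * enabled (for example with -march=armv8-a+lse or -march=armv8.1-a),
 * and 0 otherwise, including on non-Arm targets.
 */
int lse_enabled_at_compile_time(void) {
#if defined(__ARM_FEATURE_ATOMICS)
    return 1;
#else
    return 0;
#endif
}
```

A build script or startup log could print this value to catch a compilation unit that was accidentally built without the LSE option.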