Using Acquire and Release Semantics for Synchronization
Compared with Armv7, Armv8 adds load-acquire (LDARB, LDARH, and LDAR) and store-release (STLRB, STLRH, and STLR) instructions to support the memory-ordering semantics of the C++ atomic library. Each of these instructions acts as a half barrier, that is, a barrier that restricts reordering in one direction only. Half-barrier instructions generally execute more efficiently than full-barrier instructions. Therefore, wherever a half barrier is sufficient, use acquire and release semantics to synchronize threads.
Read-acquire applies to a memory read (load) instruction. A read-acquire instruction prevents the memory operations that follow it from being executed ahead of it. That is, memory operations after the read cannot cross the barrier and move before it during reordering.
Write-release applies to a memory write (store) instruction. A write-release instruction does not take effect until all memory operations that precede it have completed. That is, memory operations before the write cannot cross the barrier and move after it during reordering.
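The one-way nature of the two barriers can be illustrated with a simple spinlock. The following is a minimal sketch, not code from this guide: the names lock_word, counter, spin_lock, spin_unlock, worker, and run_counter_demo are hypothetical, and it assumes GCC or Clang (for the __atomic builtins) and POSIX threads. Acquiring the lock uses acquire semantics so that the critical section cannot move above it, and releasing the lock uses release semantics so that the critical section cannot move below it.

```c
#include <assert.h>
#include <pthread.h>

/* All names below are hypothetical and used only for illustration. */
static int lock_word;   /* 0 = free, 1 = held */
static long counter;    /* protected by lock_word */

static void spin_lock(int *lock) {
    /* Acquire: accesses inside the critical section cannot be
       reordered to before this exchange. */
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE) != 0)
        ;   /* spin until the previous value was 0 (lock was free) */
}

static void spin_unlock(int *lock) {
    /* Release: accesses inside the critical section cannot be
       reordered to after this store. */
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}

static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock(&lock_word);
        counter++;              /* plain access, ordered by the lock */
        spin_unlock(&lock_word);
    }
    return NULL;
}

/* Starts two workers that each increment the counter n times and
   returns the final counter value. */
long run_counter_demo(long n) {
    pthread_t t[2];
    counter = 0;
    lock_word = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &n);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Calling run_counter_demo(100000) is expected to return 200000, because every increment happens entirely between an acquire and the matching release. Build with the -pthread option where the toolchain requires it.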
To further improve performance, you can rewrite the code in the first example in Porting the Compiler Memory Barriers as follows:
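void thread0(void) {
    data = 1;
    /* The release store keeps the write to data ahead of the write to
       flag, so a separate barrier() call is not needed here. */
    __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
}

void thread1(void) {
    /* The acquire load keeps the read of data behind the read of flag. */
    if (__atomic_load_n(&flag, __ATOMIC_ACQUIRE) != 1)
        return;
    assert(data == 1);
}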
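To try the pattern end to end, the two functions can be wrapped in a small thread harness. The following is a sketch under stated assumptions rather than part of the original example: producer, consumer, and run_handoff are hypothetical names, data and flag are assumed to be plain int globals, and POSIX threads are assumed to be available. Unlike thread1 above, the consumer here spins until it observes the flag, so the result does not depend on which thread runs first.

```c
#include <assert.h>
#include <pthread.h>

static int data;   /* payload, written before the flag is published */
static int flag;   /* 0 = not ready, 1 = data is ready */

/* Hypothetical wrapper around the thread0 logic. */
static void *producer(void *unused) {
    (void)unused;
    data = 1;
    /* Publish: the write to data cannot move after this store. */
    __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
    return NULL;
}

/* Hypothetical wrapper around the thread1 logic. */
static void *consumer(void *result) {
    /* Wait: the read of data cannot move before this load. */
    while (__atomic_load_n(&flag, __ATOMIC_ACQUIRE) != 1)
        ;   /* spin until the producer has published */
    *(int *)result = data;
    return NULL;
}

/* Runs one producer and one consumer and returns the value of data
   that the consumer observed after seeing flag == 1. */
int run_handoff(void) {
    pthread_t p, c;
    int seen = 0;
    data = 0;
    flag = 0;
    int rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Because the release store and the acquire load pair on flag, run_handoff() is expected to return 1 on every run. On AArch64, GCC and Clang typically compile these two builtins to STLR and LDAR rather than to a full DMB barrier, which can be checked by inspecting the generated assembly.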