Porting the Compiler Memory Barriers
During compilation, especially when the -O2 or -O3 optimization option is enabled, the compiler may reorder code. As a result, the order of memory accesses in the generated assembly code can differ from the order written in the original high-level language code. To address this, compilers provide a compiler memory barrier. It instructs the compiler to write values held in registers back to memory and to reload them afterwards, so that memory access instructions before and after the barrier are not moved across it during compilation.
A common compiler barrier is defined as follows:
#define barrier() __asm__ __volatile__("": : :"memory")
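As a minimal sketch of what the "memory" clobber buys, the polling loop below uses barrier() so that the compiler must re-read the flag from memory on every iteration instead of keeping it in a register. The names ready and wait_ready are hypothetical and exist only for this illustration. The barrier constrains the compiler only and emits no CPU instruction.

```c
#include <assert.h>

/* Compiler-only barrier: emits no instruction, only a "memory" clobber. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int ready;   /* hypothetical flag that another thread would set */

/* Poll ready up to max_spins times.
 * Returns 1 if the flag was observed as nonzero, 0 on timeout. */
static int wait_ready(long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (ready)
            return 1;
        barrier();  /* forces ready to be reloaded on the next iteration */
    }
    return 0;
}
```

Without the barrier, an optimizing compiler would be free to read ready once and reuse that value for the whole loop.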
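On the x86 architecture, a compiler barrier is in most cases enough to keep multi-thread memory access consistent, because x86 hardware does not reorder a store with an earlier store or a load with an earlier load. The ARM architecture gives no such guarantee, so a compiler barrier alone is not enough there. Example: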
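#define barrier() __asm__ __volatile__("": : :"memory")
// init: flag = data = 0;
void thread0(void) {
    data = 1;
    barrier();      // compiler barrier only; no CPU instruction is emitted
    flag = 1;
}
void thread1(void) {
    if (flag != 1)
        return;
    barrier();      // compiler barrier only; no CPU instruction is emitted
    assert(data == 1);
}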
In the preceding code, thread0 and thread1 run on different CPUs. On the x86 architecture, the assertion in thread1 is never triggered. On the ARM architecture, however, thread1 may observe (flag == 1 && data == 0). As shown in Table 1, the ARM architecture allows write-write reordering: the store to flag can become visible to other CPUs before the store to data. In that case thread1 sees flag set to 1 while data is still 0, and the assertion is triggered.
The following platform-specific macro definitions are based on the Linux kernel code:
// x86
#define barrier() __asm__ __volatile__("": : :"memory")
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_mb() asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
// arm
#define smp_mb() asm volatile("dmb ish" ::: "memory")
#define smp_wmb() asm volatile("dmb ishst" ::: "memory")
#define smp_rmb() asm volatile("dmb ishld" ::: "memory")
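The two sets of definitions above can be combined into a single header that selects the right one at compile time. The following is a sketch under stated assumptions, not the kernel's own layout: it relies on the GCC/Clang predefined macros __x86_64__ and __aarch64__, falls back to the __sync_synchronize() builtin on other targets, and uses hypothetical helper names (publish, consume) to exercise the macros.

```c
#include <assert.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__)
/* x86: read and write barriers only need to restrain the compiler. */
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_mb()  __asm__ __volatile__("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
#elif defined(__aarch64__)
/* ARM (AArch64): each barrier is a real dmb instruction. */
#define smp_mb()  __asm__ __volatile__("dmb ish"   ::: "memory")
#define smp_wmb() __asm__ __volatile__("dmb ishst" ::: "memory")
#define smp_rmb() __asm__ __volatile__("dmb ishld" ::: "memory")
#else
/* Assumption: on any other target, use the compiler's full-barrier builtin. */
#define smp_mb()  __sync_synchronize()
#define smp_wmb() __sync_synchronize()
#define smp_rmb() __sync_synchronize()
#endif

static int data;    /* hypothetical payload */
static int flag;    /* hypothetical "payload is ready" flag */

/* Writer side: make the payload visible before the flag. */
static void publish(int value)
{
    data = value;
    smp_wmb();
    flag = 1;
}

/* Reader side: returns the payload, or -1 if it is not published yet. */
static int consume(void)
{
    if (flag != 1)
        return -1;
    smp_rmb();
    return data;
}
```

Code written against smp_wmb() and smp_rmb() then compiles unchanged on both architectures, and only the header decides whether a CPU instruction is emitted.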
On the x86 architecture, the read and write barrier macros smp_rmb() and smp_wmb() are defined as compiler memory barriers. On the ARM architecture, they are defined as CPU instruction-level memory barriers (dmb). To prevent the assertion in thread1 from being triggered on ARM, replace the compiler barriers in the original code with CPU-level memory barriers as follows:
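#define smp_wmb() asm volatile("dmb ishst" ::: "memory")
#define smp_rmb() asm volatile("dmb ishld" ::: "memory")
// init: flag = data = 0;
void thread0(void) {
    data = 1;
    smp_wmb();      // the store to data becomes visible before the store to flag
    flag = 1;
}
void thread1(void) {
    if (flag != 1)
        return;
    smp_rmb();      // the load of data is not performed before the load of flag
    assert(data == 1);
}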
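For code that cannot use architecture-specific inline assembly, the same handoff can be sketched with C11 fences from <stdatomic.h> and POSIX threads. This swaps in a different technique from the dmb-based macros above: the release and acquire fences play the roles of smp_wmb() and smp_rmb() but are not exact equivalents, and the compiler chooses the instructions for each target. The names writer, reader, and run_handoff are hypothetical, and the reader spins until the flag is set instead of returning early so that the result is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data;          /* plain payload, protected by the fences */
static atomic_int flag;   /* handoff flag, accessed with relaxed atomics */

static void *writer(void *arg)
{
    (void)arg;
    data = 1;
    atomic_thread_fence(memory_order_release);  /* role of smp_wmb() */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
        ;                                       /* wait for the handoff */
    atomic_thread_fence(memory_order_acquire);  /* role of smp_rmb() */
    *seen = data;
    return NULL;
}

/* Runs one handoff between two threads.
 * Returns the value of data seen by the reader, or -1 if a thread
 * could not be created. */
static int run_handoff(void)
{
    pthread_t w, r;
    int seen = 0;

    data = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&r, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        atomic_store(&flag, 1);                 /* release the spinning reader */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Because the reader only reads data after it has seen flag == 1 and passed the acquire fence, run_handoff() is expected to return 1 on both x86 and ARM. Depending on the toolchain, linking may require the -pthread option.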