Porting the Compiler Memory Barriers

During compilation, especially when the -O2 or -O3 optimization option is used, the compiler may reorder instructions. As a result, the order of memory accesses in the generated assembly code can differ from the order in the original high-level language code. To address this, the compiler provides a compiler memory barrier. The barrier tells the compiler that memory may have changed, so values cached in registers are written back to memory before the barrier and reloaded after it, and memory access instructions are not moved across the barrier during compilation.

A common compiler barrier is defined as follows:

#define barrier() __asm__ __volatile__("": : :"memory")
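
As a minimal sketch of the effect described above, the following function reads the same location twice with a compiler barrier in between. The function name sum_two_reads is hypothetical, and the inline assembly syntax assumes GCC or Clang. Because of the "memory" clobber, the compiler cannot assume that *p is unchanged across the barrier, so it has to emit a second load instead of reusing the first value. No CPU instruction is generated for the barrier itself.

```c
// Compiler-only barrier: emits no instruction, but the compiler must
// treat all memory as possibly modified at this point.
#define barrier() __asm__ __volatile__("" ::: "memory")

// Hypothetical helper: reads *p twice. Without the barrier, an optimizing
// compiler may load *p once and reuse the value for both reads.
static int sum_two_reads(const int *p)
{
    int first = *p;
    barrier();          // forces the second read to be a fresh load
    int second = *p;
    return first + second;
}
```

Comparing the assembly output (for example with the -S option) with and without the barrier() call is one way to observe the difference.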

In most cases, a compiler barrier is enough to keep multi-threaded memory accesses consistent on the x86 architecture, because x86 CPUs do not reorder writes with other writes or reads with other reads. This is not guaranteed on the ARM architecture. Example:

#define barrier() __asm__ __volatile__("": : :"memory")
// init: flag = data = 0;
void thread0(void) {
    data = 1;
    barrier();
    flag = 1;
}
void thread1(void) {
    if (flag != 1)
        return;
    barrier();
    assert(data == 1);
}

In the preceding code, thread0 and thread1 run on different CPUs. On the x86 architecture, the assertion in thread1 is never triggered. On the ARM architecture, however, thread1 may observe flag == 1 while data == 0. As shown in Table 1, the ARM architecture allows write-write reordering. The write to flag can therefore become visible to another CPU before the write to data, in which case the assertion in thread1 is triggered.

The Linux kernel code defines platform-specific barrier macros similar to the following:

// x86
#define barrier() __asm__ __volatile__("": : :"memory")
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_mb() asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
// arm
#define smp_mb()  asm volatile("dmb ish" ::: "memory")
#define smp_wmb() asm volatile("dmb ishst" ::: "memory")
#define smp_rmb() asm volatile("dmb ishld" ::: "memory")
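
When the same source file has to build on both platforms, the two sets of definitions can be selected with predefined compiler macros. The following is a sketch under stated assumptions, not the kernel's actual header: it assumes GCC or Clang, uses the predefined macros __x86_64__ and __aarch64__, and adds a fallback branch based on the __sync_synchronize() builtin for other targets. The publish() and consume() helpers and their variables are hypothetical and only illustrate where the barriers go.

```c
// Compiler-only barrier, identical on every architecture.
#define barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__)
// x86-64: read and write barriers only need to stop compiler reordering.
#define smp_mb()  __asm__ __volatile__("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#elif defined(__aarch64__)
// AArch64: every barrier needs a real dmb instruction.
#define smp_mb()  __asm__ __volatile__("dmb ish" ::: "memory")
#define smp_rmb() __asm__ __volatile__("dmb ishld" ::: "memory")
#define smp_wmb() __asm__ __volatile__("dmb ishst" ::: "memory")
#else
// Assumption: on other targets, fall back to the GCC/Clang full-barrier builtin.
#define smp_mb()  __sync_synchronize()
#define smp_rmb() __sync_synchronize()
#define smp_wmb() __sync_synchronize()
#endif

// Hypothetical shared variables, both initially 0.
static volatile int data;
static volatile int flag;

// Writer side: store the payload, then set the flag.
static void publish(int value)
{
    data = value;
    smp_wmb();      // order the write to data before the write to flag
    flag = 1;
}

// Reader side: returns -1 if nothing has been published yet.
static int consume(void)
{
    if (flag != 1)
        return -1;
    smp_rmb();      // order the read of flag before the read of data
    return data;
}
```

With this layout the calling code stays the same on both platforms, and only the macro expansion changes.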

On the x86 architecture, the read and write barrier macros smp_rmb() and smp_wmb() are defined as compiler barriers. On the ARM architecture, they are defined as CPU instruction-level memory barriers. To prevent the assertion in thread1 from being triggered on ARM, replace the compiler barriers in the original code with CPU-level memory barriers as follows:

#define smp_wmb() asm volatile("dmb ishst" ::: "memory")
#define smp_rmb() asm volatile("dmb ishld" ::: "memory")
// init: flag = data = 0;
void thread0(void) {
    data = 1;
    smp_wmb();  // order the write to data before the write to flag
    flag = 1;
}
void thread1(void) {
    if (flag != 1)
        return;
    smp_rmb();  // order the read of flag before the read of data
    assert(data == 1);
}
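
For user-space code that does not need to follow the kernel macro names, the same pattern can also be written with the C11 fences from <stdatomic.h>, which the compiler maps to suitable instructions for each target. This is an alternative technique, not the method used above. The sketch below assumes a C11 compiler and POSIX threads. The function names writer, reader, and run_message_passing are hypothetical. The release and acquire fences play a role similar to smp_wmb() and smp_rmb() in this example, although their ordering guarantees are somewhat stronger.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;          // plain payload, initially 0
static atomic_int flag;   // publication flag, initially 0

// Corresponds to thread0: write the payload, then publish the flag.
static void *writer(void *arg)
{
    (void)arg;
    data = 1;
    atomic_thread_fence(memory_order_release);  // similar role to smp_wmb()
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

// Corresponds to thread1, except that it waits for the flag instead of returning.
static void *reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
        ;                                       // spin until the flag is set
    atomic_thread_fence(memory_order_acquire);  // similar role to smp_rmb()
    *seen = data;
    return NULL;
}

// Runs both threads once and returns the value of data that the reader
// observed after it saw flag == 1, or -1 if a thread could not be created.
static int run_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;

    data = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&r, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        atomic_store(&flag, 1);                 // release the spinning reader
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Build with the -pthread option where the toolchain requires it. Because the fences are part of the language standard, the same source needs no per-architecture macro definitions.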
