Overview
Memory Barrier
CPUs run much faster than main memory: a CPU can typically execute hundreds of instructions in the time it takes to fetch one variable from memory. Computer architectures therefore place a cache between the CPU and memory so that frequently used data can be accessed quickly. CPUs are also designed to keep executing other instructions and memory references while a memory access is still outstanding, so instructions and memory references may complete out of program order. To deal with this, synchronization primitives use memory barriers to ensure that multiple CPUs observe memory accesses in a consistent order.
This section uses C and C++ as examples to describe the basic principles and usage of memory barriers. In Java, the JDK already adapts the language's synchronization primitives to the underlying barrier primitives. For details about the calling code and related knowledge, see Java Synchronization Primitives.
When to Use the Memory Barrier
A memory barrier is needed only when two or more CPUs interact with each other through shared memory.
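The interaction described above can be sketched in C++ as follows. This is a minimal illustration, not a definitive implementation: the names payload, ready, and run_message_passing are hypothetical, and the sketch assumes a toolchain with C++11 threads. One thread writes data and then sets a flag; the other waits for the flag and then reads the data. The two fences keep the data write ordered before the flag write, and the flag read ordered before the data read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: passes one value between two
// threads through shared memory, using explicit fences for ordering.
int run_message_passing() {
    int payload = 0;                 // ordinary (non-atomic) shared data
    std::atomic<bool> ready{false};  // flag that announces the data
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                         // 1. write data
        std::atomic_thread_fence(std::memory_order_release);  // 2. barrier
        ready.store(true, std::memory_order_relaxed);         // 3. set flag
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {      // 1. wait for flag
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // 2. barrier
        observed = payload;                                   // 3. read data
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Without the two fences, the flag and data accesses could be reordered on a weakly ordered architecture, and the consumer might read a stale payload. With them, run_message_passing() returns 42.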
When Not to Use the Memory Barrier Explicitly
Each architecture defines its own memory ordering model. The following describes typical scenarios in the ARM architecture where a memory barrier does not need to be used explicitly.
- When an address dependency exists, the memory access order is ensured without a memory barrier.
When the third assembly statement, LDR X4, [X3, X1], reads data from the memory address [X3, X1] into register X4, it needs the value that was loaded into register X1 from the address [X2]. Therefore, LDR X1, [X2] is executed before LDR X4, [X3, X1].
- When a control dependency exists, the memory access order is ensured without a memory barrier.
    r1 = x;
    if (r1 == 0)
        nop();
    y = 1;
If a conditional branch depends on a load operation, the store operations that follow the conditional branch are performed after that load. Therefore, r1 = x is executed before y = 1.
- When a register data dependency exists, the memory access order is ensured without a memory barrier.
    LDR X1, [X2]
    ADD X3, X3, X1
    SUB X3, X3, X1
    STR X4, [X3]
In the preceding statements, the final STR instruction stores to the memory address held in register X3, and the value of X3 depends on the value of X1 loaded from the address [X2]. Although the ADD and SUB cancel each other out arithmetically, the dependency on X1 remains. Therefore, LDR X1, [X2] is executed before STR X4, [X3].
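The dependency rules above are stated for ARM machine instructions. A C or C++ compiler is not required to preserve such dependencies in the code it generates, so portable source code still states the required ordering explicitly. The following is a minimal sketch, assuming a hypothetical Message type and a hypothetical function run_pointer_publication, of publishing a pointer with release/acquire ordering. The reader's access p->value has the same shape as the address dependency case, because its address comes from the pointer that was just loaded.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Message {  // hypothetical payload type for illustration
    int value;
};

// Hypothetical helper: one thread fills in a message and publishes its
// address; the other loads the address and then reads through it.
int run_pointer_publication() {
    Message msg{0};
    std::atomic<Message*> published{nullptr};
    int observed = -1;

    std::thread writer([&] {
        msg.value = 7;                                     // initialize data
        published.store(&msg, std::memory_order_release);  // publish pointer
    });

    std::thread reader([&] {
        Message* p = nullptr;
        // Acquire load: the read through p below cannot be ordered before it.
        while ((p = published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        observed = p->value;  // address of this read depends on the loaded p
    });

    writer.join();
    reader.join();
    return observed;
}
```

Here run_pointer_publication() returns 7. The acquire load is chosen so that the ordering is guaranteed by the language rather than left to how a particular compiler happens to translate the dependency.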
