Three-Stage Pipeline
A three-stage instruction pipeline consists of three stages: instruction fetch, decode, and execute. The fetch and decode stages each complete in one cycle. The execute stage, however, does much more work: it reads and writes the registers and memory locations that hold the operands, performs ALU operations, and moves data between the components involved. As a result, an instruction may occupy the execute stage for several clock cycles, which blocks the other instructions in the pipeline and makes the execute stage a performance bottleneck of the system.
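To make the overlap between the stages concrete, here is a minimal sketch of an ideal three-stage pipeline in which every instruction spends exactly one cycle in each stage. The instruction names are hypothetical, and the cycle numbering starts from an empty pipeline, so the first two cycles only fill it; after that, one instruction leaves the execute stage every cycle.

```python
STAGES = ("fetch", "decode", "execute")


def ideal_pipeline(program):
    """Cycle-by-cycle stage occupancy for single-cycle instructions.

    Instruction i (0-based) is fetched in cycle i + 1, decoded in cycle i + 2,
    and executed in cycle i + 3. None marks an empty stage.
    """
    n = len(program)
    table = []
    # n instructions need n + len(STAGES) - 1 cycles to drain the pipeline.
    for cycle in range(1, n + len(STAGES)):
        row = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - 1 - depth  # instruction that reached this stage now
            row[stage] = program[idx] if 0 <= idx < n else None
        table.append(row)
    return table


if __name__ == "__main__":
    program = ["MOV", "ADD", "SUB", "CMP"]  # hypothetical single-cycle instructions
    for cycle, row in enumerate(ideal_pipeline(program), start=1):
        print(f"cycle {cycle}: " + "  ".join(f"{s}={row[s] or '-'}" for s in STAGES))
```

In cycle 3, for example, SUB is being fetched, ADD decoded, and MOV executed at the same time.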
From T1 onward, one instruction is executed in every cycle, so the pipeline achieves an IPC (instructions per cycle) of 1, which is the most efficient execution mode for this pipeline. However, not every instruction completes in a single cycle. For example, the LDR instruction, which accesses memory, is a multi-cycle instruction and stalls the pipeline, as shown in Figure 2.
The LDR instruction loads data from memory into a register. Its execution occupies three cycles, from T3 to T5.

At T3, LDR uses the control signal lines to calculate the memory address. The decode stage also needs these lines, so the first ADD cannot be decoded at T3. The instruction fetch of the second ADD is not affected.

At T4, LDR accesses memory and releases the control signal lines, so the first ADD is decoded. Because the decoder is busy with the first ADD, the second ADD cannot be decoded in this cycle. In addition, the processor uses a von Neumann architecture, in which data and instructions share the same memory, so instructions cannot be fetched while data is being accessed. The instruction fetch of MOV is therefore stalled.

At T5, the data read from memory is written into a CPU register. This operation occupies the execute unit, so the first ADD still cannot execute. Meanwhile, the second ADD enters the decoder, and the MOV instruction is fetched.
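The stalls described for T3 to T5 can be checked against a small resource model. This is a sketch under stated assumptions: the per-cycle activity is transcribed from the walk-through, while the resource labels (`memory`, `control_lines`, `execute_unit`) and the helper functions are hypothetical names chosen for illustration, not signal names of a real processor. A cycle is valid when no shared resource is claimed by more than one stage.

```python
from collections import Counter

# Stage activity in cycles T3-T5, transcribed from the walk-through above.
# None marks a stage that is stalled in that cycle.
TIMELINE = {
    "T3": {"fetch": "ADD#2", "decode": None, "execute": "LDR:address"},
    "T4": {"fetch": None, "decode": "ADD#1", "execute": "LDR:memory"},
    "T5": {"fetch": "MOV", "decode": "ADD#2", "execute": "LDR:writeback"},
}


def resources(stage, activity):
    """Shared resources an activity needs (hypothetical, illustrative labels)."""
    if activity is None:
        return set()
    if stage == "fetch":
        return {"memory"}  # von Neumann: instructions and data share one memory
    if stage == "decode":
        return {"control_lines"}
    needs = {"execute_unit"}
    if activity == "LDR:address":
        needs.add("control_lines")  # address calculation uses the control lines
    elif activity == "LDR:memory":
        needs.add("memory")  # the data access occupies the memory
    return needs


def conflicts(row):
    """Return the set of resources claimed by more than one stage in one cycle."""
    counts = Counter()
    for stage, activity in row.items():
        counts.update(resources(stage, activity))
    return {res for res, n in counts.items() if n > 1}


if __name__ == "__main__":
    for cycle, row in TIMELINE.items():
        print(cycle, conflicts(row) or "no conflict")
    # Removing either stall creates a conflict:
    print("T3, decode ADD#1:", conflicts(dict(TIMELINE["T3"], decode="ADD#1")))
    print("T4, fetch MOV:", conflicts(dict(TIMELINE["T4"], fetch="MOV")))
```

The transcribed schedule has no conflicts, whereas decoding the first ADD at T3 clashes on the control lines and fetching MOV at T4 clashes on the memory, which is why both operations are deferred.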
Only one instruction, LDR, is executed in the three cycles from T3 to T5. Overall, six instructions are executed in the eight cycles from T1 to T8, giving IPC = 6/8 = 0.75. As this execution sequence shows, the multi-cycle LDR instruction occupies the execute stage for three clock cycles, which blocks the subsequent instructions and reduces pipeline efficiency.
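The IPC figure can be reproduced with a minimal sketch that models only the execute stage. It assumes the pipeline is already full at T1 and that each instruction starts executing as soon as the previous one leaves the execute stage; for the sequence in Figure 2 this holds because the fetch and decode stalls overlap with LDR's three execute cycles. The text does not name the two instructions executed at T1 and T2, so `I1` and `I2` are hypothetical single-cycle placeholders.

```python
# (name, execute-stage cycles). "I1" and "I2" are hypothetical placeholders
# for the two single-cycle instructions executed at T1 and T2.
PROGRAM = [("I1", 1), ("I2", 1), ("LDR", 3), ("ADD", 1), ("ADD", 1), ("MOV", 1)]


def execute_schedule(program):
    """Return (name, first_cycle, last_cycle) of each instruction's execute stage,
    starting at T1 with back-to-back execution."""
    schedule = []
    cycle = 1
    for name, latency in program:
        schedule.append((name, cycle, cycle + latency - 1))
        cycle += latency
    return schedule


def ipc(program):
    """Instructions per cycle over the window in which the program executes."""
    total_cycles = sum(latency for _, latency in program)
    return len(program) / total_cycles


if __name__ == "__main__":
    for name, first, last in execute_schedule(PROGRAM):
        print(f"{name}: T{first}-T{last}" if first != last else f"{name}: T{first}")
    print("IPC with LDR:", ipc(PROGRAM))
    print("IPC, all single-cycle:", ipc([(name, 1) for name, _ in PROGRAM]))
```

The schedule places LDR at T3 to T5 and MOV at T8, giving 6 / 8 = 0.75; if every instruction took one cycle, the same six instructions would finish in six cycles and the IPC would return to 1.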
