Overview
System performance goals vary by application scenario. In compute-intensive scenarios, performance optimization aims to let the CPU process more data per unit time. In scenarios with heavy database access, it aims to improve I/O efficiency, reduce I/O wait, and allow more data to be read and written per unit time. In scenarios with heavy network traffic, it aims to raise network throughput and reduce latency, that is, to increase the number of data packets sent or received per unit time. Although the objectives differ between scenarios, they all come down to improving some capability of the system per unit time. For the CPU, performance optimization is reflected in instruction execution efficiency: the objective is to increase the number of instructions the CPU executes per unit time, which at a given clock frequency means raising the instructions per cycle (IPC).
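The relationship between IPC and instructions executed per unit time can be sketched as follows. This is a minimal illustration, not a measurement tool: the function names and the counter values are hypothetical, and in practice the instruction and cycle counts would come from hardware performance counters.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: retired instructions divided by elapsed CPU cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def instructions_per_second(ipc_value: float, clock_hz: float) -> float:
    """Instructions executed per unit time at a fixed clock frequency."""
    return ipc_value * clock_hz


if __name__ == "__main__":
    # Hypothetical counter readings, chosen only for illustration.
    retired = 3_000_000_000
    elapsed_cycles = 2_000_000_000
    value = ipc(retired, elapsed_cycles)
    print("IPC:", value)
    # Assumed 2 GHz clock; the real frequency depends on the processor.
    print("instructions/s:", instructions_per_second(value, 2_000_000_000))
```

At a fixed clock frequency the two quantities move together, which is why IPC is a convenient target for CPU-side tuning.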
When the CPU executes an instruction, the instruction passes through many CPU components. Each component performs its own function and is functionally independent of the others, but a downstream component generally depends on the result produced by the upstream component. This flow resembles a production line, so the entire instruction execution process is called an instruction pipeline. The number of pipeline levels (the pipeline length) varies greatly across CPU architectures, from a few levels to dozens. More levels generally imply a more complex CPU structure, more powerful functions, and higher power consumption. Conversely, fewer levels imply a simpler CPU structure and lower power consumption. The following table lists the pipeline levels of some typical ARM processors.
| Model | Instruction Set | Pipeline Levels |
|---|---|---|
| ARM 7 | Armv4 | 3 |
| ARM 9 | Armv5 | 5 |
| ARM 11 | Armv6 | 8 |
| Cortex-A8 | Armv7-A | 13 |
| Kunpeng 920/Cortex-A55 | Armv8 | 8 |
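The production-line analogy behind the instruction pipeline can be made concrete with a simplified cycle-count model. The sketch below assumes an idealized pipeline: every level takes exactly one cycle, one instruction enters per cycle, and there are no stalls or branch mispredictions. Real CPUs deviate from all of these assumptions, and the function names are hypothetical.

```python
def pipelined_cycles(n_instructions: int, n_levels: int) -> int:
    """Cycles to finish n instructions on an ideal pipeline with n_levels levels.

    The first instruction needs n_levels cycles to pass through every level;
    each later instruction completes one cycle after the one before it.
    """
    if n_instructions < 0 or n_levels <= 0:
        raise ValueError("need n_instructions >= 0 and n_levels > 0")
    if n_instructions == 0:
        return 0
    return n_levels + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_levels: int) -> int:
    """Cycles if each instruction had to finish all levels before the next starts."""
    if n_instructions < 0 or n_levels <= 0:
        raise ValueError("need n_instructions >= 0 and n_levels > 0")
    return n_instructions * n_levels


if __name__ == "__main__":
    n, levels = 1000, 8  # 8 levels, as in the last table row
    with_pipe = pipelined_cycles(n, levels)
    without_pipe = unpipelined_cycles(n, levels)
    print("pipelined cycles:", with_pipe)
    print("unpipelined cycles:", without_pipe)
    print("ideal IPC:", n / with_pipe)
```

In this idealized model the IPC approaches 1 as the instruction count grows, regardless of the number of levels. Stalls that leave levels idle lower that figure, and reducing such stalls is what CPU-side tuning tries to achieve.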