Tuning Overview
Tuning Process Flow
The performance tuning roadmap is as follows:
- If CPU usage is low, the CPU is not the bottleneck and the application is waiting on something else. Use a tool such as strace to find where the application blocks. Typically it is blocked on drive or network I/O, or its service logic is sleeping or waiting for a signal. Tuning measures for these cases are described in other sections.
- If CPU usage is high, either select more capable hardware and tune the hardware configuration parameters for the service scenario, or optimize the software to reduce its CPU usage.
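As a rough illustration of the first decision in this roadmap, the following Python sketch estimates overall CPU usage from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The function names and the 50% threshold in `next_step` are hypothetical and chosen only for this example; this guide does not define what counts as "low" or "high" usage.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # Guest time is already included in user/nice, so only sum the first eight.
    total = sum(values[:8])
    return total - idle, total


def cpu_usage_percent(before, after):
    """CPU usage in percent between two samples of the aggregate 'cpu' line."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


def next_step(usage_percent, threshold=50.0):
    """Map a usage figure to a roadmap branch. The threshold is an assumption."""
    if usage_percent < threshold:
        return "find-blocking-point"  # for example, trace the process with strace
    return "reduce-cpu-load"          # tune hardware settings or optimize software
```

To use it, call `read_cpu_line()` twice a few seconds apart while the workload is running and pass both lines to `cpu_usage_percent`.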
Configure DIMMs to match the memory capability of the CPU. Populate every memory channel to maximize memory bandwidth: one Kunpeng 920 processor supports eight memory channels, and two Kunpeng 920 processors support 16. Use high-frequency DIMMs to increase memory bandwidth further. When the Kunpeng 920 is configured with one DIMM per channel (1DPC), the maximum memory frequency is 3200 MHz.
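The benefit of populating every channel can be estimated with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data bus per DDR4 channel and treats the frequency quoted above as a transfer rate in MT/s. Both are assumptions made for this example, and the result is a theoretical peak rather than a measured figure.

```python
def peak_memory_bandwidth_gbs(channels, transfer_rate_mts, bus_width_bytes=8):
    """Theoretical peak bandwidth in GB/s (decimal) for populated channels.

    channels: number of memory channels that have a DIMM installed
    transfer_rate_mts: transfers per second in millions (assumed equal to the
        quoted memory frequency)
    bus_width_bytes: bytes moved per transfer per channel (assumed 8)
    """
    return channels * transfer_rate_mts * bus_width_bytes / 1000


# One processor with all eight channels populated versus only four.
full = peak_memory_bandwidth_gbs(8, 3200)
half = peak_memory_bandwidth_gbs(4, 3200)
print(f"8 channels: {full:.1f} GB/s, 4 channels: {half:.1f} GB/s")
```

Under these assumptions, leaving half of the channels empty halves the theoretical peak, which is why full channel population is recommended before any software tuning.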
Main Optimization Parameters
| Optimization Item | Description | Default Value | When to Take Effect | Kunpeng 916 | Kunpeng 920 |
|---|---|---|---|---|---|
| Optimizing NUMA configurations | In the NUMA architecture, a CPU core has lower access latency when it accesses memory on its local node. Bind applications to a NUMA node to reduce the performance deterioration caused by remote memory access. | No core binding configurations by default | Immediately | Yes | Yes |
| Modifying the CPU prefetch configuration | When the data to be accessed is concentrated, it can be read into the CPU cache in advance to improve performance. When the data is not concentrated, the prefetch hit ratio is low and memory bandwidth is wasted. | On | After the system restarts | No | Yes |
| Adjusting the timer mechanism | The nohz mechanism reduces unnecessary clock interrupts and CPU scheduling overheads. | Varies by OS. Euler: nohz=off | After the system restarts | Yes | Yes |
| Adjusting the memory page size to 64 KB | With a larger memory page size, each TLB entry covers more memory and the TLB hit rate is higher, which reduces the number of memory accesses. | Varies by OS: 4 KB or 64 KB | After the kernel is recompiled and updated | Yes | Yes |
| Optimizing the number of concurrent application threads | Adjust the number of concurrent application threads to balance multi-core utilization against resource contention. | Determined by applications | Immediately or after the system restarts (determined by applications) | Yes | Yes |
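Before changing any of these parameters, it helps to check the current state of the system. The following Python sketch, for Linux only, reads the CPUs of a NUMA node from sysfs, binds a process to them, reports the memory page size, and extracts a `nohz=` value from a kernel command line. The helper names are hypothetical and chosen for this example. Note that `os.sched_setaffinity` restricts only which CPUs a process runs on and does not set a memory placement policy, which a tool such as numactl handles.

```python
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_node_cpus(node):
    """Read the CPUs that belong to one NUMA node from sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def bind_to_numa_node(node, pid=0):
    """Restrict a process (0 means the calling process) to one node's CPUs."""
    os.sched_setaffinity(pid, numa_node_cpus(node))


def page_size_kb():
    """Memory page size of the running kernel in KB."""
    return os.sysconf("SC_PAGE_SIZE") // 1024


def nohz_setting(cmdline):
    """Return the value of nohz= on a kernel command line, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("nohz="):
            return token.split("=", 1)[1]
    return None
```

For example, `nohz_setting(open("/proc/cmdline").read())` shows whether the boot parameters set nohz explicitly, and `page_size_kb()` shows whether the kernel is already built with 64 KB pages.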