Tuning Overview

Tuning Process Flow

The performance tuning roadmap is as follows:

  • If the CPU usage is low, resources are not fully used. Use a tool such as strace to check where the application is blocked. Typically, the application is blocked on drive or network I/O, or its service logic sleeps or waits for signals. These optimization measures are described in other sections.
  • If the CPU usage is high, select better hardware and tune hardware configuration parameters to suit the service scenario, or optimize the software to reduce CPU usage.
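Choosing between these two branches starts with a CPU usage figure. The following is a minimal Python sketch, assuming a Linux host. It samples the aggregate `cpu` line of /proc/stat twice and computes the busy fraction from the change in jiffies. Counting iowait as idle time is a simplifying assumption of this sketch, so tools such as top or mpstat may report slightly different numbers.

```python
import time


def read_cpu_times(stat_text: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Fields: user nice system idle iowait irq softirq steal guest guest_nice.
            # guest/guest_nice are already included in user/nice, so sum only the first 8.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]  # assumption: iowait counted as idle
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage(before: str, after: str) -> float:
    """Busy fraction (0.0 to 1.0) between two /proc/stat snapshots."""
    idle0, total0 = read_cpu_times(before)
    idle1, total1 = read_cpu_times(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / delta_total


def sample_usage(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return cpu_usage(before, after)
```

For example, `print(f"{sample_usage():.1%}")` prints the system-wide usage over one second. Keeping the parsing separate from file access makes the calculation easy to check against recorded snapshots.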

Configure DIMMs based on the CPU capability. Populate all memory channels (full-channel configuration) to maximize memory bandwidth: one Kunpeng 920 processor supports eight memory channels, and two Kunpeng 920 processors support 16. Use high-frequency DIMMs to further improve memory bandwidth. When the Kunpeng 920 is configured with one DIMM per channel (1DPC), the maximum memory frequency is 3200 MHz.
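As a rough worked example of why full-channel population matters, theoretical peak bandwidth is channels × data rate × bytes per transfer. The sketch below assumes a 64-bit (8-byte) data bus per channel with ECC bits excluded, and reads the quoted 3200 MHz as 3200 MT/s. These are illustrative assumptions, and sustained bandwidth in practice is lower than this peak.

```python
# Assumptions (not from the guide): a 64-bit (8-byte) data bus per channel,
# ECC bits excluded, and "3200 MHz" read as 3200 MT/s (DDR4-3200).
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_s = channels * mega_transfers_per_s * 1_000_000 * BYTES_PER_TRANSFER
    return bytes_per_s / 1e9


full_one_socket = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
full_two_sockets = peak_bandwidth_gbs(channels=16, mega_transfers_per_s=3200)
half_one_socket = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=3200)

print(f"1 socket, 8 channels:   {full_one_socket:.1f} GB/s")   # 204.8 GB/s
print(f"2 sockets, 16 channels: {full_two_sockets:.1f} GB/s")  # 409.6 GB/s
print(f"1 socket, 4 channels:   {half_one_socket:.1f} GB/s")   # 102.4 GB/s
```

Leaving half of the channels empty halves the theoretical peak, even when the total memory capacity is the same.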

Main Optimization Parameters

Each optimization item below lists its description, default value, when the change takes effect, and whether it applies to the Kunpeng 916 and the Kunpeng 920.

Optimizing NUMA configurations
  • Description: In the NUMA architecture, a CPU core accesses memory on its local NUMA node with lower latency than memory on a remote node. Bind applications to a NUMA node to reduce the performance degradation caused by remote memory access.
  • Default value: No core binding
  • When to take effect: Immediately
  • Applies to Kunpeng 916: Yes
  • Applies to Kunpeng 920: Yes

Modifying the CPU prefetch configuration
  • Description: When the accessed data is concentrated (good data locality), prefetching reads data into the CPU cache in advance and improves performance. If accesses are scattered, the prefetch hit ratio is low and memory bandwidth is wasted.
  • Default value: On
  • When to take effect: After the system restarts
  • Applies to Kunpeng 916: No
  • Applies to Kunpeng 920: Yes

Adjusting the timer mechanism
  • Description: The nohz mechanism reduces unnecessary clock interrupts and CPU scheduling overhead.
  • Default value: Varies by OS (Euler: nohz=off)
  • When to take effect: After the system restarts
  • Applies to Kunpeng 916: Yes
  • Applies to Kunpeng 920: Yes

Adjusting the memory page size to 64 KB
  • Description: With a larger memory page size, each TLB entry maps more memory. This raises the TLB hit rate and reduces the number of memory accesses needed for address translation.
  • Default value: Varies by OS (4 KB or 64 KB)
  • When to take effect: After the kernel is recompiled and updated
  • Applies to Kunpeng 916: Yes
  • Applies to Kunpeng 920: Yes

Optimizing the number of concurrent application threads
  • Description: Adjust the number of concurrent application threads to balance multi-core utilization against resource contention.
  • Default value: Determined by the application
  • When to take effect: Immediately or after the system restarts, depending on the application
  • Applies to Kunpeng 916: Yes
  • Applies to Kunpeng 920: Yes
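NUMA binding is commonly done from the shell with numactl, for example `numactl --cpunodebind=0 --membind=0 <command>`. The following is a minimal Python sketch of the CPU side of the same idea, assuming Linux sysfs. It reads the CPU list of one NUMA node and restricts the process to those CPUs with os.sched_setaffinity. The function names are hypothetical. This sketch does not bind memory. It relies on the kernel's default policy of allocating memory on the node where the thread runs, which is weaker than `--membind`.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # a node without CPUs has an empty cpulist
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int) -> set[int]:
    """CPUs that belong to a NUMA node, read from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def bind_to_node(node: int, pid: int = 0) -> None:
    """Restrict a process (0 = the calling process) to the CPUs of one NUMA node.

    Only CPU affinity is set; memory placement is left to the kernel's default
    local-allocation policy rather than a strict binding like numactl --membind.
    """
    os.sched_setaffinity(pid, node_cpus(node))
```

Call `bind_to_node(0)` early, before worker threads start and before large buffers are first touched. That way both the threads and their memory stay on node 0.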
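For the timer item, nohz is a kernel boot parameter. The value in effect can therefore be read from /proc/cmdline, and changing it means editing the kernel command line in the boot loader configuration and restarting. The following is a minimal Python sketch, assuming Linux, with hypothetical function names. If `nohz=` is absent, the kernel's built-in default applies, and that default depends on how the kernel was configured.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name=value` on a kernel command line, or None if absent.

    If the parameter appears more than once, this sketch returns the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val
    return value


def current_nohz() -> Optional[str]:
    """Read the nohz setting from the running kernel's command line (Linux only)."""
    with open("/proc/cmdline") as f:
        return kernel_param(f.read(), "nohz")
```

A result of None means the parameter was not passed explicitly. In that case, check the OS default rather than assuming either value.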
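For the page size item, the page size a kernel uses can be read at run time with `getconf PAGESIZE`, or in Python with os.sysconf as shown below. The benefit of 64 KB pages shows up as TLB reach, which is the number of TLB entries × page size. The sketch assumes a 1024-entry TLB. That figure is hypothetical and is not a Kunpeng specification. Whatever the entry count, moving from 4 KB to 64 KB pages increases reach 16-fold.

```python
import os

KIB = 1024
HYPOTHETICAL_TLB_ENTRIES = 1024  # illustrative only; real TLB sizes vary by core


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory a TLB can map without a miss: entries x page size."""
    return entries * page_size


def current_page_size() -> int:
    """Page size of the running kernel in bytes (POSIX sysconf)."""
    return os.sysconf("SC_PAGE_SIZE")


reach_4k = tlb_reach(HYPOTHETICAL_TLB_ENTRIES, 4 * KIB)
reach_64k = tlb_reach(HYPOTHETICAL_TLB_ENTRIES, 64 * KIB)

print(f"4 KB pages:  {reach_4k // KIB**2} MiB of TLB reach")   # 4 MiB
print(f"64 KB pages: {reach_64k // KIB**2} MiB of TLB reach")  # 64 MiB
print(f"This kernel uses {current_page_size() // KIB} KB pages")
```

Because the page size is fixed when the kernel is built, the last line reports the same value until a kernel built with a different page size is installed.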
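For the concurrency item, one common starting point is to size a worker pool to the CPUs the process is allowed to run on. That count reflects any NUMA binding applied with numactl or sched_setaffinity. From there, tune the count up or down based on measurements. The following is a minimal Python sketch with hypothetical helper names. In CPython, CPU-bound pure-Python work does not run in parallel across threads. For that case, a process pool follows the same sizing pattern.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def usable_cpus() -> int:
    """Number of CPUs this process may run on (respects CPU affinity on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def run_parallel(func: Callable, items: Iterable, threads: Optional[int] = None) -> List:
    """Apply func to items with a thread pool; defaults to one thread per usable CPU."""
    workers = threads or usable_cpus()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order
        return list(pool.map(func, items))
```

Passing an explicit `threads` value makes it easy to benchmark several pool sizes against the default.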