Affinity and Core Binding

Affinity is the tendency of a process to keep running on a specified CPU for as long as possible instead of being migrated to other processors. On a multi-core server, each CPU core has its own cache that holds the data used by the processes running on it. If the OS schedules a process onto a different CPU, the data cached on the previous CPU is no longer local, so the cache hit ratio drops. Once a process is bound to a CPU, it keeps running on that CPU and the OS does not migrate it elsewhere. This improves the cache hit ratio and, in turn, performance.

  1. Use the numactl shell command to start a task bound to a NUMA node or to specific CPU cores.

    numactl is a Linux command-line tool for manual tuning. It runs a process on a specified NUMA node or on specific CPU cores.

    1. Bind to a NUMA node (CPUs and memory of node 0): numactl --cpubind=0 --membind=0 java SIMDTest_Compare_Max2
    2. Bind to CPU cores (cores 0 to 19, memory on node 0): numactl -C 0-19 --membind=0 java SIMDTest_Compare_Max2
    3. Run the top command to check whether core binding succeeded. Pressing 1 in top displays per-core usage, which shows whether the load stays on the bound cores.
  2. Specify a CPU core by calling the system APIs in the program code.

    glibc provides the sched_getaffinity interface to obtain the current CPU affinity of an application, and the sched_setaffinity interface to bind the application to one or more CPU cores.

    The API syntax is as follows:
    #define _GNU_SOURCE   /* must be defined before including <sched.h> */
    #include <sched.h>
    int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
    int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);
    void CPU_CLR(int cpu, cpu_set_t *set);
    int CPU_ISSET(int cpu, cpu_set_t *set);
    void CPU_SET(int cpu, cpu_set_t *set);
    void CPU_ZERO(cpu_set_t *set);

    The following is an example of core binding:

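    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    cpu_set_t cpu_mask;
    CPU_ZERO(&cpu_mask);      /* clear every CPU in the mask */
    CPU_SET(0, &cpu_mask);    /* select CPU core 0 */
    if (sched_setaffinity(0, sizeof(cpu_mask), &cpu_mask) != 0)   /* pid 0 = calling thread */
        perror("sched_setaffinity");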
  3. Bind application software, such as KVM virtual machines, to cores.

    The Kunpeng 920 processor provides two super CPU clusters (SCCLs). Each SCCL contains six to eight CPU clusters, and each CPU cluster contains four cores. When binding CPUs to a KVM virtual machine, you are advised to spread the vCPUs across multiple CPU clusters to improve VM performance. This reduces the bandwidth bottleneck between the L3 cache and memory that arises when cores in the same CPU cluster contend with each other.

    1. Query the NUMA node information and topology in the Linux system.
      numactl -H

    2. Edit the VM XML configuration file in the Linux system and bind vCPUs to cores in as many CPU clusters as possible. The following is an example:
      <domain type='kvm'>
      ...
        <vcpu placement='static' cpuset='0,1,4,5,8,9,12,13'>8</vcpu>
        <cputune>
          <vcpupin vcpu='0' cpuset='0'/>
          <vcpupin vcpu='1' cpuset='1'/>
          <vcpupin vcpu='2' cpuset='4'/>
          <vcpupin vcpu='3' cpuset='5'/>
          <vcpupin vcpu='4' cpuset='8'/>
          <vcpupin vcpu='5' cpuset='9'/>
          <vcpupin vcpu='6' cpuset='12'/>
          <vcpupin vcpu='7' cpuset='13'/>
        </cputune>
      ...
      </domain>

      The core pairs (0, 1), (4, 5), (8, 9), and (12, 13) are in four different CPU clusters, so the eight vCPUs use two cores in each of four clusters.