NUMA Affinity

By default, a container can run on any CPU of the host, and most users rely on the default Completely Fair Scheduler (CFS) to share CPU time among containers. When multiple containers run on one server, they often host different services with different requirements. Depending on the application scenario, you can bind containers to specific CPUs and configure NUMA affinity to maximize container performance.

1:1 CPU Binding and Same-Die Memory Access

The vCPUs of a container can be bound to CPUs in the same processor or in the same NUMA node. Avoid cross-die and cross-chip memory access by a Docker container, because it degrades performance. By default, vCPUs of different containers may run on the same physical CPU, which causes CPU resource contention and frequent VMID changes. As a result, L1 TLB flushes occur frequently and the TLB miss rate is high, which degrades performance. Binding each vCPU 1:1 to its own physical CPU avoids this contention.

  1. Query the NUMA information.
    numactl -H
    

    The preceding figure shows the CPU core distribution of the Kunpeng 920 5250 processor: CPU cores 0 to 23 are in NUMA node 0, cores 24 to 47 in NUMA node 1, cores 48 to 71 in NUMA node 2, and cores 72 to 95 in NUMA node 3. When binding container vCPUs to cores, keep each container within one NUMA node to avoid cross-die and cross-chip memory access.

  2. Bind each container vCPU to a dedicated CPU core and allocate memory from the NUMA node that contains those cores.

    The Kunpeng 920 5250 processor is used as an example. Create a container named 4u8g_01, bind CPU cores 4 to 7 of NUMA node 0 to the container, allocate 8 GB of memory, use the centos:latest image, and mount a local volume that maps /home on the host to /home in the container.

    docker run -d -it --cpus=4 --cpuset-cpus=4-7 --cpuset-mems=0 -m 8192m --name 4u8g_01 -v /home:/home centos:latest
    

    For details about the docker run command, see https://docs.docker.com/engine/reference/commandline/run/.

    Command format: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

    Parameters in the command are described as follows:

    • -d: runs the container in the background and prints the container ID.
    • -i: keeps STDIN open even if no terminal is attached.
    • -t: allocates a pseudo-TTY.
    • --cpus: specifies how many CPUs' worth of CPU time the container can use. This is a CFS quota and does not by itself bind the container to specific CPUs.
    • --cpuset-cpus: specifies the CPU cores the container is allowed to run on.
    • --cpuset-mems: specifies the NUMA nodes from which the container can allocate memory.
    • -m: specifies the maximum amount of memory the container can use.
    • --name: specifies the container name.
    • -v: mounts a volume, in the format host_path:container_path.
    • centos:latest: specifies the image. centos is the repository, and latest is the tag.
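Before running a command like the one above, it can help to check that the --cpuset-cpus list stays inside one NUMA node, so that it can be paired with a matching --cpuset-mems value. The following Bash sketch assumes the Kunpeng 920 5250 layout described in step 1 (24 consecutive cores per NUMA node); the functions expand_cpuset and cpuset_numa_node and the CORES_PER_NODE variable are hypothetical helpers, not part of Docker or numactl. On other hosts, take the layout from numactl -H or /sys/devices/system/node/node*/cpulist rather than relying on this arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. ASSUMPTION: NUMA node N holds cores N*24 .. N*24+23,
# as on the Kunpeng 920 5250 example. Adjust for other processors.
CORES_PER_NODE=24

# Expand a cpuset list such as "3,4,8-9" into one core ID per line.
expand_cpuset() {
  local item start end
  local IFS=','            # split the list on commas only
  for item in $1; do
    if [[ $item == *-* ]]; then
      start=${item%-*}     # "8-9" -> 8
      end=${item#*-}       # "8-9" -> 9
    else
      start=$item
      end=$item
    fi
    seq "$start" "$end"
  done
}

# Print the NUMA node if every core in the list belongs to the same node.
# Otherwise report the conflict on stderr and return 1.
cpuset_numa_node() {
  local cpu node first=""
  for cpu in $(expand_cpuset "$1"); do
    node=$(( cpu / CORES_PER_NODE ))
    if [[ -z $first ]]; then
      first=$node
    elif (( node != first )); then
      echo "cpuset $1 spans NUMA nodes $first and $node" >&2
      return 1
    fi
  done
  echo "$first"
}
```

For example, node=$(cpuset_numa_node 4-7) stores 0, which can then be passed as --cpuset-mems="$node". A list such as 20-27 is rejected because it spans NUMA nodes 0 and 1.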

Cross-Cluster CPU Binding and Same-Die Memory Access

A Kunpeng 920 series processor provides two super CPU clusters (SCCLs). Each SCCL contains six to eight CPU clusters, and each CPU cluster contains four CPU cores. When binding CPUs to a Docker container, spread the container's CPUs across multiple CPU clusters to improve its performance. CPUs in the same cluster compete for bandwidth between the L3 cache and memory, so spreading them across clusters reduces this bottleneck.

Binding vCPUs of a Docker container to CPUs across multiple CPU clusters can bring the following benefits:

  • When the load is light, the memory bandwidth can be fully utilized.
  • Competition for L3 cache tags is greatly reduced, which improves memory bandwidth and CPU computing performance.

If there are more clusters than the container has vCPUs, you can bind each vCPU to a CPU in a different cluster, choosing any of the available clusters.
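The selection rule above can be scripted. The following Bash sketch assumes that each CPU cluster consists of four consecutive core IDs (cores 0 to 3, 4 to 7, and so on) and that the range passed in starts on a cluster boundary. This mapping is an assumption, so confirm it against your host's topology first (for example, in /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list on kernels that expose it). The function spread_cores and the CLUSTER_SIZE variable are hypothetical. The function takes one core from each cluster in turn and reuses a cluster only after every cluster in the range has contributed a core.

```shell
#!/usr/bin/env bash
# Hypothetical helper. ASSUMPTION: a CPU cluster is 4 consecutive core IDs
# and FIRST is aligned to a cluster boundary. Verify on your host.
CLUSTER_SIZE=4

# spread_cores FIRST LAST COUNT
# Pick COUNT cores from FIRST..LAST, round-robin across clusters, and print
# them as a sorted, comma-separated list for --cpuset-cpus.
spread_cores() {
  local first=$1 last=$2 count=$3
  local picked=() offset cpu
  # Pass 0 takes the first core of every cluster, pass 1 the second, etc.
  for (( offset = 0; offset < CLUSTER_SIZE; offset++ )); do
    for (( cpu = first + offset; cpu <= last; cpu += CLUSTER_SIZE )); do
      (( ${#picked[@]} < count )) || break 2
      picked+=("$cpu")
    done
  done
  if (( ${#picked[@]} < count )); then
    echo "range $first-$last has fewer than $count cores" >&2
    return 1
  fi
  printf '%s\n' "${picked[@]}" | sort -n | paste -sd, -
}
```

For NUMA node 0 (cores 0 to 23) and an 8-vCPU container, spread_cores 0 23 8 prints 0,1,4,5,8,12,16,20. Like the core list in the example that follows, it spans all six clusters of the node, so only two clusters host a second vCPU.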

Set cross-cluster CPU binding and same-NUMA memory access.

The Kunpeng 920 5250 processor is used as an example. Create a container named 8u16g_02, bind 8 CPU cores (3, 4, 8, 9, 12, 16, 20, and 21) in NUMA node 0 to the container, allocate 16 GB of memory, use the centos:latest image, and mount a local volume that maps /home on the host to /home in the container.

docker run -d -it --cpus=8 --cpuset-cpus=3,4,8,9,12,16,20,21 --cpuset-mems=0 -m 16384m --name 8u16g_02 -v /home:/home centos:latest

For details about the docker run command, see https://docs.docker.com/engine/reference/commandline/run/.

Command format: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

The parameters are described as follows:

  • -d: runs the container in the background and prints the container ID.
  • -i: keeps STDIN open even if no terminal is attached.
  • -t: allocates a pseudo-TTY.
  • --cpus: specifies how many CPUs' worth of CPU time the container can use. This is a CFS quota and does not by itself bind the container to specific CPUs.
  • --cpuset-cpus: specifies the CPU cores the container is allowed to run on.
  • --cpuset-mems: specifies the NUMA nodes from which the container can allocate memory.
  • -m: specifies the maximum amount of memory the container can use.
  • --name: specifies the container name.
  • -v: mounts a volume, in the format host_path:container_path.
  • centos:latest: specifies the image. centos is the repository, and latest is the tag.
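After the container starts, you can confirm that the binding took effect. From the host, docker inspect --format '{{.HostConfig.CpusetCpus}} {{.HostConfig.CpusetMems}}' 8u16g_02 prints the requested CPU and memory-node lists. Inside the container, the kernel's view can be read from the cgroup files. The following Bash sketch assumes a cgroup v2 host, where /sys/fs/cgroup inside the container is the container's own cgroup and exposes cpuset.cpus.effective and cpuset.mems.effective; on cgroup v1 hosts the paths and file names differ. The function show_cpuset is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. ASSUMPTION: cgroup v2, so the container's cgroup is
# mounted at /sys/fs/cgroup and the cpuset controller is enabled for it.

# show_cpuset [CGROUP_DIR]
# Print the CPUs and NUMA memory nodes the cgroup is actually restricted to.
show_cpuset() {
  local dir=${1:-/sys/fs/cgroup}
  local f
  for f in cpuset.cpus.effective cpuset.mems.effective; do
    if [[ ! -r $dir/$f ]]; then
      echo "$dir/$f not readable (cgroup v1 host or cpuset disabled?)" >&2
      return 1
    fi
    printf '%s=%s\n' "$f" "$(<"$dir/$f")"
  done
}
```

For example, docker exec 8u16g_02 bash -c "$(declare -f show_cpuset); show_cpuset" runs the function inside the container. For the 8u16g_02 example, the output should list cores 3, 4, 8, 9, 12, 16, 20, and 21 (in the kernel's range notation) and memory node 0.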