NUMA Affinity
By default, a container can run on any CPU of the host, and its threads are scheduled by the default Completely Fair Scheduler (CFS). When multiple containers run on one server, they often host services with different workload characteristics. Binding containers to CPUs and configuring NUMA affinity according to the application scenario maximizes container performance.
1:1 CPU Binding and Same-Die Memory Access
vCPUs can be bound to CPUs in the same processor or in the same NUMA node. Avoid cross-die and cross-chip memory access by a Docker container, because it degrades performance. By default, vCPUs of different containers may run on the same physical CPU, which causes CPU resource contention and frequent context switches. As a result, L1 TLB flushes occur frequently and the TLB miss rate rises, degrading performance.
- Query the NUMA information.
numactl -H
The preceding figure shows the CPU core distribution of the Kunpeng 920 5250 processor: CPU cores 0 to 23 are in NUMA node 0, cores 24 to 47 in NUMA node 1, cores 48 to 71 in NUMA node 2, and cores 72 to 95 in NUMA node 3. When binding Docker container vCPUs to cores, keep a container's cores within one NUMA node to avoid cross-die and cross-chip memory access.
- Bind each container vCPU to its own CPU core, and allocate the container's memory from the NUMA node that contains those cores.
The Kunpeng 920 5250 processor is used as an example. Create a container named 4u8g_01, bind CPUs 4 to 7 of NUMA node 0 to the container, allocate 8 GB of memory, use the centos:latest image, and mount a local volume by mapping /home on the host to /home in the container.
docker run -d -it --cpus=4 --cpuset-cpus=4-7 --cpuset-mems=0 -m 8192m --name 4u8g_01 -v /home:/home centos:latest
For details about the docker run command, see https://docs.docker.com/engine/reference/commandline/run/.
Command format: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
Parameters in the command are described as follows:
- -d: runs the container in the background and prints the container ID.
- -i: keeps STDIN open even if the container is not attached.
- -t: allocates a pseudo-TTY.
- --cpus: specifies how much CPU time the container can use, expressed as a number of CPUs.
- --cpuset-cpus: specifies the CPUs the container is allowed to run on.
- --cpuset-mems: specifies the NUMA nodes from which the container's memory is allocated.
- -m: specifies the maximum amount of memory the container can use.
- --name: specifies the Docker container name.
- -v: bind-mounts a volume or host directory into the container, in host-path:container-path format.
- centos:latest: specifies the local image. centos indicates the repository, and latest indicates the tag.
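The 1:1 binding above can be sketched as a small Bash helper that derives the docker run CPU and memory options from a vCPU count, a starting core, and a memory size in GB, and refuses core ranges that would cross a NUMA node boundary. This is a minimal sketch, not a Docker feature: numa_node_of_cpu and numa_run_opts are hypothetical names, and CORES_PER_NODE=24 assumes the Kunpeng 920 5250 layout reported by numactl -H (24 consecutively numbered cores per node), so adjust it to your host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for 1:1 binding inside one NUMA node.
# ASSUMPTION: each NUMA node holds CORES_PER_NODE consecutively numbered cores
# (24 on the Kunpeng 920 5250 example). Confirm with `numactl -H` first.
CORES_PER_NODE=24

# Print the NUMA node that owns core ID $1.
numa_node_of_cpu() {
  echo $(( $1 / CORES_PER_NODE ))
}

# Print docker run options that bind $1 vCPUs to consecutive cores starting at
# core $2 and give the container $3 GB of memory from the same NUMA node.
# Fails if the requested cores would cross a NUMA node boundary.
numa_run_opts() {
  local vcpus=$1 first=$2 mem_gb=$3
  local last=$(( first + vcpus - 1 ))
  local node
  node=$(numa_node_of_cpu "$first")
  if [ "$(numa_node_of_cpu "$last")" -ne "$node" ]; then
    echo "cores $first-$last cross a NUMA node boundary" >&2
    return 1
  fi
  # -m takes MiB here: GB * 1024, e.g. 8 GB -> 8192m.
  printf '%s\n' "--cpus=$vcpus --cpuset-cpus=$first-$last --cpuset-mems=$node -m $(( mem_gb * 1024 ))m"
}
```

With these assumptions, numa_run_opts 4 4 8 prints --cpus=4 --cpuset-cpus=4-7 --cpuset-mems=0 -m 8192m, the options used for 4u8g_01, while numa_run_opts 4 22 8 is rejected because cores 24 and 25 belong to NUMA node 1. After a container starts, docker inspect --format '{{.HostConfig.CpusetCpus}} {{.HostConfig.CpusetMems}}' 4u8g_01 prints the CPU and memory-node binding that Docker recorded.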
Cross-Cluster CPU Binding and Same-Die Memory Access
Each Kunpeng 920 series processor provides two super CPU clusters (SCCLs). Each SCCL contains six to eight CPU clusters, and each CPU cluster contains four CPU cores. When binding CPUs to a Docker container, select CPUs from multiple CPU clusters to improve container performance. This reduces the bandwidth bottleneck between the L3 cache and memory that arises when CPUs in the same cluster compete for it.
Binding vCPUs of a Docker container to CPUs across multiple CPU clusters can bring the following benefits:
- When the load is light, the memory bandwidth can be fully utilized.
- Competition for L3 cache tags can be dramatically reduced and the memory bandwidth and CPU computing performance can be improved.
If there are more clusters than the container has vCPUs, bind each vCPU to a CPU in a different cluster; any subset of clusters can be chosen.
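The cross-cluster rule can be sketched as a round-robin spread: each vCPU goes to the next cluster in turn, so no cluster receives a second vCPU until every cluster in the NUMA node has one. This is one possible selection policy, not necessarily how the cores in the following example were chosen. spread_across_clusters is a hypothetical name, and the sketch assumes four consecutively numbered cores per cluster and 24 cores per NUMA node (Kunpeng 920 5250). On kernels that expose it, /sys/devices/system/cpu/cpuN/topology/cluster_cpus_list shows the actual cluster membership of core N.

```shell
#!/usr/bin/env bash
# Hypothetical helper: spread $1 vCPUs across the CPU clusters of NUMA node $2.
# ASSUMPTIONS: 4 consecutively numbered cores per cluster and 24 cores per
# NUMA node (6 clusters), as on the Kunpeng 920 5250. Verify on your host.
CLUSTER_SIZE=4
CORES_PER_NODE=24

spread_across_clusters() {
  local vcpus=$1 node=$2
  local clusters=$(( CORES_PER_NODE / CLUSTER_SIZE ))
  local base=$(( node * CORES_PER_NODE ))
  local i=0
  if [ "$vcpus" -lt 1 ] || [ "$vcpus" -gt "$CORES_PER_NODE" ]; then
    echo "need between 1 and $CORES_PER_NODE vCPUs" >&2
    return 1
  fi
  # vCPU i goes to cluster (i mod clusters) and takes the next unused core
  # there, so no cluster gets a second vCPU until every cluster has one.
  while [ "$i" -lt "$vcpus" ]; do
    echo $(( base + (i % clusters) * CLUSTER_SIZE + i / clusters ))
    i=$(( i + 1 ))
  done | sort -n | paste -sd, -   # comma-separated, as --cpuset-cpus expects
}
```

For example, spread_across_clusters 8 0 returns 0,1,4,5,8,12,16,20: every cluster of NUMA node 0 gets one vCPU and the first two clusters get a second one. The result can be passed to --cpuset-cpus together with --cpuset-mems=0.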
Set cross-cluster CPU binding and same-NUMA memory access.
The Kunpeng 920 5250 processor is used as an example. Create a container named 8u16g_02, bind eight CPU cores (3, 4, 8, 9, 12, 16, 20, and 21) in NUMA node 0 to the container, allocate 16 GB of memory, use the centos:latest image, and mount a local volume by mapping /home on the host to /home in the container. Assuming each cluster consists of four consecutively numbered cores (0-3, 4-7, and so on), these eight cores are spread over all six clusters of NUMA node 0.
docker run -d -it --cpus=8 --cpuset-cpus=3,4,8,9,12,16,20,21 --cpuset-mems=0 -m 16384m --name 8u16g_02 -v /home:/home centos:latest
For details about the docker run command, see https://docs.docker.com/engine/reference/commandline/run/.
Command format: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
The parameters are described as follows:
- -d: runs the container in the background and prints the container ID.
- -i: keeps STDIN open even if the container is not attached.
- -t: allocates a pseudo-TTY.
- --cpus: specifies how much CPU time the container can use, expressed as a number of CPUs.
- --cpuset-cpus: specifies the CPUs the container is allowed to run on.
- --cpuset-mems: specifies the NUMA nodes from which the container's memory is allocated.
- -m: specifies the maximum amount of memory the container can use.
- --name: specifies the Docker container name.
- -v: bind-mounts a volume or host directory into the container, in host-path:container-path format.
- centos:latest: specifies the local image. centos indicates the repository and latest indicates the tag.
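A chosen core list can be checked before it is passed to --cpuset-cpus. The Bash sketch below (check_cpuset is a hypothetical name) confirms that every core lies in one NUMA node and reports how many CPU clusters the list covers and the largest number of cores taken from a single cluster. It depends on the layout assumptions stated in its comments, four consecutively numbered cores per cluster and 24 cores per NUMA node as on the Kunpeng 920 5250, and both should be checked against the real topology.

```shell
#!/usr/bin/env bash
# Hypothetical checker for a --cpuset-cpus list such as 3,4,8,9,12,16,20,21.
# ASSUMPTIONS: 4 consecutively numbered cores per cluster and 24 cores per
# NUMA node, as on the Kunpeng 920 5250. Verify against your hardware.
CLUSTER_SIZE=4
CORES_PER_NODE=24

check_cpuset() {
  local node="" clusters="" part cpu hi n
  for part in $(echo "$1" | tr ',' ' '); do
    # Accept single cores ("12") as well as ranges ("4-7").
    cpu=${part%-*}
    hi=${part#*-}
    while [ "$cpu" -le "$hi" ]; do
      n=$(( cpu / CORES_PER_NODE ))
      if [ -z "$node" ]; then
        node=$n
      elif [ "$n" -ne "$node" ]; then
        echo "core $cpu is outside NUMA node $node" >&2
        return 1
      fi
      # Record the cluster index of this core.
      clusters="$clusters $(( cpu / CLUSTER_SIZE ))"
      cpu=$(( cpu + 1 ))
    done
  done
  # Count cores per cluster, then report distinct clusters and the busiest one.
  printf '%s\n' $clusters | sort -n | uniq -c |
    awk -v node="$node" 'BEGIN { max = 0 }
      { n++; if ($1 > max) max = $1 }
      END { printf "node=%s clusters=%d max_per_cluster=%d\n", node, n, max }'
}
```

Under these assumptions, check_cpuset 3,4,8,9,12,16,20,21 prints node=0 clusters=6 max_per_cluster=2, meaning the eight cores of 8u16g_02 cover all six clusters of NUMA node 0 with at most two cores per cluster, whereas check_cpuset 4-7 (the 4u8g_01 binding) prints node=0 clusters=1 max_per_cluster=4.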