CPU Core Binding Optimization
In a virtualization scenario with two NUMA nodes, optimize VM performance by adjusting the binding of vCPUs to physical machine dies and configuring the vCPU topology.
When a physical machine is configured with two NUMA nodes, each node typically contains two CPU dies. Because cache and memory access between dies incurs performance overhead, avoid binding a VM's vCPUs to physical CPU cores that span different dies. For example, Kunpeng 920 7270Z has 64 CPU cores per die. In a two-NUMA-node setup, CPUs 0 to 127 constitute NUMA node 0, which is further divided into two CPU dies: the first die includes cores 0 to 63, and the second die includes cores 64 to 127.
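The core numbering described above can be sketched as a small lookup from physical core number to NUMA node and die. This is only an illustration: it assumes cores are numbered contiguously, with 64 cores per die and two dies per NUMA node as in the Kunpeng 920 7270Z example. The names CORES_PER_DIE, DIES_PER_NODE, and locate_core are hypothetical; confirm the actual layout on the host before relying on it.

```python
CORES_PER_DIE = 64   # assumption: value from the Kunpeng 920 7270Z example
DIES_PER_NODE = 2    # assumption: two dies per NUMA node


def locate_core(core):
    """Return (numa_node, die) for a physical core number.

    The die index is global across the machine, so NUMA node 0 holds
    dies 0 and 1, NUMA node 1 holds dies 2 and 3.
    """
    if core < 0:
        raise ValueError("core number must be non-negative")
    die = core // CORES_PER_DIE
    node = die // DIES_PER_NODE
    return node, die


print(locate_core(63))   # (0, 0): last core of the first die in node 0
print(locate_core(64))   # (0, 1): first core of the second die in node 0
print(locate_core(128))  # (1, 2): first core of NUMA node 1
```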

If the VM has fewer than 64 vCPUs, the optimal configuration is to bind all of its vCPUs to physical CPU cores on a single die, either the first or the second. Do not bind the vCPUs to physical CPU cores on two different dies.
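A planned binding can be checked against this rule before it is written into the VM XML file. The following is a minimal sketch, assuming 64 contiguously numbered cores per die; dies_used and is_single_die are hypothetical helper names, not part of any tool.

```python
CORES_PER_DIE = 64  # assumption: value from the Kunpeng 920 7270Z example


def dies_used(cores):
    """Return the set of die indexes touched by the given physical cores."""
    return {core // CORES_PER_DIE for core in cores}


def is_single_die(cores):
    """True if every core in the binding sits on the same die."""
    return len(dies_used(cores)) == 1


print(is_single_die(range(64, 96)))  # True: cores 64-95 are all on the second die
print(is_single_die(range(48, 80)))  # False: cores 48-79 straddle the two dies
```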
The following is an example of CPU core binding optimization. In this example, the cputune section establishes a 1:1 binding between vCPUs and physical CPU cores, and all of the physical CPU cores are on the same CPU die. In the numatune section, only one NUMA node is set for the VM, and nodeset points to the NUMA node where the bound physical CPU cores are located. In the cpu section, the topology is one socket, one die, and four clusters, with four cores per cluster and two threads per core (1 x 1 x 4 x 4 x 2 = 32 vCPUs). Modify the VM XML file by referring to Cluster Optimization Configuration for Four-NUMA Node Scenarios.
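When adapting the topology to a different VM size, it helps to check that the topology attributes multiply out to the vCPU count. The sketch below does that arithmetic; topology_vcpus is a hypothetical helper, and the requirement that the product equal the value in the vcpu element is stated here as an assumption about how the hypervisor validates the configuration.

```python
def topology_vcpus(sockets, dies, clusters, cores, threads):
    """Number of vCPUs implied by a <topology> element.

    cores is the number of cores per cluster and threads is the number
    of threads per core, so the total is the product of all five values.
    """
    return sockets * dies * clusters * cores * threads


# Values from the example: sockets=1, dies=1, clusters=4, cores=4, threads=2
total = topology_vcpus(sockets=1, dies=1, clusters=4, cores=4, threads=2)
print(total)  # 32, which matches <vcpu placement='static'>32</vcpu>
```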
This document uses a VM with 32 vCPUs and 64 GB memory as an example to describe how to optimize CPU core binding. Adjust the parameters based on requirements and VM specifications.
<domain type='kvm'>
  ...
  <vcpu placement='static'>32</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='64'/>
    <vcpupin vcpu='1' cpuset='65'/>
    <vcpupin vcpu='2' cpuset='66'/>
    <vcpupin vcpu='3' cpuset='67'/>
    <vcpupin vcpu='4' cpuset='68'/>
    <vcpupin vcpu='5' cpuset='69'/>
    <vcpupin vcpu='6' cpuset='70'/>
    <vcpupin vcpu='7' cpuset='71'/>
    <vcpupin vcpu='8' cpuset='72'/>
    <vcpupin vcpu='9' cpuset='73'/>
    <vcpupin vcpu='10' cpuset='74'/>
    <vcpupin vcpu='11' cpuset='75'/>
    <vcpupin vcpu='12' cpuset='76'/>
    <vcpupin vcpu='13' cpuset='77'/>
    <vcpupin vcpu='14' cpuset='78'/>
    <vcpupin vcpu='15' cpuset='79'/>
    <vcpupin vcpu='16' cpuset='80'/>
    <vcpupin vcpu='17' cpuset='81'/>
    <vcpupin vcpu='18' cpuset='82'/>
    <vcpupin vcpu='19' cpuset='83'/>
    <vcpupin vcpu='20' cpuset='84'/>
    <vcpupin vcpu='21' cpuset='85'/>
    <vcpupin vcpu='22' cpuset='86'/>
    <vcpupin vcpu='23' cpuset='87'/>
    <vcpupin vcpu='24' cpuset='88'/>
    <vcpupin vcpu='25' cpuset='89'/>
    <vcpupin vcpu='26' cpuset='90'/>
    <vcpupin vcpu='27' cpuset='91'/>
    <vcpupin vcpu='28' cpuset='92'/>
    <vcpupin vcpu='29' cpuset='93'/>
    <vcpupin vcpu='30' cpuset='94'/>
    <vcpupin vcpu='31' cpuset='95'/>
    <emulatorpin cpuset='64-95'/>
  </cputune>
  ...
  <numatune>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' dies='1' clusters='4' cores='4' threads='2'/>
    ...
  </cpu>
  ...
</domain>
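For other VM sizes, writing the vcpupin entries by hand is error-prone. The following is a minimal sketch of a generator for the cputune section, assuming a 1:1 binding onto consecutive physical cores and 64 cores per die. build_cputune and its parameters are hypothetical names introduced for illustration; the output still needs to be pasted into the VM XML file and reviewed.

```python
def build_cputune(vcpus, first_core, cores_per_die=64):
    """Return a <cputune> section that pins vCPU n to core first_core + n.

    Raises ValueError if the resulting core range would span two dies
    (assumption: dies are blocks of cores_per_die consecutive cores).
    """
    if vcpus < 1:
        raise ValueError("at least one vCPU is required")
    last_core = first_core + vcpus - 1
    if first_core // cores_per_die != last_core // cores_per_die:
        raise ValueError("binding would span two dies")

    lines = ["<cputune>"]
    for vcpu in range(vcpus):
        lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{first_core + vcpu}'/>")
    # Pin the emulator threads to the same core range as the vCPUs.
    lines.append(f"  <emulatorpin cpuset='{first_core}-{last_core}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


# Reproduces the cputune section of the example: 32 vCPUs on cores 64-95.
print(build_cputune(vcpus=32, first_core=64))
```

Calling build_cputune(32, 48) raises ValueError, because cores 48 to 79 would cross the boundary between the first and second die.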