Optimizing NUMA to Reduce Cross-NUMA Memory Access

Principles

As described in Kunpeng Processor NUMA Overview, memory access performance depends on which NUMA node the accessing CPU core belongs to. Memory access latency, from highest to lowest, is: cross-CPU > cross-NUMA within the same CPU > intra-NUMA.

Therefore, applications should avoid cross-NUMA memory access while running. Setting the CPU affinity of threads keeps them on cores close to the memory they use and so reduces cross-NUMA access.

Modification Method

NIC interrupts can be bound to a specific CPU core as follows. $cpuNumber is the ID of the target core (core IDs start from 0), and $irq is the interrupt number of the NIC queue.
echo $cpuNumber > /proc/irq/$irq/smp_affinity_list
Use the numactl command to start a program on specific cores. For example, the following command starts the test program and restricts it to CPU cores 28 to 31 (specified by -C).
numactl -C 28-31 ./test
In C/C++ code, call the sched_setaffinity function to set thread affinity.
Many open source programs support setting CPU affinity in their own configuration files. For example, you can modify the worker_cpu_affinity parameter in the nginx.conf file to set the affinity of Nginx worker processes.
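
The sched_setaffinity approach mentioned above can be sketched roughly as follows. This is a minimal example, assuming Linux with glibc; the function name pin_current_thread_to_cpus and its range-style parameters are hypothetical and chosen only for illustration, to mirror the effect of numactl -C 28-31 from inside the program.

```c
#define _GNU_SOURCE   /* must come before any include so glibc exposes the cpu_set_t macros */
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper (not a library function): restrict the calling thread
 * to the inclusive core range [first, last].
 * Returns 0 on success, or -1 with errno set on failure.
 */
int pin_current_thread_to_cpus(int first, int last)
{
    /* Reject ranges that cannot be represented in a cpu_set_t. */
    if (first < 0 || last < first || last >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, &set);

    /* A pid of 0 applies the mask to the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

For instance, calling pin_current_thread_to_cpus(28, 31) at the start of a worker thread would confine that thread to cores 28 to 31, provided those cores exist and the process is allowed to use them. Pinning before the thread allocates and first touches its memory is likely to help most, because Linux by default places pages on the NUMA node of the core that first touches them.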
