System Tuning
Optimizing the OS Configuration
- Purpose
Adjust the system configuration to maximize the hardware performance.
- Procedure
Table 1 lists the optimization items.
Table 1 OS configuration parameters
Columns: Parameter | Description | Suggestion | Configuration Method
vm.swappiness
The swap partition serves as the system's virtual memory. Avoid using it, because swapping degrades system performance.
Default value: 60
Symptom: The performance deteriorates significantly when the swap partition is used.
Suggestion: Disable the swap partition and set this parameter to 0.
Run the following command:
sudo sysctl vm.swappiness=0
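The sysctl command above takes effect immediately but does not survive a reboot. A minimal sketch of persisting the setting via a sysctl.d drop-in (the file name 99-swappiness.conf is an assumption, not from this document):

```shell
# Build a sysctl.conf-style line for a key/value pair.
sysctl_line() { printf '%s = %s\n' "$1" "$2"; }

# Print the line that would be persisted.
sysctl_line vm.swappiness 0
# As root, to persist across reboots (file name is an assumption):
#   sysctl_line vm.swappiness 0 > /etc/sysctl.d/99-swappiness.conf
#   sysctl --system
```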
MTU
Maximum size of a data packet that can pass through a NIC. Increasing the value reduces the number of network packets and improves efficiency.
Default value: 1500 bytes
Symptom: Run the ip addr command to view the value.
Suggestion: Set the maximum size of a data packet that can pass through a NIC to 9000 bytes.
- Run the following command:
vim /etc/sysconfig/network-scripts/ifcfg-${Interface}
Add MTU="9000" to the file.
NOTE: ${Interface} indicates the network port name.
- After the configuration is complete, restart the network service.
service network restart
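After the restart, it may help to confirm that the new MTU took effect. A small sketch that parses `ip -o link` output (the interface name eth0 and the sample line are assumptions):

```shell
# Extract the MTU field from one line of `ip -o link show <iface>` output.
mtu_of() { awk '{for (i = 1; i <= NF; i++) if ($i == "mtu") print $(i + 1)}'; }

# On a live system (interface name is an assumption):
#   ip -o link show eth0 | mtu_of
echo '2: eth0: <BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state UP' | mtu_of
```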
pid_max
The default value of pid_max is 32768, which is sufficient in normal cases. Under heavy workloads, however, the limit can be reached, causing process or thread creation to fail.
Default value: 32768
Symptom: Run the cat /proc/sys/kernel/pid_max command to view the value.
Suggestion: Set the maximum number of threads that can be generated in the system to 4194303.
Run the following command:
echo 4194303 > /proc/sys/kernel/pid_max
file_max
Maximum number of files that can be opened by all processes in the system. In addition, some programs can call the setrlimit interface to set the limit on each process. If the system generates a large number of errors indicating that file handles are used up, increase the value of this parameter.
Default value: 13291808
Symptom: Run the cat /proc/sys/fs/file-max command to view the value.
Suggestion: Set the maximum number of files that can be opened by all processes in the system to the value returned by cat /proc/meminfo | grep MemTotal | awk '{print $2}' (the total system memory in KB).
Run the following command:
echo ${file-max} > /proc/sys/fs/file-max
NOTE: ${file-max} is the value displayed after the cat /proc/meminfo | grep MemTotal | awk '{print $2}' command is run.
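The derivation of the target value can be sketched as a small awk helper; the pipeline follows the suggestion above, and the final write requires root, so it is shown as a comment:

```shell
# Extract the MemTotal value (in KB) from /proc/meminfo-style input.
filemax_from_meminfo() { awk '/^MemTotal:/ {print $2}'; }

# On a live system, as root:
#   target=$(filemax_from_meminfo < /proc/meminfo)
#   echo "$target" > /proc/sys/fs/file-max
echo 'MemTotal:       131072000 kB' | filemax_from_meminfo
```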
read_ahead
Linux readahead means that the kernel prefetches a region of a file into the page cache, so that subsequent accesses to that region do not block on page faults.
Reading data from memory is much faster than reading data from drives. Therefore, the readahead feature can effectively reduce the number of drive seeks and the I/O waiting time of the applications. It is one of the important methods for optimizing the drive read I/O performance.
Default value: 128 KB
Symptom: Readahead can effectively reduce the number of drive seeks and the I/O waiting time of applications. Run the /sbin/blockdev --getra /dev/sdb command to view the value.
Suggestion: Change the value to 8192 to improve drive read efficiency by prefetching data into RAM. Note that blockdev expresses readahead in 512-byte sectors.
Run the following command:
/sbin/blockdev --setra 8192 /dev/sdb
NOTE: /dev/sdb is used as an example. You need to modify this parameter for all data drives.
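Since the note says every data drive must be changed, the per-drive commands can be generated in a loop. A sketch with the drive names as assumptions (commands are printed rather than executed, so it runs unprivileged):

```shell
# Print the blockdev command for each data drive; pipe to `sudo sh` to apply.
readahead_cmds() {
  ra=$1; shift
  for dev in "$@"; do
    echo "/sbin/blockdev --setra $ra $dev"
  done
}

readahead_cmds 8192 /dev/sdb /dev/sdc
```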
I/O_Scheduler
The Linux I/O scheduler is a component of the Linux kernel. You can adjust the scheduler to optimize system performance.
Default value: CFQ
Symptom: The I/O scheduling policy must match the storage device type for optimal performance. Run the cat /sys/block/sdb/queue/scheduler command to view the current policy.
Suggestion: Set the I/O scheduling policy to deadline for HDDs and noop for SSDs.
Run the following command:
echo deadline > /sys/block/sdb/queue/scheduler
NOTE: /dev/sdb is used as an example. You need to modify this parameter for all data drives.
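The HDD/SSD distinction in the suggestion can be automated with the kernel's rotational flag (1 for HDD, 0 for SSD). A sketch; reading /sys and writing the scheduler need a live system and root, so they are shown as comments:

```shell
# Map the rotational flag to the scheduler suggested in this document.
scheduler_for() {
  if [ "$1" = "1" ]; then echo deadline; else echo noop; fi
}

# On a live system, as root (sdb is an example):
#   rot=$(cat /sys/block/sdb/queue/rotational)
#   scheduler_for "$rot" > /sys/block/sdb/queue/scheduler
scheduler_for 1
scheduler_for 0
```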
nr_requests
If the Linux system receives a large number of read requests, the default request queue depth may be insufficient. You can adjust it dynamically through the /sys/block/<dev>/queue/nr_requests file.
Default value: 128
Symptom: Drive throughput is limited by the request queue depth. Run the cat /sys/block/sdb/queue/nr_requests command to view the value.
Suggestion: Set the number of drive request queues to 512.
Run the following command:
echo 512 > /sys/block/sdb/queue/nr_requests
NOTE: /dev/sdb is used as an example. You need to modify this parameter for all data drives.
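As with readahead, the per-drive writes can be generated in a loop. A sketch with assumed drive names (commands are printed rather than executed):

```shell
# Print the nr_requests write for each data drive; pipe to `sudo sh` to apply.
nr_requests_cmds() {
  val=$1; shift
  for dev in "$@"; do
    echo "echo $val > /sys/block/$dev/queue/nr_requests"
  done
}

nr_requests_cmds 512 sdb sdc
```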
Optimizing the Network Performance
- Purpose
This section uses a four-port 25GE SFP+ Ethernet adapter (Hi1822) as an example to describe how to optimize NIC parameters for optimal performance.
- Procedure
The optimization methods include adjusting NIC parameters and interrupt-core binding (binding interrupts to the physical CPU of the NIC). Table 2 describes the optimization items.
Table 2 NIC parameters
Columns: Parameter | Description | Suggestion
irqbalance
System interrupt balancing service, which automatically allocates NIC software interrupts to idle CPUs.
Default value: active
Symptom: When this function is enabled, the system automatically allocates NIC software interrupts to idle CPUs.
Suggestion:
- To disable irqbalance, set this parameter to inactive.
systemctl stop irqbalance
- Keep the function disabled after the server is restarted.
systemctl disable irqbalance
rx_buff
Aggregating large network packets requires multiple discontiguous memory pages, which lowers memory utilization. Increasing this parameter improves memory utilization.
Default value: 2
Symptom: When the value is set to 2 by default, interrupts consume a large number of CPU resources.
Suggestion: Load the hinic driver with rx_buff set to 8 to reduce discontiguous memory and improve memory utilization and performance. For details, see the description following the table.
ring_buffer
You can increase the throughput by adjusting the NIC buffer size.
Default value: 1024
Symptom: Run the ethtool -g <NIC name> command to view the value.
Suggestion: Change the ring_buffer queue size to 4096. For details, see the description following the table.
lro
Large receive offload. After this function is enabled, multiple small packets are aggregated into one large packet for better efficiency.
Default value: off
Symptom: After this function is enabled, the maximum throughput increases significantly.
Suggestion: Enable large receive offload to improve the efficiency of sending and receiving packets. For details, see the description following the table.
hinicadm lro -i hinic0 -t <NUM>
Received aggregated packets are sent after the time specified by NUM (in microseconds). You can set the value to 256 microseconds for better efficiency.
Default value: 16 microseconds
Symptom: This parameter is used with the LRO function.
Suggestion: Change the value to 256 microseconds.
hinicadm lro -i hinic0 -n <NUM>
Received aggregated packets are sent after the number of aggregated packets reaches the value specified by <NUM>. You can set the value to 32 for better efficiency.
Default value: 4
Symptom: This parameter is used with the LRO function.
Suggestion: Change the value to 32.
- Adjusting rx_buff
- Go to the /etc/modprobe.d directory.
cd /etc/modprobe.d
- Create the hinic.conf file.
vim /etc/modprobe.d/hinic.conf
Add the following information to the file:
options hinic rx_buff=8
- Reload the driver.
rmmod hinic
modprobe hinic
- Check whether the value of rx_buff is changed to 8.
cat /sys/bus/pci/drivers/hinic/module/parameters/rx_buff
- Adjusting ring_buffer
- Change the buffer size from the default value 1024 to 4096.
ethtool -G <NIC name> rx 4096 tx 4096
- Check the current buffer size.
ethtool -g <NIC name>
- Enabling LRO
- Enable the LRO function for a NIC.
ethtool -K <NIC name> lro on
- Check whether the function is enabled.
ethtool -k <NIC name> | grep large-receive-offload
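For scripting, the on/off state can be pulled out of the ethtool -k output. A small sketch; the sample line mirrors typical ethtool formatting, which is an assumption:

```shell
# Extract the state field from the large-receive-offload line of `ethtool -k`.
lro_state() { awk -F': ' '/large-receive-offload/ {print $2}'; }

# On a live system:
#   ethtool -k eth0 | lro_state
echo 'large-receive-offload: on' | lro_state
```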
In addition to optimizing the preceding parameters, you need to bind the NIC software interrupts to the cores.
- Disable the irqbalance service.
- Query the NUMA node to which the NIC belongs.
cat /sys/class/net/<Network port name>/device/numa_node
- Query the CPU cores that correspond to the NUMA node.
lscpu
- Query the interrupt ID corresponding to the NIC.
cat /proc/interrupts | grep <Network port name> | awk -F ':' '{print $1}'
- Bind the software interrupt to the core corresponding to the NUMA node.
echo <core number> > /proc/irq/<Interrupt ID>/smp_affinity_list
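The binding steps above can be sketched as a helper that spreads a NIC's interrupts round-robin over the NUMA-local cores. The interrupt IDs and core list below are assumptions, and the writes are printed rather than executed so the sketch runs unprivileged:

```shell
# Distribute interrupt IDs round-robin across the given cores; the echo keeps
# this a dry run -- pipe the output to `sudo sh` to apply the bindings.
bind_irqs() {
  cores=$1; shift                       # $1: space-separated NUMA-local cores
  n=$(echo $cores | wc -w)
  i=0
  for irq in "$@"; do                   # remaining args: interrupt IDs
    idx=$((i % n + 1))
    core=$(echo $cores | cut -d' ' -f$idx)
    echo "echo $core > /proc/irq/$irq/smp_affinity_list"
    i=$((i + 1))
  done
}

bind_irqs "0 1" 120 121 122
```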