
System Tuning

Optimizing the OS Configuration

  • Purpose

Adjust the system configuration to maximize hardware performance.

  • Procedure
    Table 1 lists the optimization items.
    Table 1 OS configuration parameters


    vm.swappiness

The swap partition serves as the system's virtual memory. Avoid using it, because swapping significantly degrades system performance.

    Default value: 60

    Symptom: The performance deteriorates significantly when the swap partition is used.

    Suggestion: Disable the swap partition and set this parameter to 0.

    Run the following command:

    sudo sysctl vm.swappiness=0
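    The sysctl command above takes effect only until the next reboot. To persist the setting, it can be placed in a sysctl configuration file; a sketch, assuming the conventional /etc/sysctl.d location (the file name 99-swappiness.conf is an arbitrary choice):

    ```
    # /etc/sysctl.d/99-swappiness.conf — applied at boot;
    # load immediately with: sudo sysctl --system
    vm.swappiness = 0
    ```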
    

    MTU

Maximum size of a data packet that can pass through a NIC. Increasing the value reduces the number of network packets and improves efficiency.

    Default value: 1500 bytes

    Symptom: Run the ip addr command to view the value.

    Suggestion: Set the maximum size of a data packet that can pass through a NIC to 9000 bytes.

    1. Run the following command:
      vim /etc/sysconfig/network-scripts/ifcfg-${Interface}
      Add MTU="9000".
      NOTE:

      ${Interface} indicates the network port name.

    2. After the configuration is complete, restart the network service.
      service network restart
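    After the network service restarts, the active MTU can be read back from sysfs to confirm the change. A small sketch, using the loopback interface lo as a stand-in for the data port (substitute your own port name):

    ```shell
    # Read the active MTU of an interface from sysfs
    # (lo is a placeholder port name; use your data port, e.g. from `ip addr`)
    port=lo
    mtu=$(cat /sys/class/net/$port/mtu)
    echo "$port MTU: $mtu"
    ```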
      

    pid_max

The default value of pid_max is 32768, which is sufficient in normal cases. However, when heavy workloads are being processed, the PID space can be exhausted, causing process or thread creation to fail.

    Default value: 32768

    Symptom: Run the cat /proc/sys/kernel/pid_max command to view the value.

Suggestion: Set the maximum number of process and thread IDs that the system can allocate to 4194303.

    Run the following command:

    echo 4194303 > /proc/sys/kernel/pid_max
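    Writing to /proc/sys/kernel/pid_max does not survive a reboot, so the value is usually also verified and persisted via sysctl. A read-only sketch of checking the current limit (no root required):

    ```shell
    # Verify the current pid_max; to persist a new value across reboots,
    # kernel.pid_max can be set in a sysctl configuration file.
    pidmax=$(cat /proc/sys/kernel/pid_max)
    echo "kernel.pid_max = $pidmax"
    ```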
    

    file_max

    Maximum number of files that can be opened by all processes in the system. In addition, some programs can call the setrlimit interface to set the limit on each process. If the system generates a large number of errors indicating that file handles are used up, increase the value of this parameter.

    Default value: 13291808

    Symptom: Run the cat /proc/sys/fs/file-max command to view the value.

    Suggestion: Set the maximum number of files that can be opened by all processes in the system to the value displayed after the cat /proc/meminfo | grep MemTotal | awk '{print $2}' command is run.

    Run the following command:

    echo ${file-max} > /proc/sys/fs/file-max
    
    NOTE:

${file-max} is the value displayed after the cat /proc/meminfo | grep MemTotal | awk '{print $2}' command is run.
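    The suggested value can be derived in one step. A sketch that only computes and prints it (applying it with the echo command above requires root):

    ```shell
    # Derive the suggested fs.file-max value (total memory in kB),
    # as the note above describes
    filemax=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
    echo "suggested fs.file-max: $filemax"
    ```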

    read_ahead

Linux readahead means that the kernel prefetches a specified area of a file into the page cache, so that subsequent accesses to that area do not block on page faults.

    Reading data from memory is much faster than reading data from drives. Therefore, the readahead feature can effectively reduce the number of drive seeks and the I/O waiting time of the applications. It is one of the important methods for optimizing the drive read I/O performance.

    Default value: 256 sectors (128 KB)

    Symptom: Run the /sbin/blockdev --getra /dev/sdb command to view the value. blockdev reports the readahead size in 512-byte sectors.

    Suggestion: Increase the value to 8192 sectors (4 MB) to improve drive read efficiency by prefetching data into random access memory (RAM).

    Run the following command:

    /sbin/blockdev --setra 8192 /dev/sdb
    
    NOTE:

    /dev/sdb is used as an example. You need to modify this parameter for all data drives.

    I/O_Scheduler

    The Linux I/O scheduler is a component of the Linux kernel. You can adjust the scheduler to optimize system performance.

    Default value: CFQ

    Symptom: The Linux I/O scheduler needs to be configured based on different storage devices for the optimal system performance.

    Suggestion: Set the I/O scheduling policy to deadline for HDDs and noop for SSDs.

    Run the following command:

    echo deadline > /sys/block/sdb/queue/scheduler
    
    NOTE:

/dev/sdb is used as an example. You need to modify this parameter for all data drives. For SSDs, write noop instead of deadline.

    nr_requests

If the Linux system receives a large number of read requests, the default number of request queues may be insufficient. To deal with this problem, you can dynamically adjust the default number of request queues in the /sys/block/<device>/queue/nr_requests file.

    Default value: 128

Symptom: Run the cat /sys/block/sdb/queue/nr_requests command to view the value.

    Suggestion: Set the number of drive request queues to 512.

    Run the following command:

    echo 512 > /sys/block/sdb/queue/nr_requests
    
    NOTE:

    /dev/sdb is used as an example. You need to modify this parameter for all data drives.
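    The three block-device settings above (read_ahead, I/O_Scheduler, nr_requests) must be applied to every data drive. A sketch of looping over the drives; sdb and sdc are placeholder drive names, and DRY_RUN=1 prints the commands instead of executing them (executing requires root):

    ```shell
    # Apply readahead, scheduler, and queue-depth settings to each data drive.
    # sdb/sdc are placeholders; set DRY_RUN=0 to actually apply (as root).
    DRY_RUN=1
    for dev in sdb sdc; do
      for cmd in \
        "blockdev --setra 8192 /dev/$dev" \
        "sh -c 'echo deadline > /sys/block/$dev/queue/scheduler'" \
        "sh -c 'echo 512 > /sys/block/$dev/queue/nr_requests'"
      do
        if [ "$DRY_RUN" = 1 ]; then echo "WOULD RUN: $cmd"; else eval "$cmd"; fi
      done
    done
    ```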

Optimizing the Network Performance

  • Purpose

This section uses a four-port 25GE SFP+ Ethernet adapter (Hi1822) as an example to describe how to tune NIC parameters for optimal performance.

  • Procedure

The optimization methods include adjusting NIC parameters and interrupt-core binding (binding NIC interrupts to cores of the NUMA node to which the NIC belongs). Table 2 describes the optimization items.

    Table 2 NIC parameters


    irqbalance

    System interrupt balancing service, which automatically allocates NIC software interrupts to idle CPUs.

    Default value: active

    Symptom: When this function is enabled, the system automatically allocates NIC software interrupts to idle CPUs.

    Suggestion:

    • Stop the irqbalance service (sets it to inactive):
      systemctl stop irqbalance

    • Keep the service disabled after the server is restarted:
      systemctl disable irqbalance

    rx_buff

    Aggregation of large network packets requires multiple discontinuous memory pages and causes low memory usage. You can increase the value of this parameter to improve the memory usage.

    Default value: 2

    Symptom: When the value is set to 2 by default, interrupts consume a large number of CPU resources.

Suggestion: Load the hinic driver with the rx_buff parameter set to 8 to reduce discontiguous memory and improve memory usage and performance. For details, see the description following the table.

    ring_buffer

    You can increase the throughput by adjusting the NIC buffer size.

    Default value: 1024

    Symptom: Run the ethtool -g NIC name command to view the value.

Suggestion: Change the ring_buffer queue size to 4096. For details, see the description following the table.

    lro

    Large receive offload. After this function is enabled, multiple small packets are aggregated into one large packet for better efficiency.

    Default value: off

    Symptom: After this function is enabled, the maximum throughput increases significantly.

Suggestion: Enable the large receive offload (LRO) function to improve the efficiency of sending and receiving packets. For details, see the description following the table.

hinicadm lro -i hinic0 -t <NUM>

Aggregated packets are delivered after the time specified by <NUM> (in microseconds) elapses. You can set the value to 256 microseconds for better efficiency.

    Default value: 16 microseconds

    Symptom: This parameter is used with the LRO function.

    Suggestion: Change the value to 256 microseconds.

hinicadm lro -i hinic0 -n <NUM>

Aggregated packets are delivered once the number of aggregated packets reaches the value specified by <NUM>. You can set the value to 32 for better efficiency.

    Default value: 4

    Symptom: This parameter is used with the LRO function.

    Suggestion: Change the value to 32.

    • Adjusting rx_buff
      1. Go to the /etc/modprobe.d directory.
        cd /etc/modprobe.d
      2. Create the hinic.conf file.
        vim /etc/modprobe.d/hinic.conf
        Add the following information to the file:
        options hinic rx_buff=8
      3. Reload the driver.
        rmmod hinic
        modprobe hinic
      4. Check whether the value of rx_buff is changed to 8.
        cat /sys/bus/pci/drivers/hinic/module/parameters/rx_buff
    • Adjusting ring_buffer
      1. Change the buffer size from the default value 1024 to 4096.
        ethtool -G <NIC name> rx 4096 tx 4096
      2. Check the current buffer size.
        ethtool -g <NIC name>
    • Enabling LRO
      1. Enable the LRO function for a NIC.
        ethtool -K <NIC name> lro on
      2. Check whether the function is enabled.
        ethtool -k <NIC name> | grep large-receive-offload

    In addition to optimizing the preceding parameters, you need to bind the NIC software interrupts to the cores.

    1. Disable the irqbalance service.
    2. Query the NUMA node to which the NIC belongs.
      cat /sys/class/net/<Network port name>/device/numa_node
    3. Query the CPU cores that correspond to the NUMA node.
      lscpu
    4. Query the interrupt IDs corresponding to the NIC.
      cat /proc/interrupts | grep <Network port name> | awk -F ':' '{print $1}'
    5. Bind the software interrupts to the cores of the NUMA node.
      echo <Core number> > /proc/irq/<Interrupt ID>/smp_affinity_list
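    Steps 4 and 5 are easy to script. A sketch of the parsing half of step 4; the /proc/interrupts lines below are an illustrative sample, not real output, and the hinic0-txrx names are assumed examples (feed the function real data with `extract_irq_ids hinic0 < /proc/interrupts`):

    ```shell
    # Extract interrupt IDs for a given port name from /proc/interrupts-style input.
    extract_irq_ids() {
      grep "$1" | awk -F ':' '{gsub(/^[ \t]+/, "", $1); print $1}'
    }

    # Illustrative sample of /proc/interrupts lines (not real output)
    sample=' 34:  0  0  PCI-MSI  hinic0-txrx-0
     35:  0  0  PCI-MSI  hinic0-txrx-1'

    ids=$(printf '%s\n' "$sample" | extract_irq_ids hinic0)
    echo "$ids"   # one interrupt ID per line
    ```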