OS Performance Optimization

OS performance optimization is particularly beneficial for workloads that perform frequent memory operations and incur many TLB misses and page faults. The main methods are configuring a huge page memory pool, using a high-performance memory allocation library, configuring malloc to use huge pages, and enabling huge pages for glibc dynamic libraries (where the system supports it). Generally, combining multiple methods is not recommended, because it adds complexity and may cause compatibility issues. However, on openEuler 22.03 LTS SP4, using the glibc huge page feature together with the other methods can yield additional performance gains.

Huge Page Memory Pool

Memory-intensive applications running on Linux often suffer performance degradation due to the default 4 KB page size, which generates excessive TLB misses and page faults. Adopting huge pages significantly reduces these inefficiencies, delivering measurable performance gains.

Linux provides two types of huge pages. Standard huge pages suit performance-sensitive workloads that need precise control over memory, while transparent huge pages are managed automatically by the kernel and suit general applications that need little configuration. Choose the type based on your workload.

  • Standard huge pages

    Run the following command to enable the standard huge page pool temporarily. This method is recommended because it does not require a server restart. The setting does not persist across reboots.
    sysctl -w vm.nr_hugepages=1024

    Run the following command to check whether huge pages are successfully enabled:

    cat /proc/meminfo | grep -i huge

    The following figure shows the output after huge pages are successfully enabled. In the output, HugePages_Total is the number of standard huge pages in the pool.

    The value of vm.nr_hugepages is the number of 2 MB standard huge pages to reserve; for example, 1024 pages reserve 2 GB. Memory reserved for standard huge pages is no longer available for regular OS allocations, so set the value according to your needs.

  • Transparent huge pages
    Run the following command to check whether transparent huge pages are enabled:
    cat /sys/kernel/mm/transparent_hugepage/enabled

    The value in square brackets is the active mode. If [always] is displayed in the command output, transparent huge pages are enabled. Generally, transparent huge pages are enabled by default.

    If transparent huge pages are not enabled, run the following command as root to enable them. The setting does not persist across reboots.
    echo always > /sys/kernel/mm/transparent_hugepage/enabled
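
The two checks above can be scripted. The following is a minimal sketch, assuming a Linux system that provides /proc/meminfo and the sysfs file shown above; the function names hugepages_needed and thp_mode are hypothetical. hugepages_needed converts a target pool size in MiB into a vm.nr_hugepages value using the Hugepagesize field of /proc/meminfo, and thp_mode prints the active transparent huge page mode, which is the value in square brackets.

```shell
#!/usr/bin/env bash
# Sketch: size the standard huge page pool and report the transparent huge page mode.
# Function names are hypothetical helpers, not part of any tool.

# Print how many standard huge pages are needed to back <MiB> (an integer) of memory.
# Reads Hugepagesize (in kB) from /proc/meminfo, or from the file given as $2.
hugepages_needed() {
  local mib=$1 meminfo=${2:-/proc/meminfo}
  local page_kb
  page_kb=$(awk '/^Hugepagesize:/ {print $2}' "$meminfo")
  if [ -z "$page_kb" ]; then
    echo "Hugepagesize not found in $meminfo" >&2
    return 1
  fi
  # Round up so the pool covers at least <MiB>.
  echo $(( (mib * 1024 + page_kb - 1) / page_kb ))
}

# Print the active THP mode, i.e. the value in square brackets,
# e.g. "always" for "[always] madvise never".
# Reads the sysfs file, or the file given as $1.
thp_mode() {
  local f=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
  sed -n 's/.*\[\(.*\)\].*/\1/p' "$f"
}
```

For example, running sysctl -w vm.nr_hugepages=$(hugepages_needed 2048) as root reserves a 2 GB pool; with a Hugepagesize of 2048 kB, hugepages_needed 2048 prints 1024. Both functions accept an optional file path, so they can also be tried against a saved copy of the file.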

High-Performance Memory Library

  1. Obtain the jemalloc source code from GitHub.
    git clone https://github.com/jemalloc/jemalloc.git
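  2. Go to the jemalloc directory and run the following commands to build and install jemalloc. If dependencies such as autoconf are missing, install them first. Installing to the default location may require root permissions.
    cd jemalloc
    ./autogen.sh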
    make
    make install
  3. Add the jemalloc bin directory to PATH, where <path> is the jemalloc installation directory.
    export PATH="<path>/bin:${PATH}"
  4. Preload jemalloc for a single command. The following example runs python3 app.py with jemalloc:
    LD_PRELOAD=`jemalloc-config --libdir`/libjemalloc.so.`jemalloc-config --revision` python3 app.py
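
The preload in step 4 can be wrapped in a small helper that fails early if jemalloc-config or the library is missing. This is a sketch, assuming jemalloc-config from step 3 is on PATH; the names jemalloc_preload_path and run_with_jemalloc are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: preload jemalloc for one command, as in step 4.
# Assumes jemalloc-config (installed by make install) is on PATH.
# Function names are hypothetical.

# Print the path of the jemalloc shared library built from jemalloc-config output.
jemalloc_preload_path() {
  command -v jemalloc-config >/dev/null || { echo "jemalloc-config not found in PATH" >&2; return 1; }
  local lib
  lib="$(jemalloc-config --libdir)/libjemalloc.so.$(jemalloc-config --revision)"
  [ -f "$lib" ] || { echo "library not found: $lib" >&2; return 1; }
  echo "$lib"
}

# Run a command with jemalloc preloaded, keeping any existing LD_PRELOAD entries.
# Example: run_with_jemalloc python3 app.py
run_with_jemalloc() {
  local lib
  lib=$(jemalloc_preload_path) || return 1
  LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}
```

To confirm that jemalloc is actually loaded, run a command that reads its own memory map, for example run_with_jemalloc grep -m1 jemalloc /proc/self/maps. If the preload succeeded, the output contains the path of libjemalloc.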

Huge Pages for malloc

Requirement: The glibc version must be 2.34 or later.

Choose one of the following options to enable huge pages for the malloc function of glibc. The first option uses transparent huge pages; the second uses standard huge pages, which must be available in the standard huge page pool (see Huge Page Memory Pool).

  • Use transparent huge pages.
    export GLIBC_TUNABLES=glibc.malloc.hugetlb=1
  • Use standard huge pages.
    export GLIBC_TUNABLES=glibc.malloc.hugetlb=2

If the message "Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit." is displayed when standard huge pages are used during model training, try transparent huge pages instead.
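
The following sketch checks the glibc version and then adds the tunable without overwriting other tunables already set in GLIBC_TUNABLES (multiple tunables are separated by colons). It assumes GNU getconf and sort -V are available; the function names glibc_at_least and enable_malloc_hugetlb are hypothetical. Note that getconf GNU_LIBC_VERSION reports the upstream glibc version (for example, glibc 2.34), not the distribution package release.

```shell
#!/usr/bin/env bash
# Sketch: check the glibc version, then enable huge pages for malloc
# without discarding tunables already set in GLIBC_TUNABLES.
# Function names are hypothetical.

# Succeed if the glibc version is >= $1 (e.g. 2.34).
# The running version comes from getconf unless a version string is passed as $2.
glibc_at_least() {
  local want=$1 have=${2:-$(getconf GNU_LIBC_VERSION | awk '{print $2}')}
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Append glibc.malloc.hugetlb=<mode> to GLIBC_TUNABLES and export it.
#   mode 1: transparent huge pages; mode 2: standard huge pages.
# An existing glibc.malloc.hugetlb entry is not removed; a new one is appended.
enable_malloc_hugetlb() {
  local mode=$1
  case $mode in
    1|2) ;;
    *) echo "mode must be 1 or 2" >&2; return 1 ;;
  esac
  export GLIBC_TUNABLES="${GLIBC_TUNABLES:+$GLIBC_TUNABLES:}glibc.malloc.hugetlb=$mode"
}
```

For example: glibc_at_least 2.34 && enable_malloc_hugetlb 1 && python3 app.py, where python3 app.py is only a placeholder. Because the function exports the variable, call it in the shell that starts the program rather than in a subshell.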

Huge Pages for glibc Dynamic Libraries

The glibc dynamic library loading solution provided by openEuler can map dynamic libraries to huge pages, which reduces memory overhead and speeds up process startup. Mapping libraries to huge pages also reduces iTLB misses, which improves performance. Before using this feature, check that glibc and glibc-devel are installed and that their versions meet the following requirement.

This feature requires openEuler 22.03 LTS SP4 with a glibc version later than 2.34-161 (an SP4 update release).
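
The installation and version check above can be scripted. This is a minimal sketch, assuming an RPM-based system such as openEuler; the helper names strip_dist, newer_than, and check_glibc_for_hugepage_libs are hypothetical. It interprets "later than 2.34-161" as a release number strictly greater than 161 and ignores any distribution suffix on the release (for example, a suffix such as .oe2203sp4).

```shell
#!/usr/bin/env bash
# Sketch: check that glibc and glibc-devel are installed and that the glibc
# package version-release is later than 2.34-161. Assumes rpm is available.
# Function names are hypothetical.

# Reduce e.g. "2.34-161.oe2203sp4" to "2.34-161" by dropping everything
# after the first dot of the release field.
strip_dist() {
  local ver=${1%%-*} rel=${1#*-}
  echo "$ver-${rel%%.*}"
}

# Succeed if version-release $1 is strictly later than $2 (e.g. 2.34-161).
newer_than() {
  local have want=$2
  have=$(strip_dist "$1")
  [ "$have" != "$want" ] &&
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

check_glibc_for_hugepage_libs() {
  local pkg glibc_vr
  for pkg in glibc glibc-devel; do
    rpm -q "$pkg" >/dev/null 2>&1 || { echo "$pkg is not installed" >&2; return 1; }
  done
  # Take the first line in case packages for several architectures are installed.
  glibc_vr=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' glibc | head -n1)
  echo "glibc $glibc_vr"
  newer_than "$glibc_vr" 2.34-161
}
```

check_glibc_for_hugepage_libs prints the installed glibc version-release and returns a non-zero status if a package is missing or the version is not later than 2.34-161.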

The LD_HUGEPAGE_LIB environment variable maps all dynamic libraries that an executable depends on to huge pages. It accepts the following values (note that 1 and 2 are assigned in the reverse order from glibc.malloc.hugetlb):

  • 2 enables transparent huge pages for dynamic libraries.
  • 1 enables standard huge pages for dynamic libraries.
  • 0 disables huge pages for dynamic libraries.

The environment variable is not propagated to child processes, so set it at the program entry point. Settings made in external shell scripts outside the entry point do not take effect.

Example:

export LD_HUGEPAGE_LIB=1
torchrun --nnodes=1 --nproc_per_node=8 --master-port 61888 scripts/train.py \
configs/opensora-v1-1/train/stage1.py
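
To apply LD_HUGEPAGE_LIB at the program entry point, it can also be set directly on the command line of the program being started. The following sketch validates the mode before doing so; the function name run_with_hugepage_libs is hypothetical. On systems whose glibc does not implement this feature, the variable is presumably ignored.

```shell
#!/usr/bin/env bash
# Sketch: start a program with LD_HUGEPAGE_LIB set only for that command.
# The function name is hypothetical.

# Usage: run_with_hugepage_libs <mode> <command> [args...]
#   mode 2: transparent huge pages, 1: standard huge pages, 0: disabled
run_with_hugepage_libs() {
  local mode=$1
  shift
  case $mode in
    0|1|2) ;;
    *) echo "mode must be 0, 1, or 2" >&2; return 1 ;;
  esac
  [ $# -gt 0 ] || { echo "no command given" >&2; return 1; }
  LD_HUGEPAGE_LIB=$mode "$@"
}
```

For example: run_with_hugepage_libs 1 torchrun --nnodes=1 --nproc_per_node=8 --master-port 61888 scripts/train.py configs/opensora-v1-1/train/stage1.py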