System-Level Optimization
Jason Evans malloc (jemalloc) is a high-performance, general-purpose memory allocator. To improve the performance of TensorFlow Serving in high-concurrency inference scenarios, jemalloc can be used in place of the default allocator to manage memory more efficiently, reduce lock contention, and mitigate fragmentation. This reduces fluctuations in memory usage and improves the throughput and stability of inference request handling.
- Obtain the jemalloc source archive and decompress it.
wget https://github.com/jemalloc/jemalloc/archive/refs/tags/5.3.0.tar.gz --no-check-certificate
tar zxvf 5.3.0.tar.gz
- Go to the installation directory.
cd jemalloc-5.3.0/
- Compile and install jemalloc.
./autogen.sh
./configure
make -j
make install
- Verify the installation.
ls -l /usr/local/lib/libjemalloc*
The installation is successful if the command lists the installed libjemalloc library files, including libjemalloc.so.

- Enable jemalloc by setting the LD_PRELOAD environment variable, and configure its behavior with the MALLOC_CONF environment variable. The following commands enable jemalloc and apply the recommended configuration for the Kunpeng platform:
export LD_PRELOAD="/usr/local/lib/libjemalloc.so"
export MALLOC_CONF="background_thread:true,metadata_thp:auto,dirty_decay_ms:20000,muzzy_decay_ms:20000"
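The download step above passes --no-check-certificate, which turns off TLS certificate verification in wget. A minimal sketch of an alternative, assuming GNU wget and sha256sum (GNU coreutils) are available: keep certificate checking enabled and compare the archive's SHA-256 digest with a value obtained from a trusted source. The function names fetch_jemalloc and verify_sha256 are hypothetical, and the expected digest for 5.3.0.tar.gz is not provided in this guide.

```shell
#!/bin/sh
# Sketch: fetch a jemalloc release archive with TLS verification left on,
# then compare its SHA-256 digest with a value obtained from a trusted source.

# Hypothetical helper. Usage: fetch_jemalloc VERSION   (e.g. fetch_jemalloc 5.3.0)
# -O fixes the output file name, so it does not depend on GitHub's redirect.
fetch_jemalloc() {
  wget "https://github.com/jemalloc/jemalloc/archive/refs/tags/$1.tar.gz" \
    -O "$1.tar.gz"
}

# Hypothetical helper. Usage: verify_sha256 FILE EXPECTED_HEX_DIGEST
# Returns 0 when the digest matches, 1 otherwise.
verify_sha256() {
  # sha256sum prints "<digest>  <file>"; keep only the digest field.
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  if [ "$actual" = "$2" ]; then
    return 0
  fi
  echo "checksum mismatch for $1: got $actual" >&2
  return 1
}
```

For example, fetch_jemalloc 5.3.0 && verify_sha256 5.3.0.tar.gz <expected-digest> && tar zxvf 5.3.0.tar.gz, where <expected-digest> is a placeholder for the published checksum.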
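The compile-and-install step can also be wrapped in a function that stops at the first failing command. This is a sketch, not part of the original procedure: build_jemalloc, run, and DRY_RUN are hypothetical names; --prefix is the standard autoconf option for the installation prefix (defaulting here to /usr/local, which matches the library paths used in this guide); and make -j is given an explicit job count from nproc instead of an unbounded number of jobs.

```shell
#!/bin/sh
# Sketch: build and install jemalloc, stopping at the first failing step.
# DRY_RUN (hypothetical): any non-empty value prints the commands instead.

# Hypothetical helper: run a command, or only print it in dry-run mode.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper. Usage: build_jemalloc SRC_DIR [PREFIX]
# The body runs in a subshell, so "cd" and "set -e" do not leak to the caller.
build_jemalloc() (
  set -e
  src=$1
  prefix=${2:-/usr/local}
  # nproc (GNU coreutils) reports usable CPUs; fall back to 4 if unavailable.
  njobs=$(nproc 2>/dev/null || echo 4)
  run cd "$src"
  run ./autogen.sh
  run ./configure --prefix="$prefix"
  run make -j"$njobs"
  run make install
)
```

For example, build_jemalloc jemalloc-5.3.0 installs into /usr/local, which normally requires root privileges, and DRY_RUN=1 build_jemalloc jemalloc-5.3.0 only prints the steps. Call it as a plain command: shells ignore set -e inside a function invoked as an if condition or on the left of || or &&.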
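To check the installation from a script rather than by reading the ls -l listing, a small function can test for libjemalloc.so, the file that is later preloaded through LD_PRELOAD. check_jemalloc is a hypothetical name; its default directory, /usr/local/lib, comes from the verification command in the steps above.

```shell
#!/bin/sh
# Sketch: non-interactive check that the jemalloc shared library is installed.
# Hypothetical helper. Usage: check_jemalloc [LIBDIR]   (default: /usr/local/lib)
check_jemalloc() {
  libdir=${1:-/usr/local/lib}
  # libjemalloc.so is usually a symlink; -e follows it, so a dangling
  # link is reported as missing.
  if [ -e "$libdir/libjemalloc.so" ]; then
    echo "jemalloc found: $libdir/libjemalloc.so"
    return 0
  fi
  echo "jemalloc not found in $libdir" >&2
  return 1
}
```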
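The export commands above apply LD_PRELOAD to every program started from that shell afterwards. A minimal sketch of a narrower alternative, assuming a POSIX shell: run_with_jemalloc (a hypothetical function) sets LD_PRELOAD and MALLOC_CONF only for the command it launches, using the same library path and options as above. JEMALLOC_LIB and JEMALLOC_CONF_DEFAULT are likewise hypothetical names.

```shell
#!/bin/sh
# Sketch: start one command with jemalloc preloaded, instead of exporting
# LD_PRELOAD for every process launched from the current shell.
# Options copied from the recommended Kunpeng configuration above.
JEMALLOC_CONF_DEFAULT="background_thread:true,metadata_thp:auto,dirty_decay_ms:20000,muzzy_decay_ms:20000"

# Hypothetical helper. Usage: run_with_jemalloc COMMAND [ARGS...]
# JEMALLOC_LIB (hypothetical) overrides the library path.
run_with_jemalloc() {
  lib=${JEMALLOC_LIB:-/usr/local/lib/libjemalloc.so}
  if [ ! -e "$lib" ]; then
    echo "error: $lib not found; install jemalloc first" >&2
    return 1
  fi
  # Prepend jemalloc to any existing LD_PRELOAD entries and pass both
  # variables to the launched command only.
  env LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" \
      MALLOC_CONF="$JEMALLOC_CONF_DEFAULT" \
      "$@"
}
```

For example, run_with_jemalloc tensorflow_model_server --port=8500 --model_name=my_model --model_base_path=/models/my_model starts the server with jemalloc (the model name and path are placeholders). While the server is running, grep libjemalloc /proc/<pid>/maps shows whether the library was actually loaded into the process.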