System-Level Optimization

jemalloc, originally written by Jason Evans, is a high-performance, general-purpose memory allocator. Using it with TensorFlow Serving in high-concurrency inference scenarios manages memory more efficiently, reduces lock contention, and mitigates fragmentation. This lowers the variance in memory usage and improves the throughput and stability of inference requests.

Obtain the jemalloc source archive and decompress it.

wget https://github.com/jemalloc/jemalloc/archive/refs/tags/5.3.0.tar.gz --no-check-certificate
tar zxvf 5.3.0.tar.gz
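
Because the wget command above skips certificate verification (--no-check-certificate), it may be worth checking the archive's integrity before building from it. The following is a minimal sketch, not part of the original procedure: the helper name verify_sha256 and the EXPECTED_SHA256 variable are hypothetical, and the expected digest must come from a source you trust, since this document does not provide one.

```shell
#!/bin/sh
# Compare a file's SHA-256 digest with an expected value.
# Returns 0 on a match, non-zero on a mismatch or if the file cannot be read.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" 2>/dev/null | cut -d ' ' -f 1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# EXPECTED_SHA256 is a placeholder (assumption): set it to a digest you trust.
# Nothing is checked when it is unset.
if [ -n "${EXPECTED_SHA256:-}" ]; then
    if verify_sha256 5.3.0.tar.gz "$EXPECTED_SHA256"; then
        echo "checksum OK"
    else
        echo "checksum mismatch, do not build from this archive" >&2
    fi
fi
```

Run the check after wget and before tar, so that a corrupted or tampered archive is never extracted.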

Go to the extracted source directory.
cd jemalloc-5.3.0/

Compile and install jemalloc.

./autogen.sh
./configure
make -j"$(nproc)"
make install
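
The build commands above rely on a few tools: autogen.sh calls autoconf, and the build itself needs make and a C compiler. The following is a minimal pre-flight sketch; the helper name missing_tools is hypothetical, and gcc is only an assumed compiler, so substitute whichever compiler your system uses.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH,
# one per line. Prints nothing if all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list: autoconf (used by autogen.sh), make, and gcc.
missing=$(missing_tools autoconf make gcc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```

Note also that make install writes to /usr/local by default, so it usually has to be run with root privileges.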

Verify the installation.
ls -l /usr/local/lib/libjemalloc*
The installation is successful if the command lists the jemalloc library files, including /usr/local/lib/libjemalloc.so.

jemalloc is enabled by setting the LD_PRELOAD environment variable, and the MALLOC_CONF environment variable configures the allocator's behavior. The following commands enable jemalloc with the configuration recommended for the Kunpeng platform.
```
export LD_PRELOAD="/usr/local/lib/libjemalloc.so"
export MALLOC_CONF="background_thread:true,metadata_thp:auto,dirty_decay_ms:20000,muzzy_decay_ms:20000"
```
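
After exporting the variables, it can be useful to confirm that the library is really mapped into newly started processes, because the dynamic loader only prints a warning and continues if the preload path is wrong. The following is a minimal sketch for Linux; the helper name jemalloc_is_loaded is hypothetical, and the check relies on /proc/self/maps listing the libraries mapped into the grep process itself.

```shell
#!/bin/sh
# Return 0 if a process started with the given library in LD_PRELOAD
# maps a file whose path contains "jemalloc", non-zero otherwise.
# Linux only: reads /proc/self/maps of the grep process itself.
jemalloc_is_loaded() {
    LD_PRELOAD="$1" grep -q jemalloc /proc/self/maps 2>/dev/null
}

# Library path taken from the export command in this document.
LIB=/usr/local/lib/libjemalloc.so
if jemalloc_is_loaded "$LIB"; then
    echo "jemalloc is mapped into preloaded processes"
else
    echo "jemalloc was not loaded; check that $LIB exists" >&2
fi
```

Keep in mind that exported variables affect every program started from that shell. To limit the allocator change to the serving process, the same two variables can instead be set only on the command line that starts TensorFlow Serving.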
