Performance Tests
kunpeng-lzbench is a test framework based on lzbench. It loads algorithms such as zstd as dynamic libraries so that the compression and decompression performance of different compression algorithm libraries can be compared.
Compiling the Test Tool
Block Compression Test
This section describes how to call the open source zstd algorithm library and the KZstar algorithm library to test block compression performance, and compares the performance metrics before and after KZstar is enabled.
Calling the open source zstd algorithm library to test the block compression performance
- Check the algorithm library used by the test tool.
ldd lzbench
If the following information is displayed, the open source zstd algorithm library is used:
    linux-vdso.so.1 (0x0000ffffae181000)
    libz.so.1 => /usr/lib64/libz.so.1 (0x0000ffffae113000)
    libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x0000ffffae012000)
    liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x0000ffffadfe1000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000ffffaddeb000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x0000ffffadd4a000)
    libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x0000ffffadd19000)
    libc.so.6 => /usr/lib64/libc.so.6 (0x0000ffffadb6a000)
    /lib/ld-linux-aarch64.so.1 (0x0000ffffae144000)
- Call the open source zstd algorithm library to test the block compression performance. Set the compression level to level 3 and the block size to 128 KB.
./lzbench -ezstd,3 -b128 itemdata
The compression result is as follows:
    lzbench 1.8 (64-bit Linux) (null) Assembled by P.Skibinski
    Compressor name         Compress.  Decompress.  Compr. size   Ratio  Filename
    memcpy                 37655 MB/s   37041 MB/s      7316868  100.00  itemdata
    zstd 1.5.5 -3            195 MB/s     851 MB/s      2257863   30.86  itemdata
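The library check in the steps above can also be scripted rather than read by eye. The following is a minimal sketch, not part of kunpeng-lzbench: the helper names `zstd_lib_path` and `zstd_lib_kind` are hypothetical, and the `/usr/local/kzstar` prefix is the KZstar install location this document uses, so adjust it if your installation differs.

```shell
# Print the path that the dynamic linker resolves libzstd.so.1 to.
# Reads `ldd` output on stdin, so it can be exercised without a binary.
zstd_lib_path() {
    awk '$1 == "libzstd.so.1" && $2 == "=>" { print $3 }'
}

# Classify the resolved path. The /usr/local/kzstar prefix is an assumption
# taken from the install path used in this document.
zstd_lib_kind() {
    case "$(zstd_lib_path)" in
        /usr/local/kzstar/*) echo "kzstar" ;;
        "")                  echo "not-found" ;;
        *)                   echo "open-source" ;;
    esac
}

# Usage: ldd lzbench | zstd_lib_kind
```

Running the check before each benchmark makes it harder to attribute a result to the wrong library.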
Calling the KZstar algorithm library to test the block compression performance
- Set the KZstar environment variable to enable KZstar.
export LD_LIBRARY_PATH=/usr/local/kzstar/lib:$LD_LIBRARY_PATH
- Check the algorithm library used by the test tool.
ldd lzbench
If the following information is displayed, the KZstar algorithm library is used:
    linux-vdso.so.1 (0x0000ffffa84cf000)
    libz.so.1 => /usr/lib64/libz.so.1 (0x0000ffffa8461000)
    libzstd.so.1 => /usr/local/kzstar/lib/libzstd.so.1 (0x0000ffffa8380000)
    liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x0000ffffa834f000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000ffffa8159000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x0000ffffa80b8000)
    libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x0000ffffa8087000)
    libc.so.6 => /usr/lib64/libc.so.6 (0x0000ffffa7ed8000)
    /lib/ld-linux-aarch64.so.1 (0x0000ffffa8492000)
    libzstar.so => /usr/local/kzstar/lib/libzstar.so (0x0000ffffa7ea7000)
    libsecurec.so => /usr/local/kzstar/lib/libsecurec.so (0x0000ffffa7e76000)
- Call the KZstar algorithm library to test the block compression performance. Set the compression level to level 3 and the block size to 128 KB.
./lzbench -ezstd,3 -b128 itemdata
The compression result is as follows:
    lzbench 1.8 (64-bit Linux) (null) Assembled by P.Skibinski
    Compressor name         Compress.  Decompress.  Compr. size   Ratio  Filename
    memcpy                 37578 MB/s   37182 MB/s      7316868  100.00  itemdata
    zstd 1.5.5 -3            261 MB/s    1030 MB/s      2266113   30.97  itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)
In the preceding test results, the compression speed increases from 195 MB/s to 261 MB/s (about 34%), and the decompression speed increases from 851 MB/s to 1030 MB/s (about 21%), while the compression ratio stays nearly the same (30.86% versus 30.97%). This test uses only the general optimizations in KZstar, without parallelism enabled. Compared with the open source zstd algorithm, KZstar improves both compression and decompression performance.
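The relative gains implied by these figures can be computed with a small helper. This is an illustrative sketch; the function name `pct_gain` is hypothetical, and the inputs are the MB/s values from the two runs above.

```shell
# Percentage change from a baseline throughput to a new throughput,
# rounded to one decimal place.
pct_gain() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_gain 195 261     # compression speed, prints 33.8
pct_gain 851 1030    # decompression speed, prints 21.0
```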
Parallel Compression Test
If there are idle CPU resources, you can configure the environment variables ZSTAR_THREAD_NUM_ENV and ZSTAR_THREAD_COMPRESS_LIMIT_ENV to split a compressed package for parallel processing. In this way, compression and decompression performance is improved without modifying service code. For details about the environment variables, see Table 1.
| Environment Variable | Description | Value |
|---|---|---|
| ZSTAR_THREAD_NUM_ENV | Specifies the number of available threads, including the initial main thread. The number of sub-threads is therefore the value of ZSTAR_THREAD_NUM_ENV minus 1. | The default value is 0 and the maximum value is 17. |
| ZSTAR_THREAD_COMPRESS_LIMIT_ENV | Specifies the minimum data volume required for enabling parallelism during non-streaming compression. Parallel compression is performed only when the size of the input data for non-streaming compression is greater than or equal to the configured limit. Otherwise, single-thread compression is used. Parallel decompression takes effect only when the package was compressed in parallel. Otherwise, single-thread decompression is used. | The unit is bytes and the default value is 512 KB. You can enter a valid positive integer to adjust the lower limit for parallelism. Other values, such as negative numbers and non-integers, are invalid, and the default lower limit is used instead. |
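The rules in Table 1 can be expressed as a small decision helper for predicting whether a given block size will be compressed in parallel. This is only a sketch of the documented behavior, not KZstar's implementation: the function names are hypothetical, and treating a thread count below 2 as single-threaded is an assumption (with fewer than two threads there would be no sub-threads).

```shell
DEFAULT_LIMIT=$((512 * 1024))   # documented default: 512 KB, expressed in bytes

# Resolve the effective limit: a positive integer is used as given; anything
# else (empty, negative, non-integer, zero) falls back to the default.
effective_limit() {
    case "$1" in
        ''|*[!0-9]*) echo "$DEFAULT_LIMIT" ;;
        *) if [ "$1" -gt 0 ]; then echo "$1"; else echo "$DEFAULT_LIMIT"; fi ;;
    esac
}

# would_parallelize INPUT_BYTES THREADS [LIMIT]
# Prints "parallel" or "single". Requiring THREADS >= 2 is an assumption.
would_parallelize() {
    limit=$(effective_limit "${3:-}")
    if [ "$2" -ge 2 ] && [ "$1" -ge "$limit" ]; then
        echo "parallel"
    else
        echo "single"
    fi
}

would_parallelize $((128 * 1024)) 3         # prints single
would_parallelize $((512 * 1024)) 3         # prints parallel
would_parallelize $((128 * 1024)) 3 1024    # prints parallel
```

The three calls correspond to block sizes of 128 KB and 512 KB with the default limit, and 128 KB with the limit lowered to 1024 bytes.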
This section describes how to perform parallel compression using the KZstar algorithm and compare the performance before and after parallel compression is enabled.
Setting the number of threads to 3 and block size to 128 KB and then to 512 KB
- Call the KZstar algorithm library to test the block compression performance. Set the ZSTAR_THREAD_NUM_ENV environment variable to 3, compression level to level 3, and block size to 128 KB.
ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b128 itemdata
The compression result is as follows:
    lzbench 1.8 (64-bit Linux) (null) Assembled by P.Skibinski
    Compressor name         Compress.  Decompress.  Compr. size   Ratio  Filename
    memcpy                  6512 MB/s   37298 MB/s      7316868  100.00  itemdata
    zstd 1.5.5 -3            261 MB/s    1030 MB/s      2266113   30.97  itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)
- Call the KZstar algorithm library to test the block compression performance. Set the ZSTAR_THREAD_NUM_ENV environment variable to 3, compression level to level 3, and block size to 512 KB.
ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b512 itemdata
The compression result is as follows:
    lzbench 1.8 (64-bit Linux) (null) Assembled by P.Skibinski
    Compressor name         Compress.  Decompress.  Compr. size   Ratio  Filename
    memcpy                  6447 MB/s   37817 MB/s      7316868  100.00  itemdata
    zstd 1.5.5 -3            588 MB/s    2084 MB/s      2221068   30.36  itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=512KB cSpeed=0MB)
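According to the preceding results, ZSTAR_THREAD_NUM_ENV=3 makes multiple threads available for parallel compression, but parallelism is triggered only when a block reaches the ZSTAR_THREAD_COMPRESS_LIMIT_ENV threshold, which defaults to 512 KB. A 128 KB block is below that threshold, so a single thread is used and the speeds match the single-thread KZstar run (261 MB/s compression, 1030 MB/s decompression). A 512 KB block meets the threshold and triggers parallelism: compression rises to 588 MB/s and decompression to 2084 MB/s.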
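When several runs are compared, it can help to pull the zstd row out of each result programmatically. The sketch below assumes the default lzbench text layout shown in this section and that the result table is written to standard output; the helper name `zstd_row` is hypothetical, and a different column layout would need different field positions.

```shell
# Read lzbench text output on stdin and print, for each zstd result row:
#   compress_MBps decompress_MBps compressed_bytes ratio_percent
# Carriage returns are converted to newlines first so that any in-place
# progress updates do not hide the final result row.
zstd_row() {
    tr '\r' '\n' |
        awk '$1 == "zstd" && $5 == "MB/s" && $7 == "MB/s" { print $4, $6, $8, $9 }'
}

# Usage: ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b512 itemdata | zstd_row
```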
Setting the number of threads to 3, minimum data volume for enabling parallel compression to 1024 bytes, and block size to 128 KB and then to 512 KB
- Call the KZstar algorithm library to test the block compression performance. Set ZSTAR_THREAD_NUM_ENV to 3, ZSTAR_THREAD_COMPRESS_LIMIT_ENV to 1024, compression level to level 3, and block size to 128 KB.
ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024 ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b128 itemdata
The compression result is as follows:
    lzbench 1.8 (64-bit Linux) (null) Assembled by P.Skibinski
    Compressor name         Compress.  Decompress.  Compr. size   Ratio  Filename
    memcpy                 38627 MB/s   36890 MB/s      7316868  100.00  itemdata
    zstd 1.5.5 -3            725 MB/s    1785 MB/s      2464904   33.69  itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)
- Call the KZstar algorithm library to test the block compression performance. Set ZSTAR_THREAD_NUM_ENV to 3, ZSTAR_THREAD_COMPRESS_LIMIT_ENV to 1024, compression level to level 3, and block size to 512 KB.
ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024 ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b512 itemdata
The compression result is as follows:
    lzbench 1.8 (64-bit Linux) (null) Assembled by P.Skibinski
    Compressor name         Compress.  Decompress.  Compr. size   Ratio  Filename
    memcpy                 38242 MB/s   39856 MB/s      7316868  100.00  itemdata
    zstd 1.5.5 -3            664 MB/s    2271 MB/s      2232919   30.52  itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=512KB cSpeed=0MB)
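ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024 indicates that at least 1024 bytes are required for enabling parallelism in non-streaming compression. In this case, parallel compression is triggered regardless of whether the block size is 128 KB or 512 KB. With 128 KB blocks, compression reaches 725 MB/s and decompression 1785 MB/s, but the value in the Ratio column rises from 30.97 to 33.69, which means the compressed output is larger. With 512 KB blocks, the speeds are 664 MB/s and 2271 MB/s, and the Ratio value is 30.52.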
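The four parallel configurations in this section can be driven from one loop so that they run back to back under the same conditions. The following is a sketch only: the function name `run_matrix` and the `DRY_RUN` switch are hypothetical conveniences, while the lzbench arguments and environment variables are the ones used above.

```shell
# Run (or, with DRY_RUN=1, just print) the four parallel-test commands:
# block sizes 128 KB and 512 KB, first with the default limit and then
# with ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024.
run_matrix() {
    for limit in "" 1024; do
        for block in 128 512; do
            set -- ./lzbench -ezstd,3 "-b$block" itemdata
            if [ -n "$limit" ]; then
                set -- env ZSTAR_THREAD_NUM_ENV=3 ZSTAR_THREAD_COMPRESS_LIMIT_ENV="$limit" "$@"
            else
                set -- env ZSTAR_THREAD_NUM_ENV=3 "$@"
            fi
            if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
        done
    done
}

# Usage: DRY_RUN=1 run_matrix    (prints the commands without running them)
#        run_matrix              (runs them from the lzbench directory)
```

Printing the commands first is a cheap way to confirm that the environment variables are attached to the intended runs before spending time on the benchmark itself.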