Performance Tests

kunpeng-lzbench is a test framework based on lzbench. It loads algorithms such as zstd as dynamic libraries so that the compression and decompression performance of different compression algorithm libraries can be compared.

Compiling the Test Tool

  1. Obtain the lzbench source code from Gitee.
  2. Run the make command to compile and generate the lzbench binary tool.

Block Compression Test

This section describes how to test block compression performance with the open source zstd algorithm library and then with the KZstar algorithm library, and compares the performance metrics of the two runs.

Calling the open source zstd algorithm library to test the block compression performance

  1. Check the algorithm library used by the test tool.
    ldd lzbench
    If the following information is displayed, the open source zstd algorithm library is used:
     linux-vdso.so.1 (0x0000ffffae181000)
     libz.so.1 => /usr/lib64/libz.so.1 (0x0000ffffae113000)
     libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x0000ffffae012000)
     liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x0000ffffadfe1000)
     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000ffffaddeb000)
     libm.so.6 => /usr/lib64/libm.so.6 (0x0000ffffadd4a000)
     libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x0000ffffadd19000)
     libc.so.6 => /usr/lib64/libc.so.6 (0x0000ffffadb6a000)
     /lib/ld-linux-aarch64.so.1 (0x0000ffffae144000)
  2. Call the open source zstd algorithm library to test the block compression performance. Set the compression level to 3 and the block size to 128 KB.
    ./lzbench -ezstd,3 -b128 itemdata
    The compression result is as follows:
    lzbench 1.8 (64-bit Linux)  (null)
    Assembled by P.Skibinski
    
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                  37655 MB/s 37041 MB/s     7316868 100.00 itemdata
    zstd 1.5.5 -3             195 MB/s   851 MB/s     2257863  30.86 itemdata
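
The library check in step 1 can also be scripted. The following is a minimal sketch, not part of kunpeng-lzbench: the helper names zstd_lib_path and zstd_provider are hypothetical, and the classification assumes that KZstar is installed under /usr/local/kzstar/lib, the path used in this document.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lzbench or KZstar).

# Print the path that libzstd.so.1 resolves to, given `ldd` output on stdin.
zstd_lib_path() {
    awk '$1 == "libzstd.so.1" && $2 == "=>" { print $3; exit }'
}

# Classify the resolved path. Assumes the KZstar install path used in this
# document (/usr/local/kzstar/lib); adjust it for other installations.
zstd_provider() {
    case "$(zstd_lib_path)" in
        /usr/local/kzstar/lib/*) echo "kzstar" ;;
        "")                      echo "none" ;;
        *)                       echo "other" ;;
    esac
}

# Usage:
#   ldd lzbench | zstd_lib_path
#   ldd lzbench | zstd_provider
```

A result of "other" with the path /usr/lib64/libzstd.so.1 corresponds to the open source zstd output shown above.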

Calling the KZstar algorithm library to test the block compression performance

  1. Set the KZstar environment variable to enable KZstar.
    export LD_LIBRARY_PATH=/usr/local/kzstar/lib:$LD_LIBRARY_PATH
  2. Check the algorithm library used by the test tool.
    ldd lzbench
    If the following information is displayed, the KZstar algorithm library is used:
     linux-vdso.so.1 (0x0000ffffa84cf000)
     libz.so.1 => /usr/lib64/libz.so.1 (0x0000ffffa8461000)
     libzstd.so.1 => /usr/local/kzstar/lib/libzstd.so.1 (0x0000ffffa8380000)
     liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x0000ffffa834f000)
     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000ffffa8159000)
     libm.so.6 => /usr/lib64/libm.so.6 (0x0000ffffa80b8000)
     libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x0000ffffa8087000)
     libc.so.6 => /usr/lib64/libc.so.6 (0x0000ffffa7ed8000)
     /lib/ld-linux-aarch64.so.1 (0x0000ffffa8492000)
     libzstar.so => /usr/local/kzstar/lib/libzstar.so (0x0000ffffa7ea7000)
     libsecurec.so => /usr/local/kzstar/lib/libsecurec.so (0x0000ffffa7e76000)
  3. Call the KZstar algorithm library to test the block compression performance. Set the compression level to 3 and the block size to 128 KB.
    ./lzbench -ezstd,3 -b128 itemdata
    The compression result is as follows:
    lzbench 1.8 (64-bit Linux)  (null)
    Assembled by P.Skibinski
    
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                  37578 MB/s 37182 MB/s     7316868 100.00 itemdata
    zstd 1.5.5 -3             261 MB/s  1030 MB/s     2266113  30.97 itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)

In the preceding test results, the compression speed increases from 195 MB/s to 261 MB/s, and the decompression speed increases from 851 MB/s to 1030 MB/s, while the compression ratio stays almost unchanged (30.86 versus 30.97). This test uses only the common optimization methods, without enabling parallelism. Compared with the open source zstd algorithm library, KZstar improves both compression and decompression performance.
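
The gains can also be expressed as ratios. The sketch below uses a hypothetical helper named speedup, which is not part of lzbench, and the throughput figures from the two test runs above.

```shell
#!/bin/sh
# Hypothetical helper: ratio of a new throughput to a baseline throughput,
# printed with two decimals. The baseline must be non-zero.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

speedup 195 261    # compression, open source zstd vs. KZstar: prints 1.34
speedup 851 1030   # decompression, open source zstd vs. KZstar: prints 1.21
```

For this data set, KZstar is therefore about 1.34 times as fast in compression and about 1.21 times as fast in decompression.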

Parallel Compression Test

If idle CPU resources are available, you can set the environment variables ZSTAR_THREAD_NUM_ENV and ZSTAR_THREAD_COMPRESS_LIMIT_ENV so that the data is split and processed in parallel. This improves compression and decompression performance without modifying service code. For details about the environment variables, see Table 1.

Table 1 Environment variables related to parallel compression

Environment variable: ZSTAR_THREAD_NUM_ENV

Description: Specifies the number of available threads, including the initial main thread. That is, the number of sub-threads is the value of ZSTAR_THREAD_NUM_ENV minus 1.

  • To use non-streaming parallel decompression, the input data must have been compressed in parallel, non-streaming mode. The number of threads used during decompression is the smaller of the number of threads used during compression and the number available for decompression.
  • In non-streaming compression or decompression, the main thread also participates in the work, so the actual number of threads equals the configured number. In streaming compression, the main thread does not participate, so the actual number of threads is the configured number minus 1.

Value: The default value is 0 and the maximum value is 17.

  • If the number of threads is not set or is set to 0 or 1, multi-threading is disabled.
  • If the number of threads is set to a value greater than 17, 17 is used.

Environment variable: ZSTAR_THREAD_COMPRESS_LIMIT_ENV

Description: Specifies the minimum data volume required for enabling parallelism during non-streaming compression.

Parallel compression is performed only when the size of the input data for non-streaming compression is greater than or equal to the configured limit. Otherwise, single-thread compression is used.

Parallel decompression takes effect only when the package was compressed in parallel. Otherwise, single-thread decompression is used.

Value: The unit is bytes and the default value is 512 KB.

You can enter a valid positive integer to adjust the lower limit for parallelism. Other values, such as negative numbers and non-integers, are invalid; in that case, the default lower limit is used.
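
The rules in Table 1 can be modeled in a few lines of shell. This is only an illustrative sketch, not KZstar source code: the function names effective_threads and compress_mode are hypothetical, 512 KB is taken to be 524288 bytes, and the thread count is assumed to be an integer.

```shell
#!/bin/sh
# Illustrative model of the rules in Table 1; this is not KZstar code.

# Effective thread count for a given ZSTAR_THREAD_NUM_ENV value.
# Unset, 0, or 1 disables multi-threading (modeled as 1 thread);
# values above 17 are capped at 17. Input is assumed to be an integer.
effective_threads() {
    n=${1:-0}
    if [ "$n" -le 1 ]; then
        echo 1
    elif [ "$n" -gt 17 ]; then
        echo 17
    else
        echo "$n"
    fi
}

# Mode for non-streaming compression of an input of $1 bytes with
# $2 configured threads and a limit of $3 bytes. The default limit is
# 512 KB, assumed here to mean 524288 bytes.
compress_mode() {
    size=$1
    threads=$(effective_threads "$2")
    limit=${3:-524288}
    if [ "$threads" -gt 1 ] && [ "$size" -ge "$limit" ]; then
        echo parallel
    else
        echo single
    fi
}

compress_mode 131072 3        # 128 KB input, default limit: single
compress_mode 524288 3        # 512 KB input, default limit: parallel
compress_mode 131072 3 1024   # 128 KB input, limit 1024:    parallel
```

The three calls correspond to the block sizes and limits used in the tests in this section.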

This section describes how to perform parallel compression using the KZstar algorithm and compare the performance before and after parallel compression is enabled.

Setting the number of threads to 3 and block size to 128 KB and then to 512 KB

  • Call the KZstar algorithm library to test the block compression performance. Set the ZSTAR_THREAD_NUM_ENV environment variable to 3, the compression level to 3, and the block size to 128 KB.
    ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b128 itemdata
    The compression result is as follows:
    lzbench 1.8 (64-bit Linux)  (null)
    Assembled by P.Skibinski
    
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                   6512 MB/s 37298 MB/s     7316868 100.00 itemdata
    zstd 1.5.5 -3             261 MB/s  1030 MB/s     2266113  30.97 itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)
  • Call the KZstar algorithm library to test the block compression performance. Set the ZSTAR_THREAD_NUM_ENV environment variable to 3, the compression level to 3, and the block size to 512 KB.
    ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b512 itemdata
    The compression result is as follows:
    lzbench 1.8 (64-bit Linux)  (null)
    Assembled by P.Skibinski
    
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                   6447 MB/s 37817 MB/s     7316868 100.00 itemdata
    zstd 1.5.5 -3             588 MB/s  2084 MB/s     2221068  30.36 itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=512KB cSpeed=0MB)

According to the preceding results, setting ZSTAR_THREAD_NUM_ENV=3 enables multi-threading for parallel compression. A 128 KB block is smaller than the default 512 KB limit, so parallelism is not triggered and a single thread is used; the figures are the same as in the KZstar test without parallelism. A 512 KB block reaches the limit and triggers parallelism, and the compression and decompression speeds rise to 588 MB/s and 2084 MB/s.
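
When several runs are compared, it can help to reduce each lzbench report to its result rows. The following is a minimal sketch with a hypothetical helper named lzbench_rows. It assumes the row layout shown in this document, a file name without spaces, and that lzbench writes the result table to standard output.

```shell
#!/bin/sh
# Hypothetical helper: reduce lzbench output (read from stdin) to one
# tab-separated line per result row:
#   name <TAB> compression MB/s <TAB> decompression MB/s <TAB> ratio
# Fields are counted from the end of the line because the compressor
# name may contain spaces (for example "zstd 1.5.5 -3").
lzbench_rows() {
    awk 'NF >= 8 && $(NF-3) == "MB/s" && $(NF-5) == "MB/s" {
        name = $1
        for (i = 2; i <= NF - 7; i++) name = name " " $i
        printf "%s\t%s\t%s\t%s\n", name, $(NF-6), $(NF-4), $(NF-1)
    }'
}

# Usage:
#   ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b512 itemdata | lzbench_rows
```

Header lines and the trailing "done..." line do not match the pattern and are skipped.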

Setting the number of threads to 3, minimum data volume for enabling parallel compression to 1024 bytes, and block size to 128 KB and then to 512 KB

  • Call the KZstar algorithm library to test the block compression performance. Set ZSTAR_THREAD_NUM_ENV to 3, ZSTAR_THREAD_COMPRESS_LIMIT_ENV to 1024, the compression level to 3, and the block size to 128 KB.
    ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024 ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b128 itemdata
    The compression result is as follows:
    lzbench 1.8 (64-bit Linux)  (null)
    Assembled by P.Skibinski
    
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                  38627 MB/s 36890 MB/s     7316868 100.00 itemdata
    zstd 1.5.5 -3             725 MB/s  1785 MB/s     2464904  33.69 itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)
  • Call the KZstar algorithm library to test the block compression performance. Set ZSTAR_THREAD_NUM_ENV to 3, ZSTAR_THREAD_COMPRESS_LIMIT_ENV to 1024, the compression level to 3, and the block size to 512 KB.
    ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024 ZSTAR_THREAD_NUM_ENV=3 ./lzbench -ezstd,3 -b512 itemdata
    The compression result is as follows:
    lzbench 1.8 (64-bit Linux)  (null)
    Assembled by P.Skibinski
    
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                  38242 MB/s 39856 MB/s     7316868 100.00 itemdata
    zstd 1.5.5 -3             664 MB/s  2271 MB/s     2232919  30.52 itemdata
    done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=512KB cSpeed=0MB)

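ZSTAR_THREAD_COMPRESS_LIMIT_ENV=1024 sets the minimum input size for parallel non-streaming compression to 1024 bytes. Both block sizes exceed this limit, so parallel compression is triggered for 128 KB blocks as well as 512 KB blocks. In the 128 KB case, the higher speed comes with a larger compressed size (2464904 bytes, compared with 2266113 bytes in single-thread mode).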
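
To repeat the runs in this section, the command lines can be generated rather than typed by hand. The sketch below uses a hypothetical helper named print_runs that only prints the commands; the lzbench options, the environment variables, and the itemdata file name are the ones used in this document.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one lzbench command per block
# size so that the commands can be reviewed first.
# Arguments: thread count, limit in bytes (pass "" to keep the default),
# then one or more block sizes in KB.
print_runs() {
    threads=$1
    limit=$2
    shift 2
    for kb in "$@"; do
        prefix="ZSTAR_THREAD_NUM_ENV=$threads"
        if [ -n "$limit" ]; then
            prefix="ZSTAR_THREAD_COMPRESS_LIMIT_ENV=$limit $prefix"
        fi
        echo "$prefix ./lzbench -ezstd,3 -b$kb itemdata"
    done
}

print_runs 3 ""   128 512   # default limit
print_runs 3 1024 128 512   # limit lowered to 1024 bytes
```

The printed lines match the commands shown in this section and can be pasted into a shell once reviewed.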