Introduction

The Kunpeng Storage Acceleration Library (KSAL) is developed by Huawei. It contains the Erasure Code (EC), Cyclic Redundancy Check 16 T10 Data Integrity Field (CRC16 T10DIF), Cyclic Redundancy Check 32 Castagnoli (CRC32C), memcpy optimization, and Data Analysis Service (DAS) smart prefetch algorithms. This document describes how to install and enable KSAL.

Feature Overview

Emerging technologies such as 5G and AI have accelerated the generation and flow of data. According to Huawei's Global Industry Vision (GIV) report, the global data volume will reach 180 ZB by 2025. As diversified services drive this growth and data becomes more important, applications place increasingly high demands on storage system performance. Improving storage system performance to meet service requirements has become a major challenge.

This document describes how to obtain, install, deploy, verify, and use KSAL and how to enable the KSAL EC algorithm in Ceph. The algorithms in KSAL are as follows:

  • The EC algorithm is based on a Huawei-developed vectorized EC encoding and decoding solution. Through isomorphism mapping, it replaces the multiplication over the high-order finite field GF(2^w) required in erasure coding with binary matrix multiplication, so that exclusive OR (XOR) operations are used instead of complex finite field multiplication implemented through table lookup. In addition, the EC algorithm uses an encoding orchestration algorithm to reuse intermediate results in the parity block calculation, which reduces the number of XOR operations, and it uses Kunpeng vectorized instructions to accelerate encoding. Compared with mainstream open source EC implementations, the KSAL EC algorithm delivers at least 2x the performance.
  • CRC16 T10DIF and CRC32C use a modulo algorithm for large numbers and Kunpeng vectorized instructions to accelerate encoding. Compared with open source algorithms, the 4 KB verification performance of CRC16 T10DIF is 130% higher, and the 4 KB verification performance of CRC32C is 30% higher.
  • The memcpy algorithm uses CPU prefetch and Kunpeng vectorized instruction acceleration. It improves the 4 KB performance by 30% compared with the built-in memcpy algorithm of glibc.
  • The DAS smart prefetch algorithm analyzes I/O information and prefetches data to the read cache in advance, improving the read performance of 4 KB sequential streams by 100%.
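
The idea behind the EC bullet above, replacing finite field multiplication with binary matrix multiplication, can be illustrated with a small sketch. It is not KSAL code. It assumes w = 8 and the reduction polynomial 0x11D, which is a common choice in open source erasure coding but is not stated in this document, and the function names are hypothetical. Multiplying by a fixed coefficient c is linear over GF(2), so it can be written as a w x w binary matrix whose column i is c * x^i. The product is then the XOR of the columns selected by the set bits of the input.

```python
W = 8
POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1, not taken from KSAL


def gf_mul(a, b):
    """Reference multiply in GF(2^8): shift-and-add with reduction by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & (1 << W):
            a ^= POLY
        b >>= 1
    return result


def bit_matrix(c):
    """Binary matrix for 'multiply by c'; column i is c * x^i as a bitmask."""
    return [gf_mul(c, 1 << i) for i in range(W)]


def mul_by_xor(columns, x):
    """Multiply using only XOR: combine the columns selected by the bits of x."""
    result = 0
    for i in range(W):
        if (x >> i) & 1:
            result ^= columns[i]
    return result


if __name__ == "__main__":
    cols = bit_matrix(0x53)
    # The XOR-only path agrees with the reference multiply for every input.
    print(all(mul_by_xor(cols, x) == gf_mul(0x53, x) for x in range(256)))
```

In bit-matrix erasure coding schemes in general, each "bit" in this picture stands for a whole sub-block of data, so every XOR acts on wide vectors, which is where vectorized instructions help. How KSAL lays out its data and schedules its XOR operations is not described here.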
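
For readers who want to check checksum values produced by an accelerated implementation, the following bit-at-a-time reference sketch computes CRC32C and CRC16 T10DIF. It is not KSAL code and does not use the modulo or vectorization techniques mentioned above. The function names are hypothetical, and the parameters are the standard published ones for these two checksums (polynomial 0x1EDC6F41 in reflected form for CRC32C, polynomial 0x8BB7 for CRC16 T10DIF).

```python
def crc32c(data: bytes) -> int:
    """CRC32C (Castagnoli): reflected, init and final XOR are 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the bit-reversed form of polynomial 0x1EDC6F41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc16_t10dif(data: bytes) -> int:
    """CRC16 T10DIF: polynomial 0x8BB7, not reflected, init 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    # Standard check input used by CRC catalogues.
    print(hex(crc32c(b"123456789")))        # 0xe3069283
    print(hex(crc16_t10dif(b"123456789")))  # 0xd0db
```

A loop like this processes one bit per iteration, so it is only suitable as a correctness reference. Optimized implementations produce the same values while processing many bytes per step.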