
Introduction

This document describes how to optimize HDFS erasure coding (EC) read/write performance under the RS-6-3-1024k policy on Kunpeng servers powered by Huawei Kunpeng 920 5220 processors.

Introduction to HDFS

Hadoop is an open-source distributed storage and compute framework that is widely used to store and process massive datasets reliably, efficiently, and at scale. The Hadoop Distributed File System (HDFS) is its storage layer: it runs on commodity hardware and provides high-throughput data access, making it well suited to applications with large datasets.

For more information about Hadoop, see Apache Hadoop 3.1.0.

Introduction to EC

Erasure coding (EC) is a data protection method for storage systems. It encodes the original data to produce redundant data and stores both together so that the system can tolerate faults.

N pieces of original data are encoded to produce M redundant pieces (parity blocks), and the resulting N+M pieces are distributed across different drives through the distributed hash table (DHT) algorithm. When up to M pieces fail, whether original or redundant, the original N pieces of data can be restored by a reconstruction algorithm.
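The N+M idea above can be sketched in a few lines. The toy below uses a single XOR parity piece (N = 6, M = 1), which can recover only one lost piece; the real RS-6-3-1024k policy uses Reed-Solomon coding over a Galois field so that any 3 of the 9 pieces may be lost. The piece sizes and contents here are invented for illustration.

```python
# Toy illustration of N+M erasure coding with one XOR parity piece.
# Real RS-6-3 uses Reed-Solomon math over GF(256) and tolerates 3 losses;
# XOR parity tolerates 1, but the encode/store/reconstruct flow is the same.
from functools import reduce

def encode(data_pieces):
    """Return the parity piece: the byte-wise XOR of all data pieces."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_pieces))

def reconstruct(pieces, lost_index):
    """Recover the piece at lost_index by XOR-ing all surviving pieces
    (data and parity together)."""
    survivors = [p for i, p in enumerate(pieces) if i != lost_index]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

# Six small data pieces (stand-ins for the 1024 KB striping cells).
data = [bytes([i] * 4) for i in range(1, 7)]
parity = encode(data)
stored = data + [parity]  # the N+M pieces, stored on different drives/nodes

# Simulate losing data piece 2 and rebuilding it from the survivors.
recovered = reconstruct(stored, lost_index=2)
assert recovered == data[2]
```

The reconstruction works because XOR is its own inverse: XOR-ing every surviving piece cancels out everything except the missing one.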

For more information, see HDFS Erasure Coding.
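For reference, Hadoop 3.x exposes EC administration through the `hdfs ec` subcommand. The commands below show how the RS-6-3-1024k policy is typically enabled and applied; the path `/data` is only an example and must be run against a live cluster by a user with the required permissions.

```shell
# List the erasure coding policies known to the cluster.
hdfs ec -listPolicies

# Enable the RS-6-3-1024k policy (built in, but disabled by default).
hdfs ec -enablePolicy -policy RS-6-3-1024k

# Apply the policy to a directory; new files under it are EC-striped.
hdfs ec -setPolicy -path /data -policy RS-6-3-1024k

# Verify which policy is in effect on the directory.
hdfs ec -getPolicy -path /data
```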