Erasure Coding

Erasure coding ( EC ) is a fault-tolerant coding technology. It was first used in the telecommunication industry to resolve the data loss problem during transmission. This mechanism segments the signals to be transmitted, add signal parity, and then associate the segments. Even if some signals are lost during transmission, the receive end can still calculate the complete information through an algorithm. In data storage, the erasure coding mechanism divides data into segments, extend and encode redundant data blocks, and store them in different locations such as drives, storage nodes, or other physical locations.

The erasure code is a structure of k data blocks and m check blocks. The values of k and m can be set according to a specific rule. The formula is n = k + m. The variable k indicates the number of original data blocks. The variable m indicates the number of redundant data blocks for failure protection. The variable n indicates the total number of data blocks after the erasure coding mechanism is implemented. When less than or equal to m storage blocks (data blocks or parity blocks) are damaged, all data blocks can be obtained by calculating data in the remaining storage blocks, preventing data loss.

The following uses k=3 and m=2 as an example to describe how to store an object named NYAN in Ceph in erasure coding mode. Assume that content of the object is ABCDEFGHI. After NYAN is uploaded to Ceph, the client calls the erasure coding algorithm in the primary OSD to encode the data. The original data ABCDEFGHI is split into three segments, and then another two parity segments (the contents are YXY and QGC) are calculated. According to the rule specified by the CRUSH map, the five segments are randomly distributed on five different OSDs to complete the object storage, as shown in Figure 1.

Figure 1 Storing an object named NYAN in Ceph in EC mode

The following describes how to read data by using erasure coding. NYAN is used as an example. After the client sends a request for reading NYAN, the primary OSD (OSD1 in this example) of the PG where the object is located sends the request to other associated OSDs. If OSD4 is faulty and cannot respond to the request, only segments of OSD1 (GHI), OSD3 (YXY), and OSD5 (ABC) can be obtained. Although OSD2 receives the request and sends data, its data is received last. In this case, OSD1, as the primary OSD, decodes the data segments of OSD1, OSD3, and OSD5, and ignores the data segment of OSD2. The NYAN data (ABCDEFGHI) is restored and returned to the client.

Parent topic: Common Features