Replication
Ceph distributed storage uses replication to ensure data reliability. By default, three copies of the data are kept (this number is configurable). Ceph uses the CRUSH algorithm to place data quickly and accurately in a large-scale cluster, and it minimizes data migration when hardware fails or new devices are added. The working principles are as follows:
- When users store data in the Ceph cluster, the data is divided into multiple objects. Each object has an object ID, and the object size is configurable (4 MB by default). An object is the smallest storage unit in a Ceph cluster.
- Because the number of objects is very large, placement groups (PGs) are introduced to manage them: they shrink the object-to-OSD mapping tables, lower metadata complexity, and make reads and writes more flexible. Each object is mapped to a PG through a hash algorithm, and a PG can contain multiple objects.
- The PGs are mapped to OSDs through CRUSH calculation. With three copies, CRUSH maps each PG to three OSDs, ensuring data redundancy.
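The object-to-PG-to-OSD pipeline above can be sketched in a few lines of Python. This is a simplified illustration, not the actual Ceph implementation: Ceph uses its own hash function and the CRUSH algorithm, while the names `object_to_pg` and `pg_to_osds` and the seeded-random stand-in for CRUSH below are purely illustrative.

```python
import hashlib
import random

PG_COUNT = 128  # the pool's PG count; a power of two in practice

def object_to_pg(object_id: str, pg_count: int = PG_COUNT) -> int:
    """Hash the object ID and take it modulo the pool's PG count."""
    h = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
    return h % pg_count

def pg_to_osds(pg_id: int, osds: list[int], replicas: int = 3) -> list[int]:
    """Pick `replicas` distinct OSDs for a PG.

    Stand-in for CRUSH: seeding from the PG ID makes the choice
    deterministic, so every client computes the same placement.
    """
    rng = random.Random(pg_id)
    return rng.sample(osds, replicas)

pg = object_to_pg("rbd_data.1234.0000")          # object -> PG
acting_set = pg_to_osds(pg, osds=list(range(12)))  # PG -> 3 OSDs
```

The key property shown here is that placement is *computed*, not looked up: any client that knows the cluster layout can derive the same three OSDs without consulting a central metadata server.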
Figure 1 Resource allocation in the CRUSH algorithm (using two copies)
The CRUSH algorithm is affected by the following factors:
- Current system status (cluster map)
When the OSD status or quantity changes, the cluster map changes, which affects the mapping between PGs and OSDs.
- Storage policy configuration (data security-related)
The policy can require that the three OSDs of the same PG be located on different servers or racks, improving storage reliability.
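Such a storage policy is expressed as a rule in the cluster's CRUSH map. A sketch of what a replicated rule might look like follows (the syntax mirrors the Ceph CRUSH map format; the rule name and id here are illustrative):

```
rule replicated_hosts {
    id 1
    type replicated
    # Start selecting from the root of the CRUSH hierarchy.
    step take default
    # Choose one leaf (OSD) under each of N distinct hosts,
    # where N is the pool's replica count (firstn 0).
    step chooseleaf firstn 0 type host
    step emit
}
```

Replacing `type host` with `type rack` would spread the copies across racks instead of servers, trading lower failure correlation for potentially more cross-rack traffic.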
Figure 2 shows multiple data copies. For data block P1 on drive 1 of server 1, its backup is P1' on drive 2 of server 2. P1 and P1' are two copies of the same data block. If drive 1 becomes faulty, P1' takes over from P1 to provide storage services.
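The claim that Ceph minimizes data migration when the cluster map changes can be illustrated with a toy rendezvous-hashing (highest-random-weight) sketch. This is not CRUSH itself, but CRUSH's straw-style bucket selection rests on a similar idea: score every (PG, OSD) pair deterministically and pick the winner, so adding an OSD only moves the PGs that the new OSD wins.

```python
import hashlib

def score(pg_id: int, osd_id: int) -> int:
    """Deterministic pseudo-random score for a (PG, OSD) pair."""
    key = f"{pg_id}:{osd_id}".encode()
    return int(hashlib.md5(key).hexdigest(), 16)

def place(pg_id: int, osds: list[int]) -> int:
    """The highest-scoring OSD stores the PG (rendezvous hashing)."""
    return max(osds, key=lambda o: score(pg_id, o))

# Placement for 128 PGs before and after adding OSD 4.
before = {pg: place(pg, [0, 1, 2, 3]) for pg in range(128)}
after = {pg: place(pg, [0, 1, 2, 3, 4]) for pg in range(128)}

# Only PGs "won" by the new OSD move; everything else stays put.
moved = sum(1 for pg in before if before[pg] != after[pg])
```

Running this, roughly 1/5 of the PGs migrate (the new OSD's fair share), and every migrated PG moves *to* the new OSD, unlike a naive `hash(pg) % osd_count` scheme, which would reshuffle almost everything.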
