Mapping an Object to an OSD

Each storage pool contains many placement groups (PGs), which CRUSH maps to OSDs dynamically. When a Ceph client stores an object, CRUSH maps that object to a PG.

Mapping objects to PGs creates a layer of indirection between OSDs and clients. A Ceph cluster must be able to grow or shrink and to rebalance dynamically where it stores objects. If a client knew which OSD held each object, clients and OSDs would be tightly coupled. Instead, the CRUSH algorithm maps each object to a PG and then maps each PG to one or more OSDs. This layer of indirection allows Ceph to rebalance dynamically when new OSDs and their underlying devices come online. The following figure shows how CRUSH maps objects to PGs and then maps PGs to OSDs.

Figure 1 CRUSH mapping
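
The two-level mapping can be illustrated with a toy sketch. It is not CRUSH: the hash (CRC32), the PG count, and the round-robin placement in pgs_to_osds are assumptions chosen only to show that adding an OSD changes the PG-to-OSD table while the object-to-PG mapping stays the same.

```python
import zlib
from typing import Dict, List

PG_NUM = 8  # hypothetical number of PGs in the pool


def object_to_pg(name: str) -> int:
    """Map an object name to a PG. CRC32 is a stand-in, not the hash Ceph uses."""
    return zlib.crc32(name.encode("utf-8")) % PG_NUM


def pgs_to_osds(osds: List[int], replicas: int = 2) -> Dict[int, List[int]]:
    """Toy round-robin placement of PGs on OSDs.

    Real CRUSH derives this from the cluster map and placement rules.
    """
    return {
        pg: [osds[(pg + i) % len(osds)] for i in range(replicas)]
        for pg in range(PG_NUM)
    }


objects = ["john", "paul", "george", "ringo"]

# Object -> PG depends only on the object name and the PG count.
pg_of_object = {name: object_to_pg(name) for name in objects}

# PG -> OSDs depends on which OSDs exist, so it changes when OSD 3 is added.
placement_before = pgs_to_osds([0, 1, 2])
placement_after = pgs_to_osds([0, 1, 2, 3])

for name in objects:
    pg = pg_of_object[name]
    print(name, "-> PG", pg, "->", placement_before[pg], "then", placement_after[pg])
```

Because clients address PGs rather than OSDs, only the second table has to change when the cluster is resized.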

With a copy of the cluster map and the CRUSH algorithm, the client can compute exactly which OSD to use when reading or writing a given object.

When a Ceph client binds to a monitor, it retrieves the latest copy of the cluster map. From this copy, the client knows about all monitors, OSDs, and metadata servers in the cluster. However, the cluster map does not tell it where any object is located.

Object locations are calculated rather than looked up.

The only input the client needs is the object name and the storage pool name. Ceph stores data in named storage pools (for example, liverpool). When a client wants to store a named object (for example, john, paul, george, or ringo), it calculates a PG from the object name, a hash of that name, the number of PGs in the storage pool, and the storage pool name. Ceph calculates the PG ID as follows:

1. The client provides the storage pool name and the object name (for example, pool="liverpool" and object-id="john").
2. CRUSH takes the object name and hashes it.
3. CRUSH computes the hash value modulo the number of PGs in the storage pool to obtain the PG ID (for example, 58).
4. CRUSH looks up the storage pool ID from the storage pool name (for example, liverpool = 4).
5. CRUSH prepends the storage pool ID to the PG ID (for example, 4.58).
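
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ceph's implementation: the POOLS table, its PG count of 128, and the pg_id helper are hypothetical, and CRC32 stands in for the hash Ceph actually uses, so the resulting IDs will not match those of a real cluster.

```python
import zlib

# Hypothetical pool table: pool name -> (pool ID, number of PGs).
# The pool ID 4 follows the example in the text; 128 PGs is an assumption.
POOLS = {"liverpool": (4, 128)}


def pg_id(pool_name: str, object_name: str) -> str:
    """Return a '<pool ID>.<PG ID>' string for an object in a pool."""
    pool_id, pg_num = POOLS[pool_name]                     # look up pool ID and PG count
    hashed = zlib.crc32(object_name.encode("utf-8"))       # hash the object name (stand-in hash)
    pg = hashed % pg_num                                   # hash modulo the number of PGs
    return f"{pool_id}.{pg}"                               # prepend the pool ID


for name in ("john", "paul", "george", "ringo"):
    print(name, "->", pg_id("liverpool", name))
```

The calculation needs only the object name and the pool's entry in the client's local copy of the cluster map, so no request to the cluster is required to find the PG.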

Calculating an object's location is much faster than querying for it. With the CRUSH algorithm, the client calculates where an object should be stored and then connects to the primary OSD to store or retrieve the object.
