Enabling the Smart Write Cache Using Ceph
Drive Partition
In cluster deployment mode, each Ceph node is configured with twelve 4 TB data drives and two 3.2 TB NVMe drives. Each 4 TB data drive functions as the data drive of the bcache device. Each NVMe drive functions as the DB and WAL partitions of six OSDs and the cache drive of the bcache device. Generally, the WAL partition is sufficient if its capacity is greater than 10 GB. According to the official Ceph documents, it is recommended that the size of each DB partition be at least 4% of the capacity of each data drive and that the cache drive capacity account for 5% to 10% of the total data drive capacity. You can configure the size of each DB partition based on the NVMe drive capacity.
In this example, the WAL partition capacity is 15 GB, the DB partition capacity is 30 GB, and the cache drive capacity is 400 GB (10% of the data drive capacity).
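The sizing above can be sanity-checked with simple arithmetic. The following sketch (an illustration only; the 3.2 TB figure is the nominal drive capacity from this deployment) totals the space that six OSDs consume on one NVMe drive:

```shell
#!/bin/bash
# Sketch of the capacity budget assumed above: each NVMe drive serves six OSDs,
# each OSD consuming one 30 GB DB partition, one 15 GB WAL partition,
# and one 400 GB bcache cache partition.
db=30; wal=15; cache=400; osds=6
total=$(( osds * (db + wal + cache) ))
echo "${total} GB used per 3.2 TB NVMe drive"
```

This leaves headroom on each 3.2 TB drive, so the layout fits comfortably.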
Perform the following operations on all three Ceph nodes. The following uses two NVMe drives (/dev/nvme0n1 and /dev/nvme1n1) as an example. If the system has more NVMe SSDs, extend the range of the j loop variable to cover the additional drives. To change a partition size, change the number in the corresponding end=`expr $start + 30` command to the required capacity.
- Create a partition.sh script.
vi partition.sh
Add the following content to the file:
#!/bin/bash
for j in {0..1}
do
    parted -s /dev/nvme${j}n1 mklabel gpt
    start=0
    # Divide the drive into six 30 GB partitions.
    end=`expr $start + 30`
    parted /dev/nvme${j}n1 mkpart primary 2048s ${end}GiB
    start=$end
    for i in {1..5}
    do
        end=`expr $start + 30`
        parted /dev/nvme${j}n1 mkpart primary ${start}GiB ${end}GiB
        start=$end
    done
    # Divide the drive into six 15 GB partitions.
    for i in {1..6}
    do
        end=`expr $start + 15`
        parted /dev/nvme${j}n1 mkpart primary ${start}GiB ${end}GiB
        start=$end
    done
    # Divide the drive into six 400 GB partitions.
    for i in {1..6}
    do
        end=`expr $start + 400`
        parted /dev/nvme${j}n1 mkpart primary ${start}GiB ${end}GiB
        start=$end
    done
done
This script applies only to the current hardware configuration. For other hardware configurations, you need to modify the script.
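The partition numbering that the script produces can be previewed without touching any hardware. The following sketch only prints the expected layout (partition roles are inferred from the creation order above; nvme0n1 is used as a stand-in for either drive):

```shell
#!/bin/bash
# Sketch: the per-drive partition layout that partition.sh produces.
# Partitions 1-6 are DB, 7-12 are WAL, 13-18 are bcache cache partitions.
for p in $(seq 1 18); do
    if   [ "$p" -le 6 ];  then role="DB, 30 GB"
    elif [ "$p" -le 12 ]; then role="WAL, 15 GB"
    else                       role="bcache cache, 400 GB"
    fi
    echo "nvme0n1p${p}: ${role}"
done
```

Keeping this mapping in mind helps when reading the create_bcache.sh and create_osd.sh scripts later, which reference partitions by number.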
- Run the script.
bash partition.sh
- Check whether the partitions are successfully created.
lsblk
If information similar to the following is displayed, the partitions are successfully created:

Creating a Bcache Device
Bcache drives are classified into backing (data) drives and cache drives. Generally, HDDs are used as data drives, and SSDs are used as cache drives. Perform the following operations on the three Ceph nodes. In the script, the 12 hard drives /dev/sda through /dev/sdl are the data drives of the bcache devices, and the 400 GB partitions on the NVMe drives are used as cache drives. This document uses /dev/nvme0n1p$n as an example, where n ranges from 13 to 18, corresponding to the 400 GB partitions created earlier.
In actual deployments, the OS drive may be an HDD. For example, if the OS is installed on /dev/sda, the following script cannot be executed directly; an error is reported when execution reaches make-bcache --wipe-bcache -B /dev/sda. Modify the script so that it operates only on data drives and does not touch other drives, such as the OS drive and the SSDs holding the DB and WAL partitions.
Check the drive partitions before creating a bcache device.
lsblk
As shown in the following figure, the sda drive is the system drive.

- Create a create_bcache.sh script.
vi create_bcache.sh
Add the following content to the file:
#!/bin/bash
n=13
for disk in {a..f}
do
    make-bcache -B /dev/sd${disk} -C /dev/nvme0n1p${n}
    ((n=n+1))
done
n=13
for disk in {g..l}
do
    make-bcache -B /dev/sd${disk} -C /dev/nvme1n1p${n}
    ((n=n+1))
done
The parameters in make-bcache -B /dev/sd${disk} -C /dev/nvme0n1p${n} are defined as follows:
- -B specifies a backend drive (data drive).
- -C specifies a cache device for accelerating the data drive.
For example, the backend drive is sdb and the cache device is nvme0n1p13.
make-bcache -B /dev/sdb -C /dev/nvme0n1p13
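The full drive-to-partition pairing that the script performs can be previewed safely. The following sketch prints each make-bcache invocation instead of executing it, so it can be run on any machine to confirm the mapping before touching real drives:

```shell
#!/bin/bash
# Sketch: the data-drive-to-cache-partition pairing made by create_bcache.sh,
# printed rather than executed.
n=13
for disk in {a..f}; do
    echo "make-bcache -B /dev/sd${disk} -C /dev/nvme0n1p${n}"
    ((n++))
done
n=13
for disk in {g..l}; do
    echo "make-bcache -B /dev/sd${disk} -C /dev/nvme1n1p${n}"
    ((n++))
done
```

Each NVMe drive thus caches six HDDs, using its partitions 13 through 18.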
- Run the script.
bash create_bcache.sh
- Check whether the bcache device is successfully created.
lsblk

If the bcache device can be found under the corresponding data drive and cache drive, the creation is successful.
Deploying Ceph
- Install the Ceph software and deploy the MON and MGR nodes.
- Deploy OSD nodes.
Before performing this operation, determine which hard drives are used as data drives and ensure that all partitions on the data drives are cleared. If there are partitions that are not cleared, clear them first.
- Check whether each drive has partitions.
lsblk
- If a hard drive has partitions, clear the partitions. The following command uses the drive /dev/sdb as an example.
ceph-volume lvm zap /dev/sdb --destroy
- Create the create_osd.sh script on Ceph-Node 1 and use the 12 bcache drives on each server as OSD data drives.
cd /etc/ceph
vi /etc/ceph/create_osd.sh
Add the following content to the script:
#!/bin/bash
for node in ceph1 ceph2 ceph3
do
    j=7
    k=1
    for i in `ssh ${node} "ls /sys/block | grep bcache | head -n 6"`
    do
        ceph-deploy osd create ${node} --data /dev/${i} --block-wal /dev/nvme0n1p${j} --block-db /dev/nvme0n1p${k}
        ((j=j+1))
        ((k=k+1))
        sleep 3
    done
    j=7
    k=1
    for i in `ssh ${node} "ls /sys/block | grep bcache | tail -n 6"`
    do
        ceph-deploy osd create ${node} --data /dev/${i} --block-wal /dev/nvme1n1p${j} --block-db /dev/nvme1n1p${k}
        ((j=j+1))
        ((k=k+1))
        sleep 3
    done
done
- This script applies only to the current hardware configuration. For other hardware configurations, you need to modify the script.
- In the ceph-deploy osd create command:
- ${node} specifies the hostname of a node.
- --data specifies a data drive. The backend drive of bcache is used as a data drive.
- --block-db specifies the DB partition.
- --block-wal specifies the WAL partition.
- The DB and WAL partitions are deployed on NVMe SSDs to improve write performance. If no NVMe SSD is configured, or if the NVMe SSDs themselves are used as data drives, do not specify --block-db or --block-wal; specify only --data.
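The WAL/DB assignment that the script produces on each node can be previewed without a cluster. The following sketch prints the expected mapping; the bcache device names vary at runtime, so a placeholder index is printed instead:

```shell
#!/bin/bash
# Sketch: the per-node WAL/DB partition assignment made by create_osd.sh.
# The first six bcache devices use nvme0n1; the last six use nvme1n1.
for half in 0 1; do
    j=7; k=1
    for i in 1 2 3 4 5 6; do
        dev=$(( half * 6 + i ))
        echo "bcache device ${dev}: WAL=/dev/nvme${half}n1p${j} DB=/dev/nvme${half}n1p${k}"
        ((j++)); ((k++))
    done
done
```

This matches the partition layout created earlier: partitions 1 to 6 (30 GB) hold the DBs and partitions 7 to 12 (15 GB) hold the WALs.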
- Run the script on Ceph 1.
bash create_osd.sh
- Check whether the OSD nodes are successfully created.
ceph -s

If all 36 OSDs are in the up state, the creation is successful.