Deploying OSD Nodes
Creating OSD Partitions
Perform the following operations on the three Ceph nodes. The following uses /dev/nvme0n1 and /dev/nvme1n1 as an example. If the system has multiple NVMe or SATA/SAS SSDs, change /dev/nvme0n1 and /dev/nvme1n1 to the actual drive letters. For non-recommended configurations, if the space of the DB partition and WAL partition of the NVMe drive is insufficient, data will be stored in an HDD, affecting performance.
Ceph 14.2.8 uses BlueStore as the back-end storage engine, which replaces the Journal partition used in the Jewel version with a DB partition (metadata) and a WAL partition (write-ahead log). These partitions store the metadata and log files generated by the BlueStore back end, respectively. During cluster deployment, each Ceph node is configured with two 2.9 TB or 7 TB NVMe drives. A WAL partition is generally sufficient if its capacity is greater than 10 GB. According to the official Ceph documentation, the size of each DB partition should be at least 4% of the capacity of the corresponding data drive, and it can be configured flexibly based on the NVMe drive capacity. For the current NVMe drives, the following recommended configuration is used: the two NVMe drives are divided into twelve 10 GB WAL partitions and twelve 25 GB DB partitions (six of each per drive). The remaining capacity of each NVMe drive forms one additional partition; these two partitions are used for BDM initialization during the deployment of the Global Cache server.
| Data Drive | DB Partition | WAL Partition | BDM Partition |
|---|---|---|---|
| 2 x 2.9 TB | 12 x 25 GB | 12 x 10 GB | 2 x 2.7 TB |
| 2 x 7 TB | 12 x 50 GB | 12 x 10 GB | 2 x 6.6 TB |
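As a quick illustration of the 4% guideline above, the minimum DB partition size for a given data drive can be computed directly. This is a sketch; the 2900 GB figure is an assumed approximation of a 2.9 TB drive, not a value from the deployment.

```shell
# Assumed data drive capacity in GB (approximation of a 2.9 TB HDD).
data_drive_gb=2900
# Recommended minimum DB partition size: 4% of the data drive capacity.
db_min_gb=$(( data_drive_gb * 4 / 100 ))
echo "Minimum recommended DB partition size: ${db_min_gb} GB"
```

In practice the guide sizes DB partitions smaller than this minimum (25 GB or 50 GB) to fit the available NVMe capacity, which is why the note above warns that undersized DB/WAL space can spill data onto the HDDs.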
- Create a partition.sh script.

  ```
  vi partition.sh
  ```

- Add the following content:

  ```bash
  #!/bin/bash
  parted -s /dev/nvme0n1 mklabel gpt
  parted -s /dev/nvme1n1 mklabel gpt
  start=4
  # Create six 10 GB partitions and six 25 GB partitions.
  for i in {1..6}
  do
      end=`expr $start + 10240`
      parted /dev/nvme0n1 mkpart primary ${start}MiB ${end}MiB
      parted /dev/nvme1n1 mkpart primary ${start}MiB ${end}MiB
      start=$end
      end=`expr $start + 25600`
      parted /dev/nvme0n1 mkpart primary ${start}MiB ${end}MiB
      parted /dev/nvme1n1 mkpart primary ${start}MiB ${end}MiB
      start=$end
  done
  parted /dev/nvme0n1 mkpart primary ${end}MiB 100%
  parted /dev/nvme1n1 mkpart primary ${end}MiB 100%
  ```
This script applies only to the current hardware configuration. For other hardware configurations, you need to modify the script.
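To sanity-check the partition layout before running the script against real drives, the same arithmetic can be dry-run to print the MiB boundaries that are passed to parted. This sketch only echoes values; no drives are touched.

```shell
# Dry run: print the partition boundaries the script would create.
start=4
for i in 1 2 3 4 5 6
do
    end=$(( start + 10240 ))
    echo "WAL partition: ${start}MiB to ${end}MiB"
    start=$end
    end=$(( start + 25600 ))
    echo "DB partition:  ${start}MiB to ${end}MiB"
    start=$end
done
# The remaining capacity becomes the BDM partition.
echo "BDM partition: ${end}MiB to 100%"
```

The six WAL/DB pairs end at 215044 MiB (about 210 GiB), so everything beyond that boundary goes to the BDM partition.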
- Run the script.

  ```
  bash partition.sh
  ```
Deploying OSD Nodes
In the following script, the 12 drives /dev/sda to /dev/sdl are data drives and the OS is installed on /dev/sdm. If the data drives are not lettered consecutively, for example, if the OS is installed on /dev/sde, you cannot run the script directly; an error would be reported when the script attempts to deploy on /dev/sde. In that case, modify the script so that it operates only on data drives and does not touch other drives, such as the OS drive and the SSDs that hold the DB and WAL partitions.
- Check the drive letter of each drive on each node.

  ```
  lsblk
  ```
As shown in the preceding figure, /dev/sda is the OS drive in this example.
The drives that were ever used as OS drives and data drives in a Ceph cluster may have residual partitions. You can run the lsblk command to check for the drive partitions. For example, if /dev/sdb has partitions, run the following command to clear the partitions:
  ```
  ceph-volume lvm zap /dev/sdb --destroy
  ```
You must determine the data drives first, and then run the destroy command only when the data drives have residual partitions.
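To see at a glance which whole drives still carry residual partitions, the lsblk output can be filtered. The following sketch runs the filter against an assumed sample of `lsblk -no NAME,TYPE` output rather than a live system; on a real node, pipe the actual lsblk output through the same awk filter.

```shell
# Assumed sample of `lsblk -no NAME,TYPE` output, for illustration only.
sample='sda disk
sda1 part
sda2 part
sdb disk
nvme0n1 disk
nvme0n1p1 part'
# For each partition, strip the partition suffix (trailing digits, or
# "p" plus digits for NVMe names) to recover the parent drive, then dedupe.
parents=$(echo "$sample" | awk '$2 == "part" {sub(/p?[0-9]+$/, "", $1); print "/dev/"$1}' | sort -u)
echo "$parents"
```

For this sample the filter reports /dev/nvme0n1 and /dev/sda as drives with residual partitions. Remember that only confirmed data drives should ever be zapped.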
- Create the create_osd.sh script on ceph1 and deploy the OSD on the 12 drives of each server.
  ```
  cd /etc/ceph/
  vi /etc/ceph/create_osd.sh
  ```

  Add the following content:

  ```bash
  #!/bin/bash
  for node in ceph1 ceph2 ceph3
  do
      j=1
      k=2
      for i in {a..f}
      do
          ceph-deploy osd create ${node} --data /dev/sd${i} --block-wal /dev/nvme0n1p${j} --block-db /dev/nvme0n1p${k}
          ((j=${j}+2))
          ((k=${k}+2))
          sleep 3
      done
      j=1
      k=2
      for i in {g..l}
      do
          ceph-deploy osd create ${node} --data /dev/sd${i} --block-wal /dev/nvme1n1p${j} --block-db /dev/nvme1n1p${k}
          ((j=${j}+2))
          ((k=${k}+2))
          sleep 3
      done
  done
  ```
- This script applies only to the current hardware configuration. For other hardware configurations, you need to modify the script.
- In the ceph-deploy osd create command:
- ${node} specifies the host name of the node.
- --data specifies the data drive.
- --block-db specifies the DB partition.
- --block-wal specifies the WAL partition.
DB and WAL partitions are usually deployed on NVMe SSDs to improve write performance. If no NVMe SSD is configured or NVMe SSDs are used as data drives, you do not need to specify --block-db or --block-wal. You only need to specify --data.
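The j/k indexing in the script pairs each data drive with an odd-numbered WAL partition and the following even-numbered DB partition on the same NVMe drive. The mapping for the first six drives can be dry-run as follows (echo only; nothing is deployed):

```shell
# Dry run of the WAL/DB partition assignment for /dev/sda to /dev/sdf.
j=1
k=2
for i in a b c d e f
do
    line="/dev/sd${i} -> WAL /dev/nvme0n1p${j}, DB /dev/nvme0n1p${k}"
    echo "$line"
    j=$(( j + 2 ))
    k=$(( k + 2 ))
done
```

The drives /dev/sdg to /dev/sdl follow the same pattern on /dev/nvme1n1, so each NVMe drive serves exactly six OSDs.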
- Run the script on ceph1.

  ```
  bash create_osd.sh
  ```

- Check whether all 36 OSDs are in the up state.

  ```
  ceph -s
  ```
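The figure of 36 follows from the topology: 3 OSD nodes with 12 data drives each. A trivial arithmetic check is shown below; on a live cluster, compare this value against the OSD count reported by ceph -s or ceph osd stat.

```shell
# Expected OSD count: one OSD per data drive, per node.
nodes=3
drives_per_node=12
expected=$(( nodes * drives_per_node ))
echo "Expected OSDs: ${expected}"
```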