Switching to the UCX Network

Before enabling UCX, you need to configure UCX parameters and modify the configuration files of all nodes. If only SPDK is enabled, skip this section.

Access the Ceph cluster container.

cephadm shell

Configure UCX parameters.

ceph config set global ms_type async+ucx
ceph config set global ms_public_type async+ucx
ceph config set global ms_cluster_type async+ucx
ceph config set global ms_async_ucx_device mlx5_bond_0:1,mlx5_bond_1:1
ceph config set global ms_async_ucx_tls rc_verbs
ceph config set global ms_async_ucx_event_polling true
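To confirm that the settings were recorded, one option is to filter the output of ceph config dump inside the container. The helper below is a minimal sketch: the function name missing_ucx_keys is hypothetical, and it only checks that each option name appears in the text it reads, not that the value is correct.

```shell
#!/bin/sh
# Hypothetical helper: reads configuration text on stdin and prints every
# expected UCX option name that does not appear in it.
missing_ucx_keys() {
    input=$(cat)
    for key in ms_type ms_public_type ms_cluster_type \
               ms_async_ucx_device ms_async_ucx_tls ms_async_ucx_event_polling; do
        # -w: match the option name as a whole word only
        printf '%s\n' "$input" | grep -qw -- "$key" || printf '%s\n' "$key"
    done
}
# Usage inside the container (run after the commands above):
#   ceph config dump | missing_ucx_keys
```

If the function prints nothing, all six option names were found.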

Exit the container. On the host of each server node (ceph1 to ceph3), add the following settings to the MON configuration file:

/var/lib/ceph/[fsid]/mon*/config

ms_type = async+ucx
ms_public_type = async+ucx
ms_cluster_type = async+ucx
ms_async_ucx_device = mlx5_bond_0:1,mlx5_bond_1:1
ms_async_ucx_tls = rc_verbs
ms_async_ucx_event_polling = true

Run the show_gids command to query device names. ms_async_ucx_device accepts multiple network devices, separated by commas. The selected devices must carry the public network IP address and cluster network IP address configured for the node. If the show_gids command is not supported, update the NIC firmware and driver. For details, see Updating the NIC Firmware and Driver.
[fsid] is the Ceph cluster FSID. Run the cephadm ls command to query it.
Ensure that the RDMA network device names are the same on all server nodes. Otherwise, OSD nodes cannot be started. You can use tools such as /usr/lib/udev/rdma_rename to rename RDMA network devices. The RDMA network device names of the client nodes do not need to be the same.
The IP addresses of the cluster network and public network must be the same as those of the UCX devices (RoCE/IB interfaces).
Setting ms_async_ucx_event_polling to true enables event polling. This reduces latency and improves cluster throughput and concurrency, at the cost of higher CPU usage. When no events are generated, polling wastes resources, and it also makes system debugging more complex. Enable or disable it as required.
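The edit to the MON configuration file can be scripted so that it is safe to run more than once. The following is a minimal sketch, not a supported tool: the function name add_ucx_settings is hypothetical, it assumes that appending to the end of the file places the settings in the intended section, and it leaves any existing line for the same key untouched rather than updating it.

```shell
#!/bin/sh
# Hypothetical helper: appends each UCX setting to the given config file
# unless a line for that key is already present.
add_ucx_settings() {
    conf=$1
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    # Terminate the last line first if the file does not end with a newline.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    while IFS= read -r line; do
        key=${line%% =*}   # text before " =", for example ms_type
        grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf" ||
            printf '%s\n' "$line" >> "$conf"
    done <<'EOF'
ms_type = async+ucx
ms_public_type = async+ucx
ms_cluster_type = async+ucx
ms_async_ucx_device = mlx5_bond_0:1,mlx5_bond_1:1
ms_async_ucx_tls = rc_verbs
ms_async_ucx_event_polling = true
EOF
}
# Usage on each server node, with the real path substituted:
#   add_ucx_settings /var/lib/ceph/[fsid]/mon*/config
```

Adjust the ms_async_ucx_device value in the here-document to the device names reported on your nodes before using it.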

Synchronize the MON configuration changes to the MGR, OSD, and crash daemons on all nodes.

ls /var/lib/ceph/*/*/config|grep 'osd\|mgr\|crash'|xargs -I {} cp -r /var/lib/ceph/*/mon.*/config {}
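The one-liner above copies whatever the mon.* glob expands to, so it behaves unexpectedly if that glob matches zero or several files. A more defensive variant is sketched below. The function name sync_mon_config is hypothetical, and the osd.*, mgr.*, and crash.* directory patterns are an assumption about how the daemon directories are named on your nodes.

```shell
#!/bin/sh
# Hypothetical helper: copies the single MON config under ROOT to every
# osd.*, mgr.*, and crash.* daemon directory under ROOT.
sync_mon_config() {
    root=${1:-/var/lib/ceph}
    # Expand the glob into the positional parameters to count the matches.
    set -- "$root"/*/mon.*/config
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one MON config under $root" >&2
        return 1
    fi
    src=$1
    for dst in "$root"/*/osd.*/config "$root"/*/mgr.*/config \
               "$root"/*/crash.*/config; do
        [ -f "$dst" ] || continue   # pattern matched nothing
        cp "$src" "$dst"
        echo "updated $dst"
    done
}
# Usage on each node:
#   sync_mon_config /var/lib/ceph
```

The printed list of updated files gives a quick record of which daemons received the new configuration.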

Modify the service file on each node.

sed -i 's/on-failure/always/g' /etc/systemd/system/ceph-*\@.service
sed -i 's/30min/1min/g' /etc/systemd/system/ceph-*\@.service
sed -i '/StartLimitBurst=/c\StartLimitBurst=20' /etc/systemd/system/ceph-*\@.service
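Before editing the real unit files, it can help to apply the same three substitutions to a copy and inspect the result. The sketch below wraps them in a hypothetical function, patch_unit_file. Like the commands above, it relies on GNU sed (the -i option and the one-line c\ form). Which lines are affected depends on the contents of the unit file; presumably these are the restart policy and start-limit settings, but that is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: applies the three edits shown above to one file.
patch_unit_file() {
    sed -i \
        -e 's/on-failure/always/g' \
        -e 's/30min/1min/g' \
        -e '/StartLimitBurst=/c\StartLimitBurst=20' \
        "$1"
}
# Usage: work on a copy of one unit file, then compare with the original.
#   cp <unit file> /tmp/unit.copy
#   patch_unit_file /tmp/unit.copy
#   diff <unit file> /tmp/unit.copy
```

Reviewing the diff first shows exactly which lines the global substitutions touch.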

Restart the Ceph cluster on each node.

systemctl daemon-reload
systemctl restart ceph.target

After all containers are started, enter the Ceph cluster container again to check the cluster status.

cephadm shell
ceph -s
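Because the daemons may take some time to come back after the restart, the status check can be repeated until the cluster reports HEALTH_OK. The polling loop below is a minimal sketch. The function name wait_for_health_ok is hypothetical, the health command is passed in as arguments, and the usage line assumes that cephadm shell runs a command given after "--".

```shell
#!/bin/sh
# Hypothetical helper: runs the given health command every INTERVAL seconds
# until its output starts with HEALTH_OK, or gives up after TRIES attempts.
wait_for_health_ok() {
    tries=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Treat a failing command as "no response" instead of aborting.
        status=$("$@" 2>/dev/null) || status=""
        case $status in
            HEALTH_OK*) echo "cluster healthy"; return 0 ;;
        esac
        i=$((i + 1))
        echo "attempt $i/$tries: ${status:-no response}"
        sleep "$interval"
    done
    return 1
}
# Usage on a host: up to 30 attempts, 10 seconds apart.
#   wait_for_health_ok 30 10 cephadm shell -- ceph health
```

A non-zero return means the cluster did not reach HEALTH_OK within the allotted attempts, in which case inspect the full ceph -s output.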
