Test Fails When a Large Number of Concurrent Access Requests Are Sent to an RGW
Problem Description
| Item | Information |
|---|---|
| Source of the Problem | Online maintenance |
| Product | Kunpeng BoostKit |
| Sub-item | SDS |
| Service Scenario | Debugging and running |
| Component | Other |
| Output Time | 2019-10-28 |
| Author | Chen Xiaobo 00416232 |
| Team | Kunpeng BoostKit |
| Review Result | Review passed |
| Review Date | 2019-11-05 |
| Release Date | 2020-03-20 |
| Keywords | High-concurrency test failed |
Symptom
When the number of concurrent access requests sent to a RADOS Gateway (RGW) exceeds 512, the COSBench test stops unexpectedly.
Key Process and Cause Analysis
The default number of threads of an RGW is 512. When the number of concurrent requests exceeds 512, the RGW cannot process client requests, resulting in the failure of all tests.
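The threshold described above can be checked before a test run. The following is a minimal sketch, not part of the original procedure: the function name check_concurrency is hypothetical, and the fallback of 512 is the default thread count stated in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the planned number of concurrent requests
# with the RGW thread count (falls back to 512, the default cited above).
check_concurrency() {
    local workers="$1"
    local threads="${2:-512}"
    if [ "$workers" -gt "$threads" ]; then
        # More concurrent requests than threads: the failure case described above.
        echo "exceeds"
    else
        echo "ok"
    fi
}
```

For example, `check_concurrency 600` reports that 600 concurrent requests exceed the default thread count, while `check_concurrency 600 1024` does not.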
Conclusion and Solution
- View COSBench logs.
vim /path/to/cosbench/archive/workload/workload.log
The log contains the following error message:
HTTP Request Time Out
- View RGW logs.
vim /var/log/ceph/<rgw>.log
The log contains the following error message:
iterate_obj() failed with -5
- Query the default number of threads of the RGW.
radosgw-admin --show-config | grep thread

The default number of RGW threads (rgw_thread_pool_size) is 512. When the number of concurrent requests exceeds this value, the RGW cannot process client requests and all tests fail.
- Run the following command on any Ceph node to increase the number of RGW threads:
sed -i 's/rgw_frontends.*/& num_threads=1024/g' ceph.conf
- Restart the RGW process.
systemctl restart ceph-radosgw.target
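The sed command in the procedure above appends num_threads=1024 every time it runs, so running it twice leaves two num_threads entries on the rgw_frontends line. The following is a minimal sketch of a repeatable variant, assuming GNU sed and a configuration file that contains an rgw_frontends line. The function name set_rgw_num_threads is hypothetical and not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set num_threads on the rgw_frontends line of a
# Ceph configuration file without duplicating the option on repeated runs.
# Usage: set_rgw_num_threads <path-to-ceph.conf> <thread-count>
set_rgw_num_threads() {
    local conf="$1"
    local threads="$2"
    if grep -q 'rgw_frontends.*num_threads=' "$conf"; then
        # The option is already present: replace its value in place.
        sed -i "s/\(rgw_frontends.*num_threads=\)[0-9]*/\1${threads}/" "$conf"
    else
        # The option is absent: append it, as the original command does.
        sed -i "s/rgw_frontends.*/& num_threads=${threads}/" "$conf"
    fi
}
```

As an illustration, with an assumed sample line `rgw_frontends = civetweb port=7480`, calling `set_rgw_num_threads ceph.conf 1024` once or several times leaves a single `num_threads=1024` on that line. The RGW still has to be restarted for the change to take effect.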