Test Fails When a Large Number of Concurrent Access Requests Are Sent to an RGW

Problem Description

Table 1 Basic information

Item                   Information
Source of the Problem  Online maintenance
Product                Kunpeng BoostKit
Sub-item               SDS
Service Scenario       Debugging and running
Component              Other
Output Time            2019-10-28
Author                 Chen Xiaobo 00416232
Team                   Kunpeng BoostKit
Review Result          Review passed
Review Date            2019-11-05
Release Date           2020-03-20
Keywords               High-concurrency test failed

Symptom

When the number of concurrent access requests sent to a RADOS Gateway (RGW) exceeds 512, the COSBench test stops unexpectedly.

Key Process and Cause Analysis

The default number of threads of an RGW is 512. When the number of concurrent requests exceeds 512, the RGW cannot process client requests, resulting in the failure of all tests.
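
A quick pre-test check along these lines can flag workloads that would exceed the limit. The following is a minimal sketch, not part of the original case record: the sample workload file, its stage names, and the idea of reading the concurrency level from the workers attribute are illustrative assumptions, and the real concurrency also depends on how many COSBench drivers run the workload.

```shell
#!/bin/sh
# RGW thread limit named in the text above.
limit=512

# Hypothetical COSBench workload file, created here only so the sketch is self-contained.
workload=$(mktemp)
trap 'rm -f "$workload"' EXIT
cat > "$workload" <<'EOF'
<workload name="sample">
  <workflow>
    <workstage name="read-256">
      <work name="r256" workers="256" runtime="60"/>
    </workstage>
    <workstage name="read-1024">
      <work name="r1024" workers="1024" runtime="60"/>
    </workstage>
  </workflow>
</workload>
EOF

# Extract every workers="N" value and keep only those above the limit.
over_limit=$(grep -o 'workers="[0-9]*"' "$workload" | tr -cd '0-9\n' |
    awk -v limit="$limit" '$1 > limit { print $1 }')

echo "worker counts above $limit: $over_limit"
```

Any value reported here marks a work stage that is likely to hit the failure described in this case unless the RGW thread count is raised first.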

Conclusion and Solution

  1. View COSBench logs.
    vim /path/to/cosbench/archive/workload/workload.log
    The log contains the following error message:
    HTTP Request Time Out
  2. View RGW logs.
    vim /var/log/ceph/<rgw>.log

    The log contains the following error message:

    iterate_obj() failed with -5
  3. Query the default number of threads of the RGW.
    radosgw-admin --show-config | grep thread

    The output shows that the default number of RGW threads (rgw_thread_pool_size) is 512. When the number of concurrent requests exceeds this value, the RGW cannot process client requests, and all tests fail.

  4. Run the following command on any Ceph node to increase the number of RGW threads:
    sed -i 's/rgw_frontends.*/& num_threads=1024/g' ceph.conf
  5. Restart the RGW process for the change to take effect.
    systemctl restart ceph-radosgw.target
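
The sed command in step 4 appends num_threads=1024 to the rgw_frontends line every time it runs, so running it twice leaves two num_threads entries on that line. The following is a minimal sketch of a repeatable variant, assuming GNU sed. The sample configuration file, its section name, and the helper add_num_threads are hypothetical and exist only for illustration; on a real node the target would be the cluster's ceph.conf.

```shell
#!/bin/sh
# Hypothetical sample configuration, created in a temp file so the sketch is self-contained.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
cat > "$conf" <<'EOF'
[client.rgw.node1]
rgw_frontends = civetweb port=7480
EOF

# add_num_threads FILE COUNT (hypothetical helper)
# Sets num_threads on the rgw_frontends line: updates the value if the
# option is already present, otherwise appends it as in step 4.
add_num_threads() {
    if grep -q '^ *rgw_frontends.*num_threads=' "$1"; then
        sed -i "s/\(rgw_frontends.*num_threads=\)[0-9]*/\1$2/" "$1"
    else
        sed -i "s/rgw_frontends.*/& num_threads=$2/" "$1"
    fi
}

add_num_threads "$conf" 1024
add_num_threads "$conf" 1024   # second run leaves the line unchanged

grep rgw_frontends "$conf"
```

As in step 5, the RGW still has to be restarted after the file is changed.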