Benchmark Performance Testing

This section uses Esrally, the benchmarking tool for Elasticsearch, to test Elasticsearch performance.

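Esrally is a tool for benchmarking Elasticsearch performance. It helps evaluate how Elasticsearch behaves under different environment configurations and loads. A test load is the dataset and test scenario used in a benchmark; loads simulate different real-world usage patterns so that Elasticsearch performance can be assessed. This document covers two Esrally datasets: Geonames and Wikipedia.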

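Geonames: A geographic dataset containing core information on more than 11 million place names, plus several million alias records. Its fields include latitude and longitude coordinates, administrative division codes, time zones, and population. The data is stored in a structured format.
Wikipedia: A dataset consisting mainly of large volumes of text, such as articles, paragraphs, and lists. It contains both structured data (such as titles, authors, and categories) and semi-structured data (such as the tables, lists, and links inside articles).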

Prerequisites

Deploy Elasticsearch by following the steps in Deploying Elasticsearch.

Installing Esrally

Install Esrally.
```
pip3 install esrally
```
After the installation succeeds, check the installed version.
```
esrally --version
```
List the tracks that can be run.
```
esrally list tracks
```
This step automatically downloads the Esrally configuration files to "/root/.rally".

Configure rally.ini.

mv /root/.rally /home/elasticsearch/
cd /home/elasticsearch/.rally
vim rally.ini

Modify the configuration file as follows. Set datastore.host to the IP address of the server you are using.

[meta]
config.version = 17

[system]
env.name = local

[node]
root.dir = ${CONFIG_DIR}/benchmarks
src.root.dir = ${CONFIG_DIR}/benchmarks/src

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch

[benchmarks]
local.dataset.cache = ${CONFIG_DIR}/benchmarks/data

[reporting]
datastore.type = elasticsearch
#datastore.host = localhost
datastore.host = X.X.X.X
datastore.port = 9200
datastore.secure = False
datastore.user =
datastore.password =


[tracks]
default.url = https://github.com/elastic/rally-tracks

[teams]
default.url = https://github.com/elastic/rally-teams

[defaults]
preserve_benchmark_candidate = false

[distributions]
release.cache = true
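
The datastore.host change can also be scripted rather than made in vim. The following is a minimal sketch using Python's standard configparser module. The helper name set_datastore_host and the address 192.0.2.10 are placeholders chosen for illustration, not part of Esrally.

```python
import configparser
import io


def set_datastore_host(ini_text: str, host: str) -> str:
    """Return ini_text with [reporting] datastore.host set to host."""
    # interpolation=None keeps literal values such as ${CONFIG_DIR} untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("reporting"):
        parser.add_section("reporting")
    parser.set("reporting", "datastore.host", host)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Shortened sample of rally.ini; 192.0.2.10 is a documentation-only address.
sample = """[node]
root.dir = ${CONFIG_DIR}/benchmarks

[reporting]
datastore.type = elasticsearch
datastore.host = localhost
datastore.port = 9200
"""
updated = set_datastore_host(sample, "192.0.2.10")
print(updated)
```

Note that configparser drops comment lines (for example, the commented-out localhost entry) when it writes the file back, so keep a backup of rally.ini if those comments matter.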

Modifying the Wikipedia Track

Change the status used by the cluster health check.

Go to the corresponding directory and edit the default.json file.

cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia/operations
vim default.json

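Set wait_for_status to yellow. Because Elasticsearch is deployed on a single server, the cluster contains only one node. Replica shards have no second node to be allocated to, so the cluster health status is reported as yellow.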
```
{
  "name": "check-cluster-health",
  "operation-type": "cluster-health",
  "request-params": {
    "wait_for_status": "yellow"
  },
  "retry-until-success": true
},
```
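
The reason for this change can be illustrated with a small sketch. The cluster health check waits until the cluster reaches the requested status or a better one, so on a single-node cluster that stays yellow, waiting for green would never succeed. The function meets_status and the ranking table below are illustrative names, not part of Esrally or Elasticsearch.

```python
# Health levels ordered from worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def meets_status(actual: str, wanted: str) -> bool:
    """True if the actual status is at least as good as the wanted one."""
    return STATUS_RANK[actual] >= STATUS_RANK[wanted]


# Assumed single-node cluster: primaries assigned, replicas unassigned.
single_node_status = "yellow"

print(meets_status(single_node_status, "green"))   # False
print(meets_status(single_node_status, "yellow"))  # True
```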

Change the number of shards.

Go to the following directory.

cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia

Edit the wikipedia-full-mapping.json file.
```
vim wikipedia-full-mapping.json
```

Change the default number of shards to 15.

"number_of_shards": {{number_of_shards | default(15)}},

Edit the wikipedia-minimal-mapping.json file.
```
vim wikipedia-minimal-mapping.json
```

Change the default number of shards to 15.

"index.number_of_shards": {{number_of_shards | default(15)}}

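The same edit applies to both mapping files, so it could be scripted. The sketch below rewrites the value inside the default(...) expression with a regular expression. The helper name set_default_shards is hypothetical, and the starting value of 1 in the sample line is an assumption about what the unmodified file contains.

```python
import re

# Matches the template expression, e.g. {{number_of_shards | default(1)}}
SHARDS_DEFAULT = re.compile(r"(number_of_shards\s*\|\s*default\()\d+(\))")


def set_default_shards(template_text: str, shards: int) -> str:
    """Return template_text with the number_of_shards default replaced."""
    return SHARDS_DEFAULT.sub(rf"\g<1>{shards}\g<2>", template_text)


# Assumed original line; only the number inside default(...) changes.
line = '"number_of_shards": {{number_of_shards | default(1)}},'
print(set_default_shards(line, 15))
```

To apply it to a real file, read the file's text, pass it through set_default_shards, and write the result back. Text that does not contain the expression is returned unchanged.
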
Commit the changes.

cd /home/elasticsearch/.rally/benchmarks/tracks/default
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
git add wikipedia/operations/default.json
git add wikipedia/wikipedia-full-mapping.json
git add wikipedia/wikipedia-minimal-mapping.json
git commit -m "Adjust Wikipedia track health check and shard count"

Running Esrally

Online Run

Using Geonames as an example, run Esrally with the following command.

numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --track=geonames --target-hosts=localhost:9200

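This command downloads the dataset online and then runs Esrally.

Offline Run

As the root user, create the directory and change its ownership to the elasticsearch user.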

mkdir -p /home/elasticsearch/.rally/benchmarks/tracks/default
chown -R elasticsearch:elasticsearch /home/elasticsearch/.rally

Download the datasets.

Download the Geonames dataset as follows.

mkdir -p /home/elasticsearch/.rally/benchmarks/data/geonames
cd /home/elasticsearch/.rally/benchmarks/data/geonames
curl -k https://rally-tracks.elastic.co/geonames/documents-2.json.bz2 > documents-2.json.bz2
curl -k https://rally-tracks.elastic.co/geonames/documents-2-1k.json.bz2 > documents-2-1k.json.bz2

Download the Wikipedia dataset as follows.

mkdir -p /home/elasticsearch/.rally/benchmarks/data/wikipedia
cd /home/elasticsearch/.rally/benchmarks/data/wikipedia
curl -k https://rally-tracks.elastic.co/wikipedia/documents.json.bz2 > documents.json.bz2
curl -k https://rally-tracks.elastic.co/wikipedia/documents-1k.json.bz2 > documents-1k.json.bz2
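
Because curl -k skips certificate verification and the files are large, it can be worth confirming that a download decompresses cleanly before starting an offline run. The sketch below counts the lines in a .bz2 file, assuming the corpus stores one JSON document per line. The helper name count_documents is illustrative; the path matches the Geonames directory created above.

```python
import bz2
from pathlib import Path


def count_documents(path: Path) -> int:
    """Count lines (assumed one JSON document per line) in a .bz2 file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    data_dir = Path("/home/elasticsearch/.rally/benchmarks/data/geonames")
    small = data_dir / "documents-2-1k.json.bz2"
    # Only the small file is checked here; the full corpus takes much longer.
    if small.exists():
        print(small.name, count_documents(small))
```

A truncated or corrupted archive raises an exception during decompression instead of returning a count.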

As the elasticsearch user, run Esrally (using Geonames as an example).
```
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200
```
If the run fails because another Esrally process is already running, use the following command instead.
```
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200 --kill-running-processes
```
Start Elasticsearch before running Esrally. Both Esrally and Elasticsearch must be started as the elasticsearch user, not as root.
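
The online and offline commands differ only in a few flags, so they can be assembled from one helper when several tracks are run in sequence. This is a sketch only: build_race_command and its parameter names are hypothetical, while the flags themselves are the ones used in the commands above.

```python
import shlex


def build_race_command(track, target_hosts="localhost:9200", offline=False,
                       kill_running=False, cpus="0-15", mem_node="0"):
    """Assemble the numactl + esrally command line used in this section."""
    cmd = ["numactl", "-C", cpus, "-m", mem_node,
           "esrally", "race", "--pipeline=benchmark-only"]
    if offline:
        cmd.append("--offline")
    cmd += [f"--track={track}", f"--target-hosts={target_hosts}"]
    if kill_running:
        cmd.append("--kill-running-processes")
    return cmd


# Prints the offline Geonames command shown earlier in this section.
print(shlex.join(build_race_command("geonames", offline=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True) from a script that runs as the elasticsearch user.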

Sample Results

Figure 1 shows the Geonames results.

Figure 1 Geonames results

Figure 2 shows the Wikipedia results.

Figure 2 Wikipedia results

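Key Metrics

Table 1 Key metrics
Metric	Description
Cumulative indexing time	Time spent indexing all documents; reflects the efficiency of indexing operations.
Cumulative merge time	Time spent merging segment files; reflects the efficiency of segment merges.
Cumulative refresh time	Time spent on refreshes, which make newly indexed data searchable; affects real-time data visibility and is an important indicator of real-time search performance.
Cumulative flush time	Time spent persisting in-memory data to disk; affects data durability and reliability and is a key indicator of system stability.
Cumulative merge throttle time	Time during which segment merges were throttled; reflects the resource usage of merges and affects overall system performance.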
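
To compare these metrics across runs, the report can be filtered programmatically. The sketch below assumes the report was saved as CSV (Esrally provides --report-format=csv and --report-file for this) and that it has the columns Metric, Task, Value, and Unit; verify the column names against your own report. The sample rows and numbers are made up for illustration.

```python
import csv
import io


def cumulative_metrics(report_csv: str) -> dict:
    """Map each 'Cumulative ...' metric name to a (value, unit) tuple."""
    rows = csv.DictReader(io.StringIO(report_csv))
    return {
        row["Metric"]: (float(row["Value"]), row["Unit"])
        for row in rows
        if row["Metric"].startswith("Cumulative") and row["Value"]
    }


# Hypothetical report excerpt; the values are not real measurements.
sample = """Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,12.5,min
Cumulative merge time of primary shards,,3.25,min
Mean Throughput,index-append,50000,docs/s
"""
for name, (value, unit) in cumulative_metrics(sample).items():
    print(f"{name}: {value} {unit}")
```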
