Benchmark Performance Test
This section uses Esrally to test Elasticsearch performance.
Esrally is a benchmarking tool for Elasticsearch. It evaluates Elasticsearch performance under different environment configurations and workloads. A workload consists of the dataset and the test scenarios used in a benchmark, and it simulates a real-world use case. In this section, Esrally runs two datasets: Geonames and Wikipedia.
- Geonames: a geographical dataset that contains more than 11 million core information records and millions of alias records. It contains fields such as longitude and latitude coordinates, administrative division code, time zone, and population. The information in the dataset is stored in a structured format.
- Wikipedia: a dataset that contains a large amount of text content, such as articles, paragraphs, and lists. It includes both structured data, such as titles, authors, and categories, and semi-structured data, such as tables, lists, and links in articles.
Test Prerequisites
For details about Elasticsearch deployment, see Deploying Elasticsearch.
Installing Esrally
- Install Esrally.
pip3 install esrally
After the installation succeeds, run the following command to check the version:
esrally --version
- View the available tracks.
esrally list tracks
This step automatically downloads the Esrally configuration file and saves it to /root/.rally.
- Configure rally.ini.
mv /root/.rally /home/elasticsearch/
cd /home/elasticsearch/.rally
vim rally.ini
Modify the configuration file as follows, changing the value of datastore.host to the actual server IP address:
[meta]
config.version = 17

[system]
env.name = local

[node]
root.dir = ${CONFIG_DIR}/benchmarks
src.root.dir = ${CONFIG_DIR}/benchmarks/src

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch

[benchmarks]
local.dataset.cache = ${CONFIG_DIR}/benchmarks/data

[reporting]
datastore.type = elasticsearch
#datastore.host = localhost
datastore.host = X.X.X.X
datastore.port = 9200
datastore.secure = False
datastore.user =
datastore.password =

[tracks]
default.url = https://github.com/elastic/rally-tracks

[teams]
default.url = https://github.com/elastic/rally-teams

[defaults]
preserve_benchmark_candidate = false

[distributions]
release.cache = true
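The datastore.host change can also be scripted. The following is a minimal sketch using Python's standard configparser module, not part of Esrally itself; the function name set_datastore_host and the example IP address are hypothetical. It assumes rally.ini follows the layout shown above. Note that configparser drops comment lines (such as the commented-out localhost entry) when it rewrites the file, so keep a backup if you want to preserve them.

```python
import configparser


def set_datastore_host(ini_path, host):
    """Set [reporting] datastore.host in a rally.ini-style file.

    Hypothetical helper for illustration; Esrally does not ship it.
    """
    # Disable interpolation so values such as ${CONFIG_DIR} are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names exactly as written (the default lower-cases them).
    parser.optionxform = str
    parser.read(ini_path)

    if not parser.has_section("reporting"):
        parser.add_section("reporting")
    parser.set("reporting", "datastore.host", host)

    # Rewrite the file; comment lines from the original are not preserved.
    with open(ini_path, "w") as handle:
        parser.write(handle)
    return parser.get("reporting", "datastore.host")
```

For example, set_datastore_host("/home/elasticsearch/.rally/rally.ini", "192.0.2.10") would point the metrics store at 192.0.2.10, a placeholder documentation address.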
Modifying Wikipedia Track
- Modify the cluster health check status.
- Go to the corresponding directory and edit the default.json file.
cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia/operations
vim default.json
- Change the value of wait_for_status to yellow. Elasticsearch is deployed on a single server, so the cluster contains only one node. A single-node cluster cannot allocate replica shards, so any index configured with replicas keeps the cluster health status at yellow.
{
  "name": "check-cluster-health",
  "operation-type": "cluster-health",
  "request-params": {
    "wait_for_status": "yellow"
  },
  "retry-until-success": true
},
- Change the number of shards.
- Go to the following directory:
cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia
- Edit the wikipedia-full-mapping.json file.
vim wikipedia-full-mapping.json
- Change the default number of shards to 15.
"number_of_shards": {{number_of_shards | default(15)}},
- Edit the wikipedia-minimal-mapping.json file.
vim wikipedia-minimal-mapping.json
- Change the default number of shards to 15.
"index.number_of_shards": {{number_of_shards | default(15)}}
- Run the following commands to submit the changes:
cd /home/elasticsearch/.rally/benchmarks/tracks/default
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
git add wikipedia/operations/default.json
git add wikipedia/wikipedia-full-mapping.json
git add wikipedia/wikipedia-minimal-mapping.json
git commit -m "Modify"
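The two track edits above rely on simple rules that can be sketched in a few lines of Python. This is an illustration only, not Esrally or Elasticsearch code, and the function names are hypothetical. The first function models how a wait_for_status check behaves: it passes once the cluster is at least as healthy as the requested status. The second mimics the template expression {{number_of_shards | default(15)}}: a supplied track parameter wins, otherwise the default applies.

```python
# Health states ordered from worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_check_passes(actual_status, wait_for_status):
    """Return True when the cluster is at least as healthy as requested."""
    return STATUS_RANK[actual_status] >= STATUS_RANK[wait_for_status]


def effective_shard_count(track_params, default=15):
    """Mimic `{{number_of_shards | default(15)}}` from the mapping files.

    track_params is a plain dict standing in for the parameters passed
    to the track; the default is used only when the key is absent.
    """
    return int(track_params.get("number_of_shards", default))
```

With these rules, a yellow single-node cluster fails a check that waits for green but passes one that waits for yellow. Likewise, the edited default of 15 shards applies unless a value is passed at run time, for example with --track-params="number_of_shards:5".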
Running Esrally
Online running
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --track=geonames --target-hosts=localhost:9200
This command downloads the dataset online and then runs the benchmark. numactl binds Esrally to CPU cores 0-15 (-C) and to the memory of NUMA node 0 (-m).
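If the benchmark is launched from a script, the command line can be assembled programmatically. The sketch below is a hypothetical helper, not an Esrally API; it only uses the numactl and esrally options that appear in this section, including the --offline and --kill-running-processes flags described under Offline running. The default core list, NUMA node, and target host are taken from the commands in this section and should be adjusted to your server.

```python
import shlex


def build_race_command(track, offline=False, kill_running=False,
                       cpu_list="0-15", numa_node=0,
                       target_hosts="localhost:9200"):
    """Build the argument list for an esrally race wrapped in numactl."""
    argv = [
        "numactl", "-C", cpu_list, "-m", str(numa_node),
        "esrally", "race", "--pipeline=benchmark-only",
    ]
    if offline:
        # Use only locally cached tracks and datasets.
        argv.append("--offline")
    argv += [f"--track={track}", f"--target-hosts={target_hosts}"]
    if kill_running:
        # Terminate a leftover Esrally instance before starting.
        argv.append("--kill-running-processes")
    return argv


def as_shell_line(argv):
    """Render the argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues; as_shell_line is only for logging or display.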
Offline running
- Run the following commands as root to create a directory and set its ownership to the elasticsearch user.
mkdir -p /home/elasticsearch/.rally/benchmarks/tracks/default
chown -R elasticsearch:elasticsearch /home/elasticsearch/.rally
- Download the datasets.
- You can run the following commands to download the Geonames dataset:
mkdir -p /home/elasticsearch/.rally/benchmarks/data/geonames
cd /home/elasticsearch/.rally/benchmarks/data/geonames
curl -k https://rally-tracks.elastic.co/geonames/documents-2.json.bz2 > documents-2.json.bz2
curl -k https://rally-tracks.elastic.co/geonames/documents-2-1k.json.bz2 > documents-2-1k.json.bz2
- You can run the following commands to download the Wikipedia dataset:
mkdir -p /home/elasticsearch/.rally/benchmarks/data/wikipedia
cd /home/elasticsearch/.rally/benchmarks/data/wikipedia
curl -k https://rally-tracks.elastic.co/wikipedia/documents.json.bz2 > documents.json.bz2
curl -k https://rally-tracks.elastic.co/wikipedia/documents-1k.json.bz2 > documents-1k.json.bz2
- Run Esrally under the elasticsearch user (the Geonames track is used as an example).
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200
If another Esrally instance is running and prevents a new run, run the following command instead:
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200 --kill-running-processes
Start Elasticsearch before running Esrally, and run both under the elasticsearch user (do not run them under the root user).
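Because the offline procedure downloads the corpus files manually with curl, it can be worth confirming that each file decompresses cleanly before starting a long run. The following sketch uses Python's standard bz2 module; the function name is hypothetical, and it assumes the corpus files are newline-delimited JSON with one document per line.

```python
import bz2


def count_documents(path):
    """Decompress a .json.bz2 corpus file and count its non-empty lines.

    Reading the whole stream also acts as an integrity check: a damaged
    archive is expected to raise an error during decompression.
    """
    count = 0
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                count += 1
    return count
```

For example, count_documents("/home/elasticsearch/.rally/benchmarks/data/geonames/documents-2-1k.json.bz2") reads the smaller Geonames file. Checking the small -1k files first is quick; the full corpora take considerably longer to decompress.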
Running Result Example
Figure 1 shows the run result of the Geonames track.
Key Metrics
| Metric | Description |
|---|---|
| Cumulative indexing time | Time spent indexing all documents. It indicates the efficiency of indexing operations. |
| Cumulative merge time | Time spent merging segment files. It indicates the efficiency of segment merge operations. |
| Cumulative refresh time | Time spent on refresh operations, which make recently indexed documents searchable. It affects data visibility latency and is an important metric for evaluating real-time search performance. |
| Cumulative flush time | Time spent flushing in-memory data to drives. It affects data durability and reliability and is a key metric for assessing system stability. |
| Cumulative merge throttle time | Time during which segment merges are throttled. It indicates the resource consumption of merge operations and impacts overall system performance. |
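To compare the cumulative metrics across runs, the report can be post-processed. The sketch below is a hypothetical helper, not part of Esrally. It assumes the report was written as CSV (for example with the --report-format=csv and --report-file options) and that the file has Metric, Task, Value, and Unit columns; verify the column names against your own report before relying on it.

```python
import csv
import io


def cumulative_metrics(csv_text):
    """Extract the cumulative metrics from an Esrally CSV report.

    Returns a dict mapping metric name to a (value, unit) tuple.
    Assumes columns named Metric, Value, and Unit (an assumption,
    not confirmed by this guide).
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("Metric") or "").strip()
        if name.startswith("Cumulative"):
            unit = (row.get("Unit") or "").strip()
            result[name] = (float(row["Value"]), unit)
    return result
```

Pass the function the text of the report file, for example cumulative_metrics(open("report.csv").read()), where report.csv is a placeholder file name.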

