Benchmark Performance Test

This section uses Esrally to test Elasticsearch performance.

Esrally is a benchmarking tool for Elasticsearch. It can evaluate Elasticsearch performance under different environment configurations and workloads. A workload here means the dataset and the test scenarios used in a benchmark run, which Esrally calls a track. Tracks simulate different real-world usage patterns. In this section, Esrally runs two tracks: Geonames and Wikipedia.

  • Geonames: a geographical dataset that contains more than 11 million core information records and millions of alias records. It contains fields such as longitude and latitude coordinates, administrative division code, time zone, and population. The information in the dataset is stored in a structured format.
  • Wikipedia: a dataset that contains a large amount of text content, such as articles, paragraphs, and lists. It includes both structured data, such as titles, authors, and categories, and semi-structured data, such as tables, lists, and links in articles.

Test Prerequisites

For details about Elasticsearch deployment, see Deploying Elasticsearch.

Installing Esrally

  1. Install Esrally.
    pip3 install esrally
    After the installation is successful, run the following command to check the version.
    esrally --version
  2. View the available tracks.
    esrally list tracks

    This step automatically downloads the Esrally configuration file and saves it to /root/.rally.

  3. Configure rally.ini.
    mv /root/.rally /home/elasticsearch/
    cd /home/elasticsearch/.rally
    vim rally.ini
    Modify the configuration file as follows, changing the value of datastore.host to the actual server IP address:
    [meta]
    config.version = 17
    
    [system]
    env.name = local
    
    [node]
    root.dir = ${CONFIG_DIR}/benchmarks
    src.root.dir = ${CONFIG_DIR}/benchmarks/src
    
    [source]
    remote.repo.url = https://github.com/elastic/elasticsearch.git
    elasticsearch.src.subdir = elasticsearch
    
    [benchmarks]
    local.dataset.cache = ${CONFIG_DIR}/benchmarks/data
    
    [reporting]
    datastore.type = elasticsearch
    #datastore.host = localhost
    datastore.host = X.X.X.X
    datastore.port = 9200
    datastore.secure = False
    datastore.user =
    datastore.password =
    
    
    [tracks]
    default.url = https://github.com/elastic/rally-tracks
    
    [teams]
    default.url = https://github.com/elastic/rally-teams
    
    [defaults]
    preserve_benchmark_candidate = false
    
    [distributions]
    release.cache = true
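The datastore.host change can also be scripted. The following is a minimal sketch, assuming Python 3 and the standard configparser module; the helper name set_datastore_host is illustrative and not part of Esrally. Note that configparser drops comment lines (such as the commented-out #datastore.host entry) when it writes the content back.

```python
import configparser
import io


def set_datastore_host(ini_text: str, host: str) -> str:
    """Return rally.ini content with [reporting] datastore.host set to host."""
    # Disable interpolation so values such as ${CONFIG_DIR}/benchmarks stay verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("reporting"):
        parser.add_section("reporting")
    parser.set("reporting", "datastore.host", host)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

To use it, read the text of /home/elasticsearch/.rally/rally.ini, pass it to the function together with the server IP address, and write the returned text back to the same file.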

Modifying Wikipedia Track

  1. Modify the cluster health check status.
    1. Go to the corresponding directory and edit the default.json file.
      cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia/operations
      vim default.json
    2. Change the value of wait_for_status to yellow. Elasticsearch is deployed on a single server, so the cluster contains only one node. Replica shards cannot be allocated on the same node as their primaries, so the cluster health status stays yellow instead of turning green.
      {
        "name": "check-cluster-health",
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "yellow"
        },
        "retry-until-success": true
      },
  2. Change the number of shards.
    1. Go to the following directory:
      cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia
    2. Edit the wikipedia-full-mapping.json file.
      vim wikipedia-full-mapping.json
    3. Change the default number of shards to 15.
      "number_of_shards": {{number_of_shards | default(15)}},
    4. Edit the wikipedia-minimal-mapping.json file.
      vim wikipedia-minimal-mapping.json
    5. Change the default number of shards to 15.
      "index.number_of_shards": {{number_of_shards | default(15)}}
  3. Run the following commands to commit the changes to the local track repository:
    cd /home/elasticsearch/.rally/benchmarks/tracks/default
    git config --global user.name "Your Name"
    git config --global user.email "your.email@example.com"
    git add wikipedia/operations/default.json
    git add wikipedia/wikipedia-full-mapping.json
    git add wikipedia/wikipedia-minimal-mapping.json
    git commit -m "Modify"
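To illustrate why the wait_for_status value matters: the cluster health check succeeds once the cluster status is at least as good as the requested level, where red is worse than yellow and yellow is worse than green. The sketch below models that comparison only; the function name status_satisfies is hypothetical, and the code does not contact Elasticsearch.

```python
# Health levels ordered from worst to best.
_HEALTH_ORDER = {"red": 0, "yellow": 1, "green": 2}


def status_satisfies(actual: str, wait_for_status: str) -> bool:
    """Return True if the actual cluster status meets or exceeds the requested one."""
    return _HEALTH_ORDER[actual.lower()] >= _HEALTH_ORDER[wait_for_status.lower()]
```

With wait_for_status set to green, a single-node cluster that reports yellow never satisfies the check, so the operation keeps retrying. With wait_for_status set to yellow, the same cluster passes.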

Running Esrally

Online running

Execute the following command to run Esrally with the Geonames track.
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --track=geonames --target-hosts=localhost:9200

This command downloads the dataset online and runs Esrally. The numactl prefix binds the process to CPU cores 0 to 15 and to the memory of NUMA node 0.
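When several tracks or run modes need to be launched from a script, the command line can be assembled programmatically. The following is a minimal sketch in Python; the function names build_race_command and as_shell are hypothetical, and only the flags already shown in this section are used.

```python
import shlex
from typing import List


def build_race_command(track: str,
                       target_hosts: str = "localhost:9200",
                       offline: bool = False,
                       kill_running: bool = False,
                       cpu_range: str = "0-15",
                       mem_node: int = 0) -> List[str]:
    """Build the numactl + esrally argument list used in this section."""
    cmd = [
        "numactl", "-C", cpu_range, "-m", str(mem_node),
        "esrally", "race",
        "--pipeline=benchmark-only",
    ]
    if offline:
        cmd.append("--offline")
    cmd += [f"--track={track}", f"--target-hosts={target_hosts}"]
    if kill_running:
        cmd.append("--kill-running-processes")
    return cmd


def as_shell(cmd: List[str]) -> str:
    """Render the argument list as a copy-pastable shell command."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues; as_shell is only for printing or logging the command.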

Offline running

  1. Run the following commands as root to create a directory and set its ownership to the elasticsearch user.
    mkdir -p /home/elasticsearch/.rally/benchmarks/tracks/default
    chown -R elasticsearch:elasticsearch /home/elasticsearch/.rally
  2. Download the datasets.
    • You can run the following commands to download the Geonames dataset:
      mkdir -p /home/elasticsearch/.rally/benchmarks/data/geonames
      cd /home/elasticsearch/.rally/benchmarks/data/geonames
      curl -k https://rally-tracks.elastic.co/geonames/documents-2.json.bz2 > documents-2.json.bz2
      curl -k https://rally-tracks.elastic.co/geonames/documents-2-1k.json.bz2 > documents-2-1k.json.bz2
    • You can run the following commands to download the Wikipedia dataset:
      mkdir -p /home/elasticsearch/.rally/benchmarks/data/wikipedia
      cd /home/elasticsearch/.rally/benchmarks/data/wikipedia
      curl -k https://rally-tracks.elastic.co/wikipedia/documents.json.bz2 > documents.json.bz2
      curl -k https://rally-tracks.elastic.co/wikipedia/documents-1k.json.bz2 > documents-1k.json.bz2
  3. Run Esrally as the elasticsearch user (the Geonames track is used as an example).
    numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200
    If another Esrally instance is running and prevents a new run, you can run the following command:
    numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200 --kill-running-processes

    Start Elasticsearch before running Esrally, and run both under the elasticsearch user (do not run them under the root user).
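Before an offline run, it can help to confirm that the downloaded files are actually in place, since an interrupted curl transfer can leave a file missing or empty. The following is a minimal sketch in Python; the function name missing_dataset_files is hypothetical, and the file names are the ones from the download commands above.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the download commands in this section.
EXPECTED_FILES: Dict[str, List[str]] = {
    "geonames": ["documents-2.json.bz2", "documents-2-1k.json.bz2"],
    "wikipedia": ["documents.json.bz2", "documents-1k.json.bz2"],
}


def missing_dataset_files(data_root: str, track: str) -> List[str]:
    """Return the expected files for a track that are absent or empty under data_root/track."""
    track_dir = Path(data_root) / track
    missing = []
    for name in EXPECTED_FILES[track]:
        path = track_dir / name
        # Treat zero-byte files as missing; they usually indicate a failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

For example, calling missing_dataset_files("/home/elasticsearch/.rally/benchmarks/data", "geonames") returns an empty list when both Geonames files are present and non-empty.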

Running Result Example

Figure 1 shows the run result of the Geonames track.

Figure 1 Run result of the Geonames track

Figure 2 shows the run result of the Wikipedia track.

Figure 2 Run result of the Wikipedia track

Key Metrics

Table 1 Key metrics

| Metric | Description |
|---|---|
| Cumulative indexing time | Time spent indexing all documents. It indicates the efficiency of indexing operations. |
| Cumulative merge time | Time spent merging segment files. It indicates the efficiency of segment merge operations. |
| Cumulative refresh time | Time spent on refresh operations, which write the in-memory indexing buffer to new segments so that recently indexed documents become searchable. It affects data visibility latency and is an important metric for evaluating real-time search performance. |
| Cumulative flush time | Time spent flushing in-memory data to drives. It affects data durability and reliability and is a key metric for assessing system stability. |
| Cumulative merge throttle time | Time during which segment merges are throttled. It indicates the resource consumption of merge operations and impacts overall system performance. |
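To compare runs, the cumulative metrics in Table 1 can be extracted from the Esrally summary report. The sketch below assumes the report is a pipe-delimited table with Metric, Task, Value, and Unit columns; that layout is an assumption and should be checked against the actual output of your Esrally version. The function name parse_summary is hypothetical.

```python
from typing import Dict, Tuple


def parse_summary(report_text: str) -> Dict[str, Tuple[float, str]]:
    """Map metric names that start with 'Cumulative' to (value, unit) pairs."""
    metrics: Dict[str, Tuple[float, str]] = {}
    for line in report_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Expect four cells: metric, task, value, unit (assumed layout).
        if len(cells) != 4 or not cells[0].startswith("Cumulative"):
            continue
        try:
            value = float(cells[2])
        except ValueError:
            continue  # skip rows whose value cell is not numeric
        metrics[cells[0]] = (value, cells[3])
    return metrics
```

The result is a dictionary keyed by metric name, which makes it straightforward to compute differences between two runs of the same track.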