Performance Testing

This section uses Esrally to test Elasticsearch performance.

In this section, Esrally tests two datasets: Geonames and Wikipedia.

  • Geonames: a geographical dataset that contains more than 11 million core information records and millions of alias records. It contains fields such as longitude and latitude coordinates, administrative division code, time zone, and population. The information in the dataset is stored in a structured format.
  • Wikipedia: a dataset that contains a large amount of text content, such as articles, paragraphs, and lists. It includes both structured data, such as titles, authors, and categories, and semi-structured data, such as tables, lists, and links in articles.

Modifying the Wikipedia Track

  1. Modify the cluster health check status.
    1. Go to the corresponding directory and edit the default.json file.
      cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia/operations
      vim default.json
    2. Change the value of wait_for_status to yellow. Elasticsearch is deployed on a single server, so the cluster contains only one node. Replica shards cannot be allocated on a single node, so the cluster health status stays yellow instead of reaching green.
      {
        "name": "check-cluster-health",
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "yellow"
        },
        "retry-until-success": true
      },
  2. Change the number of shards.
    1. Go to the following directory:
      cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia
    2. Edit the wikipedia-full-mapping.json file.
      vim wikipedia-full-mapping.json
    3. Change the default number of shards to 15.
      "number_of_shards": {{number_of_shards | default(15)}},
    4. Edit the wikipedia-minimal-mapping.json file.
      vim wikipedia-minimal-mapping.json
    5. Change the default number of shards to 15.
      "index.number_of_shards": {{number_of_shards | default(15)}}
  3. Run the following commands to commit the changes to the local track repository:
    cd /home/elasticsearch/.rally/benchmarks/tracks/default
    git config --global user.name "Your Name"
    git config --global user.email "your.email@example.com"
    git add wikipedia/operations/default.json
    git add wikipedia/wikipedia-full-mapping.json
    git add wikipedia/wikipedia-minimal-mapping.json
    git commit -m "Modify"
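
The manual edits in steps 1 and 2 can also be scripted. The following is a minimal sketch, assuming GNU sed and assuming that the files contain the wait_for_status and number_of_shards lines in the form shown above. The function names set_health_status and set_default_shards are hypothetical helpers and are not part of Esrally.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Esrally) that apply the edits from steps 1 and 2.
# Assumes GNU sed (for the -i and -E options).

# Set the wait_for_status value in an operations file such as default.json.
set_health_status() {
    # $1: file to edit, $2: target status (for example yellow)
    sed -i -E '/"wait_for_status"/ s/:[[:space:]]*"[a-z]+"/: "'"$2"'"/' "$1"
}

# Set the default shard count in a mapping file.
set_default_shards() {
    # $1: file to edit, $2: default number of shards (for example 15)
    sed -i -E '/number_of_shards/ s/default\([0-9]+\)/default('"$2"')/' "$1"
}

# Example usage (paths as in the steps above):
#   cd /home/elasticsearch/.rally/benchmarks/tracks/default/wikipedia
#   set_health_status operations/default.json yellow
#   set_default_shards wikipedia-full-mapping.json 15
#   set_default_shards wikipedia-minimal-mapping.json 15
```

Run git diff before committing to confirm that only the intended lines changed.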

Running Esrally

Online running

Execute the following command to run Esrally with the Geonames track.
numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --track=geonames --target-hosts=localhost:9200

This command downloads the dataset from the Internet and then runs the benchmark, so the server must have network access.
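
In the command above, numactl -C 0-15 binds Esrally to CPU cores 0 to 15, and -m 0 allocates its memory from NUMA node 0. These values need to match the topology of the server, which numactl --hardware reports. The following sketch prints the same command for either track so that both datasets are run with identical settings. The function name race_cmd and its optional arguments are hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Hypothetical helper: print the race command for a given track.
# numactl -C binds the process to the listed CPU cores;
# numactl -m allocates memory only from the given NUMA node.
race_cmd() {
    track="$1"            # geonames or wikipedia
    cores="${2:-0-15}"    # CPU core range, defaults to the range used in this section
    node="${3:-0}"        # NUMA node for memory, defaults to node 0
    printf 'numactl -C %s -m %s esrally race --pipeline=benchmark-only --track=%s --target-hosts=localhost:9200\n' \
        "$cores" "$node" "$track"
}

# Print the commands for both tracks; review them, then run them one at a time.
for t in geonames wikipedia; do
    race_cmd "$t"
done
```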

Offline running

  1. Run the following commands as root to create a directory and set its ownership to the elasticsearch user.
    mkdir -p /home/elasticsearch/.rally/benchmarks/tracks/default
    chown -R elasticsearch:elasticsearch /home/elasticsearch/.rally
  2. Download the datasets.
    • You can run the following commands to download the Geonames dataset:
      mkdir -p /home/elasticsearch/.rally/benchmarks/data/geonames
      cd /home/elasticsearch/.rally/benchmarks/data/geonames
      curl -k https://rally-tracks.elastic.co/geonames/documents-2.json.bz2 > documents-2.json.bz2
      curl -k https://rally-tracks.elastic.co/geonames/documents-2-1k.json.bz2 > documents-2-1k.json.bz2
    • You can run the following commands to download the Wikipedia dataset:
      mkdir -p /home/elasticsearch/.rally/benchmarks/data/wikipedia
      cd /home/elasticsearch/.rally/benchmarks/data/wikipedia
      curl -k https://rally-tracks.elastic.co/wikipedia/documents.json.bz2 > documents.json.bz2
      curl -k https://rally-tracks.elastic.co/wikipedia/documents-1k.json.bz2 > documents-1k.json.bz2
  3. Run Esrally under the elasticsearch user (the Geonames track is used as an example).
    numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200
    If another Esrally instance is running and prevents a new run, you can run the following command:
    numactl -C 0-15 -m 0 esrally race --pipeline=benchmark-only --offline --track=geonames --target-hosts=localhost:9200 --kill-running-processes

    Start Elasticsearch before running Esrally, and run both under the elasticsearch user (do not run them under the root user).
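
The curl commands in step 2 redirect the response straight into a file, so a failed download can leave an error page or a truncated archive under the expected file name. Before starting an offline run, it may be worth testing the archives. The following is a minimal sketch that uses bzip2 -t, which checks integrity without extracting; the function name check_archives is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: test every .bz2 archive in a directory without extracting it.
check_archives() {
    dir="$1"
    status=0
    for f in "$dir"/*.bz2; do
        # If the glob matched nothing, $f is the literal pattern and does not exist.
        [ -e "$f" ] || { echo "no .bz2 files in $dir"; return 1; }
        if bzip2 -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    return "$status"
}

# Example usage:
#   check_archives /home/elasticsearch/.rally/benchmarks/data/geonames
#   check_archives /home/elasticsearch/.rally/benchmarks/data/wikipedia
```

The function returns a nonzero status if any archive fails the test, so it can gate the esrally command in a script.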

Running Result Example

Figure 1 shows the run result of the Geonames track.

Figure 1 Run result of the Geonames track

Figure 2 shows the run result of the Wikipedia track.

Figure 2 Run result of the Wikipedia track

Key Metrics

Table 1 Key Metrics

| Metric | Description |
|---|---|
| Cumulative indexing time | Time spent indexing all documents. It indicates the efficiency of indexing operations. |
| Cumulative merge time | Time spent merging segment files. It indicates the efficiency of segment merge operations. |
| Cumulative refresh time | Time spent refreshing index segments into memory. It affects data visibility latency and is an important metric for evaluating real-time search performance. |
| Cumulative flush time | Time spent flushing in-memory data to disk. It affects data durability and reliability and is a key metric for assessing system stability. |
| Cumulative merge throttle time | Time during which segment merges are throttled. It indicates the resource consumption of merge operations and impacts overall system performance. |
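
To compare these metrics across runs, the report can be written to a file by adding --report-format=csv and --report-file=<path> to the esrally race command. The following is a minimal sketch that filters the cumulative-time rows from such a file. It assumes the CSV columns are Metric, Task, Value, and Unit, which should be verified against an actual report; the function name key_metrics is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative-time metrics from an Esrally CSV report.
# Assumes the column layout Metric,Task,Value,Unit; verify this against your own report file.
key_metrics() {
    # $1: path to the CSV report
    awk -F',' '
        $1 ~ /^Cumulative (indexing|merge|refresh|flush|merge throttle) time/ {
            printf "%s: %s %s\n", $1, $3, $4
        }
    ' "$1"
}

# Example usage (the report path is a placeholder):
#   key_metrics "$HOME/geonames-report.csv"
```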