Elasticsearch Overview

Elasticsearch is an open-source, real-time, distributed search and analytics engine. It uses Lucene for indexing and search. This section describes the basic concepts of the Elasticsearch component and how it works.

Cluster Structure

The Elasticsearch cluster solution consists of the EsMaster, EsClient, EsNode1, EsNode2, and EsNode3 processes. Table 1 describes these modules, together with the client and the ZooKeeper cluster that they interact with.

**Table 1** Modules of the Elasticsearch cluster solution
Module	Description
Client	Client communicates with the EsMaster and EsNode instance processes in the Elasticsearch cluster over HTTP or HTTPS to perform distributed collection and search.
EsMaster	EsMaster is the master node of Elasticsearch. It manages the cluster, for example, by determining shard allocation and tracking cluster nodes.
EsNode1-3	EsNode1–3 are data nodes of Elasticsearch. They store index data and add, delete, modify, query, and aggregate documents.
EsClient	EsClient is the coordinating node of Elasticsearch. It only routes requests, searches for data, and dispatches indexing requests. EsClient does not store data or manage the cluster.
ZooKeeper cluster	ZooKeeper provides a heartbeat mechanism for the processes in the Elasticsearch cluster.
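The coordinating role of EsClient can be pictured as a scatter/gather step: the search request is forwarded to every shard, each shard returns its best local matches, and the coordinating node merges them. The following Python sketch is a simplified, in-memory illustration under that assumption; the shard contents, scores, and the functions `query_shard` and `coordinate_search` are hypothetical and are not Elasticsearch APIs.

```python
import heapq
from typing import Dict, List, Tuple

# A hit is (document ID, relevance score). In a real cluster, each shard lives
# on a data node (EsNode) and Lucene computes the scores.
Hit = Tuple[str, float]


def query_shard(shard_hits: List[Hit], size: int) -> List[Hit]:
    """Return one shard's local top-`size` hits, highest score first."""
    return heapq.nlargest(size, shard_hits, key=lambda hit: hit[1])


def coordinate_search(shards: Dict[int, List[Hit]], size: int) -> List[Hit]:
    """Scatter the query to every shard, then gather and merge the results.

    The coordinating node stores no data itself; it only forwards the
    request and merges what the data nodes return.
    """
    partial_results: List[Hit] = []
    for shard_hits in shards.values():
        # Scatter: each shard returns at most `size` candidates.
        partial_results.extend(query_shard(shard_hits, size))
    # Gather: keep the global top `size` hits across all shards.
    return heapq.nlargest(size, partial_results, key=lambda hit: hit[1])


if __name__ == "__main__":
    shards = {
        0: [("doc-1", 0.9), ("doc-4", 0.2)],
        1: [("doc-2", 0.7), ("doc-5", 0.95)],
        2: [("doc-3", 0.4)],
    }
    print(coordinate_search(shards, size=2))  # [('doc-5', 0.95), ('doc-1', 0.9)]
```

Asking each shard for only `size` hits is sufficient, because a document outside a shard's local top `size` cannot be in the global top `size`.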

Basic Concepts

Index: An index is a logical namespace in Elasticsearch that consists of one or more shards. Apache Lucene is used to read and write the data in an index. An index is similar to a database instance in a relational database (RDB). One Elasticsearch instance can contain multiple indexes.
Type: A type identifies the mapping that applies to a document, so Elasticsearch can find the field mapping information for a document from its type, which facilitates document storage. A type is similar to a table in a database. One index corresponds to one document type.
Document: A document is the basic unit of information that can be indexed. A document is JSON data, either a top-level JSON structure or the result of serializing a root object. It is similar to a row in a database. A type contains multiple documents.
Mapping: A mapping defines the data type of each field and can be created automatically from the data. It is similar to a schema in a database.
Field: A field is the smallest unit of a document. It is similar to a column in a database. Each document contains multiple fields.
EsMaster: The master node, which manages cluster-level changes, such as creating or deleting indexes and adding or removing nodes. The master node does not participate in document-level changes or searches, nor does it receive requests, so it does not become a bottleneck of the cluster as traffic increases.
EsNode: An Elasticsearch node. Each node is an Elasticsearch instance.
EsClient: An Elasticsearch coordinating node. It routes requests, searches for data, and dispatches indexing requests. It does not store data or manage the cluster.
Shard: A shard is the smallest unit of work in Elasticsearch. Documents are stored and referenced in shards.
Primary shard: Each document in an index belongs to one primary shard. The number of primary shards determines the maximum amount of data that the index can store.
Replica shard: A replica shard is a copy of a primary shard. It protects against data loss caused by hardware faults and serves read requests, such as searching for or retrieving documents.
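The relationships among mappings, documents, primary shards, and replica shards can be sketched in Python as below. This is a simplified model under stated assumptions, not Elasticsearch code: the mapping, field names, and helper functions are hypothetical; the type check is deliberately strict (Elasticsearch can instead add unmapped fields dynamically and coerce some values); and routing uses a simplified form of Elasticsearch's documented rule `shard = hash(routing) % number_of_primary_shards`, with `zlib.crc32` standing in for the Murmur3 hash that Elasticsearch uses.

```python
import zlib
from typing import Any, Dict, List

# Hypothetical mapping for one index: field name -> Elasticsearch field type.
MAPPING: Dict[str, str] = {"title": "text", "author": "keyword", "pages": "integer"}

# Simplified correspondence between a few mapping types and Python types.
PYTHON_TYPES = {"text": str, "keyword": str, "integer": int, "float": float, "boolean": bool}


def validate(document: Dict[str, Any], mapping: Dict[str, str]) -> None:
    """Raise ValueError if a field is unmapped or has the wrong type (strict check)."""
    for field, value in document.items():
        if field not in mapping:
            raise ValueError(f"unmapped field: {field}")
        expected = PYTHON_TYPES[mapping[field]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if not isinstance(value, expected) or (expected is not bool and isinstance(value, bool)):
            raise ValueError(f"field {field!r} must be {mapping[field]}")


def primary_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard as hash(routing) % number of primary shards.

    The routing value defaults to the document ID; crc32 is a stand-in hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards: int, nodes: List[str]) -> Dict[int, Dict[str, str]]:
    """Place each primary shard and one replica on different nodes (round-robin)."""
    if len(nodes) < 2:
        raise ValueError("a replica needs a second node")
    layout: Dict[int, Dict[str, str]] = {}
    for shard in range(num_primary_shards):
        layout[shard] = {
            "primary": nodes[shard % len(nodes)],
            # The next node in the ring, so a replica never shares its primary's node.
            "replica": nodes[(shard + 1) % len(nodes)],
        }
    return layout


if __name__ == "__main__":
    doc = {"title": "Elasticsearch Overview", "author": "ops", "pages": 12}
    validate(doc, MAPPING)
    shard = primary_shard("doc-1", num_primary_shards=3)
    layout = allocate(3, ["EsNode1", "EsNode2", "EsNode3"])
    print(shard, layout[shard])
```

Because the target shard is computed from the number of primary shards, changing that number would map existing document IDs to different shards. Keeping each replica on a different node from its primary is what lets a replica survive a single node's hardware fault.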

General Working Principles

Internal architecture
Elasticsearch provides access through RESTful APIs and through clients for other languages (such as Java), uses a cluster discovery mechanism, and supports scripting languages and various plug-ins. The underlying layer is based on Lucene, which is kept fully independent. Indexes can be stored in local files, shared file systems, or HDFS.
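Because clients talk to the cluster over HTTP or HTTPS, the RESTful interface can be exercised with any HTTP library. The following Python sketch only builds requests with the standard library and does not send them. The address `http://es-client.example:9200`, the index name `books`, and the document fields are assumptions; the request bodies use the typeless format of Elasticsearch 7.x, and 6.x clusters, which still use document types, may expect the mapping properties to be nested under a type name.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: HTTP address of an EsNode or EsClient instance. Replace it with
# the real scheme (HTTP or HTTPS), host, and port of your cluster.
BASE_URL = "http://es-client.example:9200"


def build_request(method: str, path: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# PUT /<index>: create an index with 3 primary shards, 1 replica each, and a mapping.
create_index = build_request(
    "PUT",
    "/books",
    {
        "settings": {"number_of_shards": 3, "number_of_replicas": 1},
        "mappings": {"properties": {"title": {"type": "text"}, "author": {"type": "keyword"}}},
    },
)

# PUT /<index>/_doc/<id>: index one document.
index_doc = build_request("PUT", "/books/_doc/1", {"title": "Elasticsearch Overview", "author": "ops"})

# POST /<index>/_search: full-text match query on the title field.
search = build_request("POST", "/books/_search", {"query": {"match": {"title": "overview"}}})

if __name__ == "__main__":
    for req in (create_index, index_doc, search):
        print(req.get_method(), req.full_url)
    # Sending a request would be: urllib.request.urlopen(search)
```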

Inverted indexing
Elasticsearch (Lucene) uses inverted indexing, unlike the forward indexing of traditional relational databases. A forward index maps each document to the keywords it contains, so finding a keyword means scanning every document; an inverted index reverses this mapping. The table of distinct keywords is called a dictionary. For each keyword, it records statistics, including the IDs of the documents that contain the keyword, the keyword's positions within each document, and its frequency. To search, Elasticsearch looks up a keyword to find the IDs and positions of the matching documents and then retrieves those documents, much like looking up a word in a dictionary or finding content on a specific page through a book's table of contents. Building and maintaining an inverted index is time-consuming and costly, but searching it is efficient.
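The dictionary-and-postings structure described above can be sketched in a few lines of Python. This is a toy illustration, assuming a crude lowercase-and-split tokenizer in place of a real Lucene analyzer; the function names are hypothetical, and Lucene's actual term dictionary and postings lists are compressed on-disk structures that are far more elaborate.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Postings for one keyword: document ID -> positions of the keyword in that document.
Postings = Dict[int, List[int]]


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (a crude stand-in for an analyzer)."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Postings]:
    """Map every keyword in the dictionary to the documents and positions where it occurs."""
    index: Dict[str, Postings] = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def term_frequency(index: Dict[str, Postings], term: str, doc_id: int) -> int:
    """Number of times `term` occurs in document `doc_id`."""
    return len(index.get(term, {}).get(doc_id, []))


def search_all(index: Dict[str, Postings], query: str) -> Set[int]:
    """Return the IDs of documents that contain every query keyword (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


if __name__ == "__main__":
    docs = {
        1: "Elasticsearch is a search engine",
        2: "Lucene is a search library",
        3: "Search the search index",
    }
    index = build_inverted_index(docs)
    print(index["search"])  # {1: [3], 2: [3], 3: [0, 2]}
    print(search_all(index, "search engine"))  # {1}
```

Building the index requires a pass over every token of every document, and changing a document means updating the postings of every keyword it contains; that is the construction and maintenance cost. A query, by contrast, touches only the postings of its own keywords.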
