Architecture

Kunpeng BoostKit for Big Data supports multiple big data platforms and application scenarios such as offline analysis, real-time search, and real-time stream processing.

Real-time search means querying large volumes of data, written in real time, by primary index key. Such queries demand short response times, while the query conditions are relatively simple. If the query conditions are complex, first search the all-domain data by keyword to obtain the primary index keys, and then use those keys to run the query. All-domain data includes structured data and text data. Typical features of real-time search are as follows:

Millisecond-level query response time required
High concurrency
Up to petabytes of data to be processed
Simultaneous processing of structured and unstructured data
Full-text search
Near-real-time indexing

Figure 1 shows the system architecture of real-time search.

Figure 1 Big data real-time search architecture

**Table 1** Nodes in big data real-time search scenarios
Name	Description
Data source	Data source types include file data (such as TXT and CSV files) and stream data (such as socket streams and OGG log streams).
Data collection system	File data is written using batch loading (Flume or other third-party loading tools). Streaming data is written using real-time loading (Spark Streaming or other third-party collection tools).
Real-time search engine	HBase: used for primary key retrieval (key-value retrieval), where the search criteria are simple. Elasticsearch: used for full-text search, or as a non-primary-key index for data stored in HBase. Elasticsearch can also store both the data and the indexes, but this is not cost-effective and suits only small-scale sites. The combined real-time search engine (Elasticsearch + HBase) is suitable for quick search (queries based on specified criteria). It is not suitable for statistical queries (such as GROUP and SUM statements) or complex queries (such as JOIN and IN statements and subqueries).
Service applications	Real-time search applications developed by ISVs based on Elasticsearch, HBase APIs, and RESTful APIs.
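
The two-step lookup described above (a keyword search to find the primary index keys, followed by key-based retrieval) can be sketched roughly as follows. This is a minimal illustration only: `KeyValueStore` and `KeywordIndex` are hypothetical in-memory stand-ins for HBase and Elasticsearch respectively, not the real client APIs, and the record layout and field names are assumptions made for the example.

```python
from collections import defaultdict


class KeyValueStore:
    """Hypothetical in-memory stand-in for a primary-key store such as HBase."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, row):
        self._rows[row_key] = row

    def get(self, row_key):
        # Simple query path: direct lookup by primary index key.
        return self._rows.get(row_key)


class KeywordIndex:
    """Hypothetical in-memory stand-in for a full-text index such as Elasticsearch."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of primary index keys

    def index(self, row_key, text):
        for term in text.lower().split():
            self._postings[term].add(row_key)

    def search(self, query):
        # Return the keys of records that contain every term in the query.
        terms = query.lower().split()
        if not terms:
            return []
        keys = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            keys &= self._postings.get(term, set())
        return sorted(keys)


def write_record(store, index, row_key, row, text_field):
    """Write the full record to the store and index only its text field."""
    store.put(row_key, row)
    index.index(row_key, row[text_field])


def search_records(store, index, query):
    """Complex query path: keywords -> primary index keys -> full records."""
    return [store.get(key) for key in index.search(query)]


store, index = KeyValueStore(), KeywordIndex()
write_record(store, index, "order#001",
             {"id": "order#001", "note": "late delivery refund"}, "note")
write_record(store, index, "order#002",
             {"id": "order#002", "note": "refund approved"}, "note")
write_record(store, index, "order#003",
             {"id": "order#003", "note": "delivery on time"}, "note")

simple_hit = store.get("order#002")                            # key is already known
complex_hits = search_records(store, index, "refund delivery")  # key is not known
```

In this sketch the index holds only terms and keys while the full records live in the key-value store, which mirrors the division of roles in Table 1. A real deployment would replace both stand-in classes with the corresponding HBase and Elasticsearch clients.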
