Architecture

Kunpeng BoostKit for Big Data supports multiple big data platforms and application scenarios such as offline analysis, real-time search, and real-time stream processing.

Real-time search means querying, with low latency, large volumes of data that are written in real time, using primary index keys. Such queries demand a fast response time, while the query conditions are relatively simple. If the query conditions are complex, first search the all-domain data by keyword to obtain the primary index keys, and then use those keys to run the query. All-domain data includes structured data and text data. Typical features of real-time search are as follows:

  • Millisecond-level query response time
  • High concurrency
  • Up to petabytes of data to be processed
  • Simultaneous processing of structured and unstructured data
  • Full-text search
  • Near-real-time index
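
The two-step lookup described above (resolve complex conditions to primary index keys through a keyword search, then fetch the rows by key) can be sketched as follows. This is only a minimal illustration that uses in-memory Python structures as stand-ins for a full-text index and a key-value store. The names `KeywordIndex`, `KeyValueStore`, and `search_rows`, as well as the sample records, are hypothetical and are not part of BoostKit, Elasticsearch, or HBase.

```python
from collections import defaultdict


class KeyValueStore:
    """Hypothetical stand-in for a key-value store keyed by primary index key."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, row):
        self._rows[row_key] = row

    def get(self, row_key):
        # Simple query: direct retrieval by primary index key.
        return self._rows.get(row_key)


class KeywordIndex:
    """Hypothetical stand-in for a full-text index: keyword -> set of row keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, row_key, text):
        # Naive whitespace tokenization, for illustration only.
        for token in text.lower().split():
            self._postings[token].add(row_key)

    def search(self, *keywords):
        # Return the row keys whose text contains every keyword.
        sets = [self._postings.get(k.lower(), set()) for k in keywords]
        return set.intersection(*sets) if sets else set()


def search_rows(index, store, *keywords):
    # Step 1: the keyword search yields primary index keys.
    row_keys = sorted(index.search(*keywords))
    # Step 2: fetch the full rows by primary index key.
    return [store.get(k) for k in row_keys]


# Made-up sample data, written to both the store and the index.
store = KeyValueStore()
index = KeywordIndex()
records = {
    "user#001": {"name": "Li", "note": "disk alarm on node a"},
    "user#002": {"name": "Wang", "note": "network alarm on node b"},
    "user#003": {"name": "Zhao", "note": "disk replaced on node b"},
}
for key, row in records.items():
    store.put(key, row)
    index.index(key, row["note"])

simple = store.get("user#002")                                # key-based query
complex_hits = search_rows(index, store, "alarm", "node")    # keyword query
```

In this sketch the simple query touches only the key-value store, while the keyword query pays for one index search plus one key lookup per hit, which is why the approach fits queries that return a small number of rows.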

Figure 1 shows the system architecture of real-time search.

Figure 1 Big data real-time search architecture

Table 1 Nodes in big data real-time search scenarios

| Name | Description |
| --- | --- |
| Data source | File data (such as TXT and CSV files) and streaming data (such as socket streams and OGG log streams). |
| Data collection system | File data is written through batch loading (Flume or other third-party loading tools). Streaming data is written through real-time loading (Spark Streaming or other third-party collection tools). |
| Real-time search engine | HBase: used for primary key retrieval (key-value retrieval), where the search criteria are simple. Elasticsearch: used for full-text search, or as a non-primary-key index for data stored in HBase. Elasticsearch can store both data and indexes, but doing so is not cost-effective and applies only to small-scale sites. Elasticsearch+HBase: the combined real-time search engine is suitable for quick search (queries based on specified criteria). It is not suitable for statistical queries (such as GROUP and SUM statements) or complex queries (such as JOIN and IN statements and subqueries). |
| Service applications | Real-time search applications developed by ISVs based on Elasticsearch, HBase APIs, and RESTful APIs. |
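
The division of labor in Table 1 can be read as a simple routing rule for incoming queries. The sketch below is one possible way a service application might express that rule. The `Query` descriptor, the `Route` enum, and `route_query` are hypothetical names introduced only for this illustration and do not correspond to any BoostKit, Elasticsearch, or HBase API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Route(Enum):
    KEY_VALUE = "primary key retrieval (HBase role)"
    FULL_TEXT = "keyword search for keys, then key retrieval (Elasticsearch+HBase role)"
    NOT_SUITABLE = "statistical or complex query, not suited to real-time search"


@dataclass
class Query:
    """Hypothetical query descriptor used only in this sketch."""
    row_key: Optional[str] = None       # set for primary-key lookups
    keywords: Tuple[str, ...] = ()      # set for full-text conditions
    aggregates: bool = False            # e.g. GROUP or SUM
    joins_or_subqueries: bool = False   # e.g. JOIN, IN, subqueries


def route_query(q: Query) -> Route:
    # Statistical and complex queries fall outside the real-time search engine.
    if q.aggregates or q.joins_or_subqueries:
        return Route.NOT_SUITABLE
    # A known primary key needs only the key-value store.
    if q.row_key is not None:
        return Route.KEY_VALUE
    # Otherwise, keywords go through the index to find the keys first.
    if q.keywords:
        return Route.FULL_TEXT
    raise ValueError("query has neither a row key nor keywords")
```

For example, `route_query(Query(row_key="user#001"))` selects the key-value path, while a query with `aggregates=True` is rejected regardless of its other fields.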