Architecture
Kunpeng BoostKit for Big Data supports multiple big data platforms and application scenarios such as offline analysis, real-time search, and real-time stream processing.
Real-time stream processing generally refers to analyzing data rapidly as it arrives in order to trigger follow-up actions. Because the processing speed requirements are strict and the data volume is large, CPU and memory demands are high. Storage demands are comparatively low because, in most cases, the data does not need to be persisted. Real-time processing is generally implemented as Storm, Spark Streaming, or Flink tasks. Its typical features are as follows:
- Strict processing-time requirements (millisecond level)
- Massive data volumes (hundreds of megabytes per second)
- Heavy use of compute resources
- Prone to compute resource preemption
- Data mainly in network protocol formats
- Relatively simple tasks
- Data isolated from clients; little storage capacity required
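To make the compute demand implied by these features concrete, the following is a rough sizing sketch based on Little's law (messages in flight = arrival rate × processing time). The function name `required_workers`, the 1 KB message size, and the 500 µs per-message cost are illustrative assumptions and are not taken from the BoostKit documentation.

```python
def required_workers(throughput_bytes_s: int, message_bytes: int, per_message_us: int) -> int:
    """Estimate how many messages must be processed concurrently.

    All inputs are hypothetical planning figures. Integer arithmetic is
    used so that the rounding is exact.
    """
    # Arrival rate in messages per second, rounded up.
    messages_per_s = -(-throughput_bytes_s // message_bytes)
    # Little's law: in-flight messages = rate * time per message, rounded up.
    return -(-(messages_per_s * per_message_us) // 1_000_000)


# Assumed workload: 300 MB/s of 1 KB messages, 500 microseconds of CPU time each.
print(required_workers(300_000_000, 1_000, 500))  # 150
```

Under these assumed figures, roughly 150 messages are being processed at any instant, which is why workloads of this kind are bound by CPU and memory rather than by storage.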
The distributed message system Kafka sends collected data to distributed stream computing engines (Flink, Storm, and Spark Streaming) in real time for processing. Redis stores the results and provides caches for upper-layer services. Figure 1 shows the detailed system architecture.
| Name | Description |
|---|---|
| Data source | Includes real-time stream data (such as Socket streams, OGG log streams, and log files), real-time files, and databases. |
| Real-time data collection system | |
| Message middleware | Caches real-time data and supports high-throughput message publishing and subscription. Kafka: a distributed message system. It supports message production and publishing, and message caching in various forms, meeting the requirements of efficient and reliable message production and consumption. |
| Distributed stream computing engine | Quickly analyzes real-time data. |
| Data cache | Caches stream processing analysis results to meet the access requirements of stream processing applications. Redis: provides high-speed key-value storage and query capabilities for rapidly caching stream processing results. |
| Service applications | Service applications developed by ISVs for querying and using real-time stream processing results. |
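
The data flow through the components in the table can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the BoostKit implementation: a `deque` stands in for a Kafka topic, a plain function stands in for a Flink, Storm, or Spark Streaming job, and a `dict` stands in for Redis. All function names and the `type,user` record format are hypothetical.

```python
from collections import Counter, deque

# In-memory stand-ins for the real components (assumptions for illustration).
message_queue = deque()  # plays the role of a Kafka topic
result_cache = {}        # plays the role of Redis key-value storage


def collect(lines):
    """Real-time data collection: push raw records into the message queue."""
    for line in lines:
        message_queue.append(line)


def process_batch(max_records):
    """Stream computing: drain up to max_records and count events by type."""
    counts = Counter()
    for _ in range(min(max_records, len(message_queue))):
        record = message_queue.popleft()
        event_type = record.split(",", 1)[0]
        counts[event_type] += 1
    return counts


def cache_results(counts):
    """Data cache: merge the batch counts into the running totals."""
    for event_type, n in counts.items():
        key = f"count:{event_type}"
        result_cache[key] = result_cache.get(key, 0) + n


def query(event_type):
    """Service application: read the latest result from the cache."""
    return result_cache.get(f"count:{event_type}", 0)


collect(["login,alice", "click,bob", "login,carol"])
cache_results(process_batch(max_records=2))  # first micro-batch
cache_results(process_batch(max_records=2))  # second micro-batch
print(query("login"))  # 2
```

The service application reads only from the cache and never touches the queue or the processing job, which mirrors the separation between the service layer and the stream processing layer described above.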
