Function Overview

Supported OSs: CentOS 7.6/openEuler 20.03 LTS

BiSheng JDK Acceleration
- Performance optimization of BiSheng JDK based on the core big data components Hive and Spark.
  
  Constraints
  
  1. Supported OSs
  
  CentOS 7.6 and openEuler 20.03 LTS.
  
  2. Supported components
  
  Hive 2.X/3.X and Spark 2.X.
  
  3. Performance metric
  
  The Hive performance is improved by 2% to 12%, and the Spark performance is improved by 3% to 20%.
  
  In virtualization scenarios, BiSheng JDK delivers better performance than OpenJDK of the corresponding version. The actual performance improvement is subject to the VM specifications.
  
  BiSheng JDK development resources
  
  BiSheng JDK software package
Supported OSs: CentOS 7.6/openEuler 20.03 LTS

Machine Learning Algorithm Libraries
- Spark-based distributed machine learning and graph analysis algorithm libraries.
  
  Constraints
  
  1. Algorithms
  
  Classification and regression (random forest, GBDT, SVM, LogisticRegression, LinearRegression, DecisionTree, and XGBoost), clustering (K-means, DBSCAN, and LDA), feature engineering (PCA, SVD, Pearson, Covariance, Spearman, and IDF), and pattern mining (PrefixSpan and SimRank).
  
  2. Supported OSs
  
  CentOS 7.6 and openEuler 20.03 LTS.
  
  3. Component restrictions
  
  Compatible with Spark 2.3.2, Spark 2.4.5, and Spark 2.4.6. Some algorithms also support Spark 3.1.1 and Spark 3.3.1 and provide the same interfaces as the native algorithm library. The remaining algorithms technically support Spark 2.X and Spark 3.X but must be adapted as required.
  
  4. Hardware
  
  Only Kunpeng servers are supported.
  
  5. Hybrid deployment
  
  Spark clusters that combine Kunpeng servers with servers of other architectures are supported by using computing queues.
  
  6. Performance metric
  
  Compared with Spark's native MLlib and GraphX running on Intel 5318, the machine learning and graph analysis algorithm libraries running on Kunpeng 5220 improve computing performance by more than 20% at the same precision.
  
  Machine learning algorithms and graph analysis algorithms can be used in virtualization scenarios. The actual performance improvement is subject to the VM specifications.
  
  Machine Learning Algorithm Library Feature Guide
  
  Machine learning algorithm software package
Supported OSs: CentOS 7.6/openEuler 20.03 LTS

OmniRuntime OmniData
- SQL operator pushdown based on Spark and openLooKeng.
  
  Constraints
  
  1. Supported OSs
  
  CentOS 7.6/openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1.
  
  2. Supported components
  
  This feature is a great choice for big data scenarios where storage and compute are decoupled or coupled at scale.
  
  Supported components: Spark 3.0.0/3.1.1, Hive 3.1.0 (Tez 0.10.0), and openLooKeng 1.4.0/1.6.1. Other Spark and openLooKeng versions are also supported, but will need to be adapted based on service requirements.
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  According to a TPC-H test, the performance of Spark and openLooKeng executing 12 SQL statements is improved by an average of 40% after enabling operator pushdown. According to a TPC-H test, the performance of Hive executing 4 SQL statements is improved by an average of 20% after enabling operator pushdown.
  This feature is suitable for coupled and decoupled storage and compute scenarios that have data locality.
  
  OmniRuntime Feature Guide
Supported OSs: openEuler 20.03 LTS

OmniRuntime OmniOperator
- Acceleration of native operators based on Spark and Hive. This feature can be used together with OmniShuffle.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1.
  
  2. Supported components
  
  Supported components: Spark 3.1.1, Spark 3.3.1, Spark 3.4.3, Spark 3.5.2, Hive 3.1.0, and Gluten 1.3.
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  OmniOperator improves the computing performance of Spark by 30% according to the 99 TPC-DS SQL benchmark test cases.
  This feature optimizes Spark and openLooKeng compute engine kernels. It is suitable for virtualization scenarios.
  
  OmniRuntime Feature Guide
Supported OSs: openEuler 20.03 LTS

OmniRuntime OmniMV
- Spark- and ClickHouse-based intelligent recommendation of materialized views.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1.
  
  2. Supported components
  
  Spark 3.1.1. Other Spark versions are technically supported and can be adapted based on service requirements.
  
  ClickHouse 22.3.6.5. Other ClickHouse versions are technically supported and can be adapted based on service requirements.
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  OmniMV improves the computing performance of Spark by an average of 30% according to the TPC-DS benchmark test cases, and improves the computing performance of ClickHouse by several times according to Star Schema Benchmark test cases.
  Algorithm models involved in OmniMV are applicable to virtualization scenarios.
  
  OmniRuntime Feature Guide
Supported OSs: CentOS 7.6/EulerOS 2.0 SP9 Arm/openEuler 20.03 LTS

OmniRuntime OmniShuffle
- Shuffle process acceleration based on OCK for Spark. This feature can be used together with OmniOperator.
  
  Constraints
  
  1. Supported OSs
  
  CentOS 7.6, EulerOS 2.0 SP9 Arm, and openEuler 20.03 LTS.
  
  2. Supported components
  
  NIC driver: Mellanox 5.1-2.4.1.0
  
  JDK 1.8.0_292
  
  GCC 7.3.0
  
  ZooKeeper 3.7.0
  
  Hadoop 3.1.1
  
  Spark 2.4.6 or later
  
  Python 2.7 or later
  
  HiBench 7.1 (recommended)
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  In ESS mode:
  
  Typical configuration: 384 GB memory per node, 3 + 1 servers (three compute nodes and one management node) with two Kunpeng 920 5220 processors per server, at least 10GE network (10GE TCP, 25GE TCP/RDMA, or 100GE TCP/RDMA), and twelve 4 TB SATA drives.
  
  In TeraSort scenario: over 40% higher performance for 1 TB of data
  
  In PageRank (Spark Core) scenario: more than doubled performance for 90 GB of data
  
  In TPC-DS benchmark tests, for 8 TB of data, OmniShuffle improves the Spark performance by 30%, and the combination of OmniShuffle and OmniOperator improves the Spark performance by more than 60%.
  
  In RSS mode:
  
  In TPC-DS benchmark tests, for 3 TB of data, the performance is 10% higher than that of Celeborn.
  
  This feature optimizes Spark and openLooKeng compute engine kernels. It is suitable for virtualization scenarios.
  
  OmniRuntime Feature Guide
Supported OSs: openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1

OmniRuntime OmniAdvisor
- 2.0.0: OmniAdvisor 2.0.0 samples task parameters and recommends optimal configurations through AI iterative tuning, expert rule–based tuning, migration generalization tuning, and operator acceleration, enabling end-to-end parameter tuning for Spark tasks.
  
  1.0.0: Automatic Spark/Hive parameter recommendation using AI.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1.
  
  2. Supported components
  
  Spark 3.1.1, Spark 3.3.1, and Hive 3.1.0 (Hive on Tez mode only). Other Spark and Hive versions are also supported but will need to be adapted based on service requirements.
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  2.0.0: Compared with the parameters tuned by experts, OmniAdvisor 2.0 improves performance by approximately 20% on the TPC-DS 3 TB dataset.
  
  1.0.0: OmniAdvisor 1.0.0 improves Spark performance by 10% according to the 10 TPC-DS SQL benchmark test cases.
  OmniAdvisor applies to VM scenarios.
  
  OmniRuntime Feature Guide
Supported OSs: openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1

OmniRuntime OmniHBaseGSI
- HBase global secondary indexes, improving non-rowkey column query efficiency by multiple times.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1.
  
  2. Supported components
  
  HBase 2.4.14. Other HBase versions are also supported but will need to be adapted based on service requirements.
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  OmniHBaseGSI ensures an average latency of less than 30 ms and a P99 latency of less than 300 ms with 100 concurrent connections.
  OmniHBaseGSI applies to VM scenarios.
  
  OmniRuntime Feature Guide
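
The OmniHBaseGSI latency targets above (average below 30 ms, P99 below 300 ms) can be checked against your own measurements. The following is a minimal sketch, not part of OmniHBaseGSI: the function names and sample values are hypothetical, and P99 is computed with the nearest-rank method, which is one of several common percentile definitions.

```python
def latency_summary(samples_ms):
    """Return (average, P99) for a list of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    average = sum(ordered) / len(ordered)
    # Nearest-rank P99: smallest sample with at least 99% of samples <= it.
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = -(-99 * len(ordered) // 100)
    return average, ordered[rank - 1]


def meets_latency_target(samples_ms, max_avg_ms=30.0, max_p99_ms=300.0):
    """Check samples against the thresholds quoted in the performance metric."""
    average, p99 = latency_summary(samples_ms)
    return average < max_avg_ms and p99 < max_p99_ms


# Hypothetical measurements: 98 fast queries and 2 slow ones.
samples = [10.0] * 98 + [250.0, 400.0]
average, p99 = latency_summary(samples)
print(f"avg={average:.1f} ms, p99={p99:.1f} ms, ok={meets_latency_target(samples)}")
```

A single slow outlier does not break the P99 target here, but three or more samples above 300 ms out of 100 would.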
Supported OSs: openEuler 22.03 LTS SP4

OmniRuntime OmniShield
- The OmniShield feature runs in the TEE and provides data, network, and drive encryption and decryption as well as application-level remote attestation capabilities for Spark. This feature ensures data security throughout the lifecycle of storage, transmission, and computing.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 22.03 LTS SP4.
  
  2. Supported components
  
  Only 128-bit or 256-bit keys of the AES/GCM/NOPadding algorithm are supported. Only 128-bit keys of the SM4/GCM/NOPadding algorithm are supported.
  
  OmniShield does not provide a KMS service or specify which KMS to use. You must select a KMS yourself.
  
  Spark 3.3.1. Other Spark versions are technically supported and can be adapted based on service requirements.
  
  3. Performance metric
  
  Based on the 99 TPC-DS benchmark test cases defined by the Big Data Alliance, the performance loss caused by the full-computing link security protection provided by OmniShield does not exceed 20% of the average performance of physical machines.
  OmniShield applies to VM scenarios.
  
  OmniRuntime Feature Guide
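
The key-length constraint for OmniShield (128-bit or 256-bit keys for AES/GCM, 128-bit keys only for SM4/GCM) can be validated before a key is handed to a job. This is a minimal sketch under stated assumptions: `ALLOWED_KEY_BITS` and `is_supported_key` are hypothetical names for illustration and are not part of the OmniShield API, and the check covers only key length, not encryption itself.

```python
# Allowed key sizes in bits per cipher transformation, taken from the
# constraint above. Names are compared case-insensitively.
ALLOWED_KEY_BITS = {
    "AES/GCM/NOPADDING": {128, 256},
    "SM4/GCM/NOPADDING": {128},
}


def is_supported_key(transformation, key):
    """Return True if the raw key bytes have a length allowed for the transformation."""
    allowed = ALLOWED_KEY_BITS.get(transformation.upper())
    if allowed is None:
        return False  # transformation not listed in the constraint
    return len(key) * 8 in allowed


# Placeholder all-zero keys, used only to exercise the length check.
print(is_supported_key("AES/GCM/NoPadding", bytes(32)))  # 256-bit AES key
print(is_supported_key("SM4/GCM/NoPadding", bytes(32)))  # 256-bit SM4 key
```

In practice the key bytes would come from the KMS you selected rather than from a literal.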
Supported OSs: openEuler 22.03 LTS SP3

OmniRuntime OmniScheduler
- The YARN capacity scheduling policy allocates containers based on customized weighted sorting of physical and logical resources.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 22.03 LTS SP3.
  
  2. Supported components
  
  Spark 3.1.1, Spark 3.4.3, Hive 3.1.0, and Hadoop 3.3.4. Other Spark versions are technically supported and can be adapted based on service requirements.
  
  3. Performance metric
  
  OmniScheduler improves the cluster low-load variance stability by 100% based on TPC-DS benchmark test cases.
  
  OmniScheduler applies to VM scenarios.
  
  OmniRuntime Feature Guide
Supported OSs: openEuler 22.03 LTS SP4

OmniRuntime OmniStream
- Flink operators are implemented in native code (C/C++) to improve query performance.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 22.03 LTS SP4.
  
  2. Supported components
  
  Flink 1.16.3. Other Flink versions are technically supported and can be adapted based on service requirements.
  
  Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  
  3. Performance metric
  
  OmniStream improves the computing performance of Flink by an average of over 100% according to Nexmark 22-query test cases. In the WordCount scenario, OmniStream delivers 1.31 times the performance of open source Flink DataStream. In the stateless recomputing scenario, OmniStream achieves 1.6 times the performance of open source Flink DataStream.
  OmniStream applies to VM scenarios.
  
  OmniRuntime Feature Guide
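
The OmniStream metrics above mix two conventions: a percentage improvement ("over 100%") and a performance multiple ("1.31 times"). The sketch below converts between them, assuming that "improves by N%" means throughput relative to the baseline rises by N%. That reading is an assumption, not something the metrics state, and the function names are hypothetical.

```python
def improvement_pct_to_factor(pct):
    """Convert 'improved by pct percent' into a 'times the baseline' factor."""
    return 1.0 + pct / 100.0


def factor_to_improvement_pct(factor):
    """Convert 'factor times the baseline' into a percentage improvement."""
    return (factor - 1.0) * 100.0


# Under this reading, a 100% improvement is 2x the baseline, while
# 1.31x and 1.6x correspond to roughly 31% and 60% improvements.
for factor in (1.31, 1.6):
    print(f"{factor}x -> {factor_to_improvement_pct(factor):.0f}% improvement")
print(f"100% improvement -> {improvement_pct_to_factor(100):.1f}x")
```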
Supported OSs: openEuler 22.03 LTS SP3

OmniRuntime OmniStateStore
- Flink backend plugin that accelerates state storage and improves the overall Flink performance.
  
  Constraints
  
  1. Supported OSs
  
  openEuler 22.03 LTS SP3.
  
  2. Supported components
  
  Flink 1.16.1, Flink 1.16.3, and Flink 1.17.1. Other Flink versions are technically supported and can be adapted based on service requirements.
  
  It can run on both Huawei Kunpeng and general-purpose x86 servers.
  
  3. Performance metric
  
  A new state storage technology is introduced to improve the I/O performance of Flink in big data scenarios.
  OmniStateStore does not apply to VM scenarios.
  
  OmniRuntime Feature Guide
