Function Overview
- BiSheng JDK Acceleration
- Machine Learning Algorithm Libraries
- OmniRuntime OmniData
- OmniRuntime OmniOperator
- OmniRuntime OmniMV
- OmniRuntime OmniShuffle
- OmniRuntime OmniAdvisor
- OmniRuntime OmniHBaseGSI
- OmniRuntime OmniShield
- OmniRuntime OmniScheduler
- OmniRuntime OmniStream
- OmniRuntime OmniStateStore
BiSheng JDK Acceleration
Performance optimization of BiSheng JDK based on the core big data components Hive and Spark.
Constraints
1. Supported OSs
CentOS 7.6 and openEuler 20.03 LTS.
2. Supported components
Hive 2.X/3.X and Spark 2.X.
3. Performance metric
Hive performance improves by 2% to 12%, and Spark performance improves by 3% to 20%.
In virtualization scenarios, BiSheng JDK delivers better performance than OpenJDK of the corresponding version. The actual performance improvement depends on the VM specifications.
Machine Learning Algorithm Libraries
Spark-based distributed machine learning and graph analysis algorithm libraries.
Constraints
1. Algorithms
Classification and regression (random forest, GBDT, SVM, LogisticRegression, LinearRegression, DecisionTree, and XGBoost), clustering (K-means, DBSCAN, and LDA), feature engineering (PCA, SVD, Pearson, Covariance, Spearman, and IDF), and pattern mining (PrefixSpan and SimRank).
2. Supported OSs
CentOS 7.6 and openEuler 20.03 LTS.
3. Component restrictions
Compatible with Spark 2.3.2, Spark 2.4.5, and Spark 2.4.6. Some of the algorithms support Spark 3.1.1 and Spark 3.3.1 and provide the same interfaces as the native algorithm library. The other algorithms technically support Spark 2.X and Spark 3.X but need to be adapted as required.
4. Hardware
Only Kunpeng servers are supported.
5. Hybrid deployment
Spark clusters that combine Kunpeng servers with servers of other architectures are supported through computing queues.
6. Performance metric
Compared with Spark's native MLlib and GraphX on Intel 5318, the machine learning and graph analysis algorithm libraries on Kunpeng 5220 improve computing performance by more than 20% at the same precision.
Machine learning algorithms and graph analysis algorithms can be used in virtualization scenarios. The actual performance improvement depends on the VM specifications.
OmniRuntime OmniData
SQL operator pushdown based on Spark and openLooKeng.
Constraints
1. Supported OSs
CentOS 7.6, openEuler 20.03 LTS SP1, and openEuler 22.03 LTS SP1.
2. Components
- This feature is a good fit for large-scale big data scenarios where storage and compute are either decoupled or coupled.
- Supported components: Spark 3.0.0/3.1.1, Hive 3.1.0 (Tez 0.10.0), and openLooKeng 1.4.0/1.6.1. Other Spark and openLooKeng versions are also supported but need to be adapted based on service requirements.
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- This feature is suitable for coupled and decoupled storage-compute scenarios that have data locality.
OmniRuntime OmniOperator
Acceleration of native operators based on Spark and Hive. This feature can be used together with OmniShuffle.
Constraints
1. Supported OSs
openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1.
2. Supported components
- Supported components: Spark 3.1.1, Spark 3.3.1, Spark 3.4.3, Spark 3.5.2, Hive 3.1.0, and Gluten 1.3.
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- This feature optimizes Spark and openLooKeng compute engine kernels. It is suitable for virtualization scenarios.
OmniRuntime OmniMV
Spark- and ClickHouse-based intelligent recommendation of materialized views.
Constraints
1. Supported OSs
openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1.
2. Supported components
- Spark 3.1.1. Other Spark versions are technically supported and can be adapted based on service requirements.
- ClickHouse 22.3.6.5. Other ClickHouse versions are technically supported and can be adapted based on service requirements.
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- Algorithm models involved in OmniCache are applicable to virtualization scenarios.
OmniRuntime OmniShuffle
Shuffle process acceleration based on OCK for Spark. This feature can be used together with OmniOperator.
Constraints
1. Supported OSs
CentOS 7.6, EulerOS 2.0 SP9 Arm, and openEuler 20.03 LTS.
2. Supported components
- NIC driver: Mellanox 5.1-2.4.1.0
- JDK 1.8.0_292
- GCC 7.3.0
- ZooKeeper 3.7.0
- Hadoop 3.1.1
- Spark 2.4.6 or later
- Python 2.7 or later
- HiBench 7.1 (recommended)
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- Typical configuration: 384 GB of memory per node; 3 + 1 servers (three compute nodes and one management node), each with two Kunpeng 920 5220 processors; a network of at least 10GE (10GE TCP, 25GE TCP/RDMA, or 100GE TCP/RDMA); and twelve 4 TB SATA drives.
- In the TeraSort scenario: over 40% higher performance for 1 TB of data.
- In the PageRank (Spark Core) scenario: more than double the performance for 90 GB of data.
- In TPC-DS benchmark tests with 8 TB of data, OmniShuffle improves Spark performance by 30%, and the combination of OmniShuffle and OmniOperator improves Spark performance by more than 60%.
In RSS mode:
- In TPC-DS benchmark tests with 3 TB of data, the performance is 10% higher than that of Celeborn.
This feature optimizes Spark and openLooKeng compute engine kernels. It is suitable for virtualization scenarios.
OmniRuntime OmniAdvisor
- 2.0.0: OmniAdvisor 2.0.0 samples task parameters and recommends optimal configurations through AI iterative tuning, expert rule-based tuning, migration generalization tuning, and operator acceleration, enabling end-to-end parameter tuning for Spark tasks.
- 1.0.0: Automatic Spark/Hive parameter recommendation using AI.
Constraints
- Supported components: Spark 3.1.1, Spark 3.3.1, and Hive 3.1.0 (Hive on Tez mode only). Other Spark and Hive versions are also supported but need to be adapted based on service requirements.
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- 2.0.0: Compared with parameters tuned by experts, OmniAdvisor 2.0 improves performance by approximately 20% on the TPC-DS 3 TB dataset.
- 1.0.0: Improves Spark performance by 10% across 10 TPC-DS SQL benchmark test cases.
- OmniAdvisor applies to VM scenarios.
OmniRuntime OmniHBaseGSI
HBase global secondary indexes, which improve query efficiency on non-rowkey columns by several times.
Constraints
1. Supported OSs
openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1.
2. Supported components
- HBase 2.4.14. Other HBase versions are also supported but need to be adapted based on service requirements.
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- OmniHBaseGSI applies to VM scenarios.
OmniRuntime OmniShield
OmniShield runs in the TEE and provides data, network, and drive encryption and decryption as well as application-level remote attestation for Spark. It protects data throughout its lifecycle of storage, transmission, and computing.
Constraints
1. Supported OSs
openEuler 22.03 LTS SP4.
2. Supported components
- Only 128-bit or 256-bit keys are supported for the AES/GCM/NOPadding algorithm, and only 128-bit keys for the SM4/GCM/NOPadding algorithm.
- OmniShield does not provide a KMS or specify which KMS to use. Select a KMS yourself.
- Spark 3.3.1. Other Spark versions are technically supported and can be adapted based on service requirements.
- OmniShield applies to VM scenarios.
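The key-size constraints stated for OmniShield can be sketched as a small pre-flight check that runs before a key is handed to a job. This is only an illustration: the helper name `is_supported_key` and the table `SUPPORTED_KEY_BITS` are hypothetical and are not part of OmniShield or Spark. The check encodes nothing beyond the two algorithm and key-length rules listed above.

```python
# Hypothetical helper, not an OmniShield API. It only encodes the key-length
# rules listed in the OmniShield constraints.
SUPPORTED_KEY_BITS = {
    "AES/GCM/NOPadding": {128, 256},  # 128-bit or 256-bit keys
    "SM4/GCM/NOPadding": {128},       # 128-bit keys only
}


def is_supported_key(transformation: str, key: bytes) -> bool:
    """Return True if the key length is allowed for the given algorithm."""
    allowed_bits = SUPPORTED_KEY_BITS.get(transformation)
    if allowed_bits is None:
        # The algorithm is not listed in the constraints at all.
        return False
    return len(key) * 8 in allowed_bits


if __name__ == "__main__":
    import os

    # A 32-byte (256-bit) key is accepted for AES but rejected for SM4.
    print(is_supported_key("AES/GCM/NOPadding", os.urandom(32)))
    print(is_supported_key("SM4/GCM/NOPadding", os.urandom(32)))
```

A check like this could run wherever keys are fetched from the chosen KMS, so that an unsupported key length is rejected early instead of surfacing later as an encryption failure.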
OmniRuntime OmniScheduler
The Yarn capacity scheduling policy allocates containers based on the customized weight sorting of physical and logical resources.
Constraints
1. Supported OSs
openEuler 22.03 LTS SP3.
2. Supported components
Spark 3.1.1, Spark 3.4.3, Hive 3.1.0, and Hadoop 3.3.4. Other Spark versions are technically supported and can be adapted based on service requirements.
3. Performance metric
OmniScheduler improves the cluster low-load variance stability by 100% based on TPC-DS benchmark test cases.
OmniScheduler applies to VM scenarios.
OmniRuntime OmniStream
Flink operators are implemented in native code (C/C++) to improve query performance.
Constraints
1. Supported OSs
openEuler 22.03 LTS SP4.
2. Supported components
- Flink 1.16.3. Other Flink versions are technically supported and can be adapted based on service requirements.
- Hybrid deployment across Kunpeng servers and servers of other architectures is not supported.
- OmniStream applies to VM scenarios.
OmniRuntime OmniStateStore
A Flink state backend plugin that accelerates state storage and improves overall Flink performance.
Constraints
1. Supported OSs
openEuler 22.03 LTS SP3.
2. Supported components
- Flink 1.16.1, Flink 1.16.3, and Flink 1.17.1. Other Flink versions are technically supported and can be adapted based on service requirements.
- It can run on both Huawei Kunpeng and general-purpose x86 servers.
- OmniStateStore does not apply to VM scenarios.
Supported OSs: CentOS 7.6/openEuler 20.03 LTS
Supported OSs: CentOS 7.6/openEuler 20.03 LTS
Supported OSs: CentOS 7.6/openEuler 20.03 LTS
Supported OSs: openEuler 20.03 LTS
Supported OSs: openEuler 20.03 LTS
Supported OSs: CentOS 7.6/EulerOS 2.0 SP9 Arm/openEuler 20.03 LTS
Supported OSs: openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1
Supported OSs: openEuler 20.03 LTS SP1/openEuler 22.03 LTS SP1
Supported OSs: openEuler 22.03 LTS SP4
Supported OSs: openEuler 22.03 LTS SP3
Supported OSs: openEuler 22.03 LTS SP4
Supported OSs: openEuler 22.03 LTS SP3