Feature List

Feature

Feature Description

Constraint

Software Package

Supported on VMs

Remarks

BiSheng JDK Acceleration

Performance optimization of BiSheng JDK based on the core big data components Hive and Spark

  • Supported OSs: CentOS 7.6 and openEuler 20.03 LTS
  • Supported components: Hive 2.X/3.X and Spark 2.X.
  • Performance metric: The Hive performance is improved by 2% to 12%, and the Spark performance is improved by 3% to 20%.

BiSheng binary package:

BiSheng JDK software package

Yes

In VM scenarios, BiSheng JDK delivers better performance than OpenJDK of the corresponding version. The actual performance improvement is subject to the VM specifications.

Machine learning algorithm library

Spark-based distributed machine learning algorithm library

  • Supported OSs: CentOS 7.6 and openEuler 20.03 LTS
  • Component constraints: Compatible with Spark 2.3.2, Spark 2.4.5, and Spark 2.4.6. Some of the algorithms support Spark 3.1.1 and Spark 3.3.1, and provide the same interfaces as the open-source algorithm library. Other Spark 2.X and Spark 3.X versions are technically compatible and can be adapted as needed.
  • Hardware: Only Kunpeng servers are supported.
  • Hybrid deployment: The computing queues in a Spark cluster support a mix of Kunpeng servers and servers based on other chip architectures.
  • Performance metric: Compared with Spark's open-source MLlib running on Intel 5318, the machine learning algorithm library running on Kunpeng 5220 improves computing performance by more than 20% at the same precision.

JAR file:

Contact Huawei technical support.

Yes

Machine learning algorithms can be used in VM scenarios. The actual performance improvement is subject to the VM specifications.

OmniOperator

Acceleration of native operators based on Spark and Hive. This feature can be used together with OmniShuffle.

  • Supported OSs: openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1
  • Component constraints:
    • Compatible with Spark 3.1.1, Spark 3.3.1, Spark 3.4.3, Spark 3.5.2, Hive 3.1.0, and Gluten 1.3. Other Spark and Hive versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric: OmniOperator improves the performance of Spark by 30% and that of Hive by over 20%, as measured by the 99 TPC-DS benchmark queries.

JAR file:

Contact Huawei technical support.

Yes

This feature optimizes Spark and Hive compute engine kernels. It is suitable for virtualization scenarios.

OmniShuffle

Shuffle process acceleration based on OCK for Spark. This feature can be used together with OmniOperator.

  • Supported OSs:

    CentOS 7.6, EulerOS 2.0 SP9, and openEuler 20.03 LTS

  • Component constraints:
    • NIC driver: Mellanox 5.1-2.4.1.0
    • JDK 1.8.0_292
    • GCC 7.3.0
    • ZooKeeper 3.7.0
    • Hadoop 3.1.1
    • Spark 2.4.6 or later
    • Python 2.7 or later
    • HiBench 7.1 (recommended)
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric:

    In ESS mode:

    • Typical configuration: 384 GB memory per node, 3+1 servers (three compute nodes and one management node) with two Kunpeng 920 5220 processors per server, at least 10GE network (10GE TCP, 25GE TCP/RDMA, or 100GE TCP/RDMA), and twelve 4 TB SATA drives.
    • In the TeraSort scenario: over 40% higher performance for 1 TB of data
    • In the PageRank (Spark Core) scenario: more than doubled performance for 90 GB of data
    • In TPC-DS benchmark tests, for 8 TB of data, OmniShuffle improves the Spark performance by 30%, and the combination of OmniShuffle and OmniOperator improves the Spark performance by more than 60%.

    In RSS mode:

    In TPC-DS benchmark tests, for 3 TB of data, the performance is 10% higher than that of Celeborn.
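
The "or later" entries in the OmniShuffle constraint list can be checked before deployment. The following is a minimal sketch, not part of the product: the names `MINIMUM_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical, and only the two entries stated as lower bounds (Spark 2.4.6 and Python 2.7) are covered.

```python
import re

# Minimum versions copied from the OmniShuffle constraint list.
# Only the entries listed as "or later" are treated as lower bounds.
MINIMUM_VERSIONS = {
    "spark": "2.4.6",
    "python": "2.7",
}


def parse_version(text):
    """Turn '3.1.1' or '2.7' into a tuple of integers such as (3, 1, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(component, installed):
    """Return True if the installed version is at or above the listed minimum."""
    required = parse_version(MINIMUM_VERSIONS[component])
    found = parse_version(installed)
    # Pad the shorter tuple with zeros so that 2.7 compares like 2.7.0.
    width = max(len(required), len(found))
    required += (0,) * (width - len(required))
    found += (0,) * (width - len(found))
    return found >= required
```

For example, `meets_minimum("spark", "3.1.1")` passes, while `meets_minimum("spark", "2.4.5")` does not.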

JAR file:

Contact Huawei technical support.

Yes

This feature depends on network hardware.

  • The TCP network applies to VM scenarios.
  • The RDMA network depends on whether RDMA virtualization is supported.

OmniAdvisor

1.0: Automatic Spark/Hive parameter recommendation using AI

  • Supported OSs: openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1
  • Component constraints:
    • Compatible with Spark 3.1.1, Spark 3.3.1, and Hive 3.1.0 (only the Hive on Tez mode). Other Spark and Hive versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric: OmniAdvisor improves the Spark performance by 10%, as measured by the 10 TPC-DS benchmark SQL queries.

JAR file:

Contact Huawei technical support.

Yes

OmniAdvisor applies to VM scenarios.

2.0: Automatic Spark parameter recommendation using AI-driven iterative tuning, expert rule-based tuning, transfer generalization tuning, and operator acceleration modules

  • Supported OSs: CentOS 7.9 and openEuler 22.03 LTS SP1
  • Component constraints:
    • Compatible with Spark 3.3.1. Other Spark versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric: OmniAdvisor improves the Spark performance by 8%, as measured on the TPC-DS 3 TB dataset.

JAR file:

Contact Huawei technical support.

Yes

OmniAdvisor applies to VM scenarios.

OmniMV

Intelligent recommendation of materialized views based on Spark and ClickHouse

  • Supported OSs: openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1
  • Component constraints:
    • Compatible with Spark 3.1.1 and Spark 3.4.3. Other Spark versions are technically compatible and can be adapted as needed.
    • Compatible with ClickHouse 22.3.6.5. Other ClickHouse versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric: OmniMV improves the computing performance of Spark by an average of 30% according to the TPC-DS benchmark test cases, and improves the computing performance of ClickHouse by several times according to Star Schema Benchmark test cases.

JAR file:

Contact Huawei technical support.

Yes

Algorithm models involved in OmniMV are applicable to virtualization scenarios.

OmniScheduler

OmniScheduler leverages the YARN capacity scheduling policy to allocate containers based on the customized weight sorting of physical and logical resources.

  • Supported OS: openEuler 22.03 LTS SP3
  • Component constraints:

    Compatible with Spark 3.1.1, Spark 3.4.3, Hive 3.1.0, and Hadoop 3.3.4. Other Spark versions are technically compatible and can be adapted as needed.

  • Performance metrics: OmniScheduler improves the cluster low-load variance stability by 100% based on TPC-DS benchmark test cases.

JAR file:

Contact Huawei technical support.

Yes

OmniScheduler applies to VM scenarios.

OmniShield

The OmniShield feature runs in the TEE and provides data, network, and drive encryption and decryption as well as application-level remote attestation capabilities for Spark. This feature ensures data security throughout the lifecycle of storage, transmission, and computing.

  • Supported OS: openEuler 22.03 LTS SP4
  • Component constraints:
    • Only 128-bit or 256-bit keys of the AES/GCM/NOPadding algorithm are supported. Only 128-bit keys of the SM4/GCM/NOPadding algorithm are supported.
    • OmniShield does not provide the KMS service or specify the KMS to be used. Users need to provide their own KMS service and implementation.
    • Compatible with Spark 3.1.1 and Spark 3.3.1. Other Spark versions are technically compatible and can be adapted as needed.
  • Performance metric: As measured by the 99 TPC-DS benchmark queries, the performance loss caused by the full-computing link security protection provided by OmniShield does not exceed 20% of the average performance of physical machines.
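
The key-length constraints above can be expressed as a small pre-flight check. This is an illustrative sketch only: `SUPPORTED_KEY_BITS` and `is_supported` are hypothetical names and not part of OmniShield, and the check only looks at the transformation name and the key length, not at the KMS or the cipher itself.

```python
# Key lengths (in bits) taken from the OmniShield constraint list.
# Names are stored in upper case so the lookup is case-insensitive.
SUPPORTED_KEY_BITS = {
    "AES/GCM/NOPADDING": {128, 256},
    "SM4/GCM/NOPADDING": {128},
}


def is_supported(transformation, key):
    """Check a cipher transformation name and raw key bytes against the list."""
    allowed = SUPPORTED_KEY_BITS.get(transformation.upper())
    if allowed is None:
        return False  # algorithm or mode not in the supported list
    return len(key) * 8 in allowed
```

A 16-byte or 32-byte key passes for AES/GCM, while a 24-byte (192-bit) key or a 256-bit SM4 key is rejected.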

JAR file:

Contact Huawei technical support.

Yes

OmniShield applies to VM scenarios.

OmniHBaseGSI

HBase global secondary indexes, improving non-rowkey column query efficiency by multiple times

  • Supported OSs: openEuler 20.03 LTS SP1 and openEuler 22.03 LTS SP1
  • Component constraints:
    • Compatible with HBase 2.4.14. Other HBase versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric: OmniHBaseGSI ensures an average latency of less than 30 ms and P99 latency of less than 300 ms in the case of 100 concurrent connections.
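
To compare measured query latencies with the figures above, the average and P99 can be computed from collected samples. The sketch below is an assumption about how such a check might look, using the nearest-rank definition of P99; the function names are hypothetical, and the measurement method behind the quoted figures is not stated in this document.

```python
import math


def latency_summary(samples_ms):
    """Return (mean, p99) for a non-empty list of latencies in milliseconds."""
    ordered = sorted(samples_ms)
    mean = sum(ordered) / len(ordered)
    # Nearest-rank P99: the smallest sample with at least 99% of samples at or below it.
    rank = math.ceil(99 * len(ordered) / 100)
    p99 = ordered[rank - 1]
    return mean, p99


def meets_target(samples_ms, mean_limit_ms=30.0, p99_limit_ms=300.0):
    """Compare measured latencies with the quoted limits (strictly less than)."""
    mean, p99 = latency_summary(samples_ms)
    return mean < mean_limit_ms and p99 < p99_limit_ms
```

With 100 samples, the nearest-rank P99 is the 99th smallest value, so a single slow outlier does not affect it but two do.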

JAR file:

Contact Huawei technical support.

Yes

OmniHBaseGSI applies to VM scenarios.

OmniData

SQL operator pushdown based on Spark

  • Supported OSs: CentOS 7.6, openEuler 20.03 LTS SP1, and openEuler 22.03 LTS SP1
  • Component constraints:
    • This feature is applicable to big data scenarios where storage and compute are decoupled or coupled at scale.
    • Compatible with Spark 3.0.0/3.1.1 and Hive 3.1.0 (Tez 0.10.0). Other Spark versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metric: According to TPC-H tests, enabling operator pushdown improves the performance of Spark executing 12 SQL statements by an average of 40%, and the performance of Hive executing 4 SQL statements by an average of 20%.

JAR file:

Contact Huawei technical support.

Yes

This feature is suitable for VM scenarios with coupled or decoupled storage and compute that have data locality.

OmniStream

Flink operators are implemented in native code (C/C++) to improve query performance.

  • Supported OS: openEuler 22.03 LTS SP4
  • Component constraints:
    • Compatible with Flink 1.16.3. Other Flink versions are technically compatible and can be adapted as needed.
    • Does not support hybrid deployment on Kunpeng servers and servers of other architectures.
  • Performance metrics: OmniStream improves the computing performance of Flink by an average of over 100% according to Nexmark 22-query test cases. In the WordCount scenario, OmniStream delivers 1.31 times the performance of open-source Flink DataStream. In the stateless recomputing scenario, OmniStream achieves 1.6 times the performance of open-source Flink DataStream.
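
The metrics above mix two notations: a percentage improvement ("over 100%") and a multiple of baseline performance ("1.31 times"). The following sketch converts between them, assuming that performance means throughput (work completed per unit time); that interpretation is an assumption, and the helper names are hypothetical.

```python
def ratio_to_percent_gain(ratio):
    """1.31 times the baseline performance corresponds to a 31% gain."""
    return (ratio - 1.0) * 100.0


def percent_gain_to_ratio(percent):
    """A 100% gain corresponds to 2.0 times the baseline performance."""
    return 1.0 + percent / 100.0


def relative_runtime(ratio):
    """Fraction of the baseline runtime needed for the same work at this ratio."""
    return 1.0 / ratio
```

Under this reading, an improvement of over 100% means more than twice the baseline throughput, or less than half the baseline runtime for the same workload.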

JAR file:

Link

Yes

OmniStream applies to VM scenarios.

OmniStateStore

OmniStateStore acts as the middleware between Flink and RocksDB to reduce the frequency of Flink accessing RocksDB, thereby improving the overall Flink performance.

  • Supported OS: openEuler 22.03 LTS SP3
  • Component constraints:
    • Compatible with Flink 1.16.3. Other Flink versions are technically compatible and can be adapted as needed.
    • Compatible with FRocksDB 6.20.3. Other FRocksDB versions are technically supported and can be adapted based on market requirements.
    • It can run on the Kunpeng computing platform but is not currently supported on general-purpose x86 servers.
  • Performance metrics: In big data scenarios with large volumes of state data, Flink's I/O performance is improved by accelerating access to RocksDB.
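
The idea of a middleware layer that reduces how often the state backend is accessed can be illustrated with a generic cache. The sketch below is not OmniStateStore's design, which this document does not describe; it is a hypothetical read-through, write-through LRU cache in which a plain dict stands in for RocksDB, and `CachedStateStore` is an invented name.

```python
from collections import OrderedDict


class CachedStateStore:
    """Hypothetical read-through, write-through LRU cache in front of a slower store."""

    def __init__(self, backend, capacity=1024):
        self.backend = backend          # any dict-like object standing in for RocksDB
        self.capacity = capacity
        self.cache = OrderedDict()
        self.backend_reads = 0          # counts how often the backend is actually read

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.backend_reads += 1
        value = self.backend.get(key)
        self._remember(key, value)
        return value

    def put(self, key, value):
        self.backend[key] = value        # write through so the backend stays current
        self._remember(key, value)

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```

Repeated reads of a hot key hit the in-memory cache, so `backend_reads` grows only on cache misses.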

JAR file:

OmniStateStore

No

OmniStateStore is closely tied to hardware networking and is not applicable to virtualization scenarios.