Architecture

OmniShield is a confidential computing component for the Spark big data engine. It runs in a hardware-based trusted execution environment (TEE) in the customer's data center, where it encrypts and decrypts data and executes the computing process. OmniShield also safeguards data in the rich execution environment (REE).

OmniShield can work as a Spark plugin that encrypts and decrypts row-based data sources (CSV, JSON, and TXT) in DataFrame and Spark SQL scenarios. Three further capabilities require code changes: modify the ORC code to encrypt column-based ORC data sources in the Spark SQL scenario, modify the Spark encryption and decryption code to enable the SM4 algorithm for drive and network I/O, and modify the Spark Executor code to implement remote attestation for Spark in Yarn resource management mode.
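To make row-based encryption concrete, the following is a minimal sketch that encrypts and decrypts a single CSV row with AES in GCM mode through the JDK's standard javax.crypto API. It is not OmniShield's implementation. The class name RowCipherSketch, the layout (a 12-byte IV followed by the ciphertext, Base64-encoded so the file stays line-oriented), and the locally generated key are all assumptions made for illustration. In a real deployment the key would come from a KMS.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not part of OmniShield or Spark.
public class RowCipherSketch {
    // Standard JDK transformation name for AES in GCM mode without padding.
    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_BYTES = 12;   // recommended IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumption: a locally generated key stands in for one fetched from a KMS.
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Encrypts one row with a fresh random IV and returns Base64(IV || ciphertext || tag).
    static String encryptRow(SecretKey key, String row) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(row.getBytes(StandardCharsets.UTF_8));
        byte[] out = ByteBuffer.allocate(iv.length + ciphertext.length)
                .put(iv).put(ciphertext).array();
        return Base64.getEncoder().encodeToString(out);
    }

    // Reverses encryptRow; throws AEADBadTagException if the data was tampered with.
    static String decryptRow(SecretKey key, String encoded) throws GeneralSecurityException {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        byte[] plaintext = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        String encrypted = encryptRow(key, "1,alice,42");
        System.out.println(encrypted);                  // differs on every run (random IV)
        System.out.println(decryptRow(key, encrypted)); // the original row
    }
}
```

Because each row gets its own IV, encrypting the same row twice yields different ciphertexts, and the GCM tag lets the reader detect a modified row at decryption time.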

Figure 1 shows the OmniShield architecture.

Figure 1 Software architecture of OmniShield

OmniShield performs the following functions:

  • Encrypts and decrypts row-based data sources (CSV, JSON, and TXT) in the DataFrame scenario. The encryption algorithm is AES/GCM/NOPadding. APIs are provided for obtaining keys from mainstream key management servers (KMSs) such as the Hadoop KMS.
  • Encrypts and decrypts row-based data sources (CSV, JSON, and TXT) in the Spark SQL scenario. The encryption algorithm is AES/GCM/NOPadding or SM4/GCM/NOPadding. APIs are provided for obtaining keys from mainstream KMSs such as the Hadoop KMS.
  • Encrypts and decrypts column-based ORC data sources in the Spark SQL scenario. The encryption algorithm is SM4/GCM/NOPadding.
  • Encrypts and decrypts Spark shuffle drive and network I/O. The encryption algorithm is SM4/GCM/NOPadding.
  • Provides application-level remote attestation for Spark on Yarn based on the Integrity Measurement Architecture (IMA) of the openEuler OS. When the Spark Executor starts, an IMA-based measurement report can be generated and remote attestation can be initiated.
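As a rough illustration of what an IMA-based check involves, the sketch below parses one entry of a Linux IMA ASCII measurement list (the kernel exposes this list through securityfs, typically at /sys/kernel/security/ima/ascii_runtime_measurements) and compares the file digest with a reference value. It assumes the five-field ima-ng template. The class ImaEntrySketch, the reference map, the file path, and the shortened placeholder digests are hypothetical. OmniShield's actual report format and attestation protocol are not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OmniShield, Spark, or openEuler.
public class ImaEntrySketch {
    /** One entry of the IMA ASCII measurement list (ima-ng template assumed). */
    static final class Entry {
        final int pcr;              // PCR index the entry was extended into
        final String templateHash;  // hash over the template data
        final String templateName;  // for example "ima-ng"
        final String fileDigest;    // "<algorithm>:<hex digest>" of the measured file
        final String path;          // path of the measured file

        Entry(int pcr, String templateHash, String templateName,
              String fileDigest, String path) {
            this.pcr = pcr;
            this.templateHash = templateHash;
            this.templateName = templateName;
            this.fileDigest = fileDigest;
            this.path = path;
        }
    }

    // Splits into at most five fields so that a path containing spaces stays intact.
    static Entry parse(String line) {
        String[] fields = line.trim().split("\\s+", 5);
        if (fields.length != 5) {
            throw new IllegalArgumentException("unexpected IMA entry: " + line);
        }
        return new Entry(Integer.parseInt(fields[0]), fields[1], fields[2],
                fields[3], fields[4]);
    }

    // True only if the measured digest equals the reference digest for that path.
    static boolean matchesReference(Entry entry, Map<String, String> referenceDigests) {
        String expected = referenceDigests.get(entry.path);
        return expected != null && expected.equals(entry.fileDigest);
    }

    public static void main(String[] args) {
        // Placeholder digests, shortened for readability; real ones are full-length hex.
        String line = "10 aaaa1111 ima-ng sha256:bbbb2222 /opt/spark/jars/example.jar";
        Map<String, String> reference = new LinkedHashMap<>();
        reference.put("/opt/spark/jars/example.jar", "sha256:bbbb2222");

        Entry entry = parse(line);
        System.out.println(entry.path + " trusted: " + matchesReference(entry, reference));
    }
}
```

A verifier built along these lines would reject an Executor whose measured files are missing from the reference list or whose digests differ from it.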