Architecture
OmniShield is a confidential computing component of the Spark big data engine. It runs in a hardware-based trusted execution environment (TEE) in the customer's data center and performs data encryption and decryption inside the TEE. This keeps data protected even while it is outside the TEE, in the rich execution environment (REE).
OmniShield works as a Spark plugin that encrypts and decrypts CSV, JSON, and TXT row-based data sources in DataFrame and Spark SQL scenarios. To encrypt ORC column-based data sources in the Spark SQL scenario, you need to modify the ORC code.
Figure 1 shows the OmniShield architecture.
OmniShield performs the following functions:
- Encrypts and decrypts CSV, JSON, and TXT row-based data sources in the DataFrame scenario using the AES/GCM/NoPadding algorithm. APIs are provided to obtain keys from mainstream key management services (KMSs) such as Hadoop KMS.
- Encrypts and decrypts CSV, JSON, and TXT row-based data sources in the Spark SQL scenario using the AES/GCM/NoPadding or SM4/GCM/NoPadding algorithm. APIs are provided to obtain keys from mainstream KMSs such as Hadoop KMS.
- Encrypts and decrypts ORC column-based data sources in the Spark SQL scenario using the SM4/GCM/NoPadding algorithm.
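The following Java sketch illustrates what the AES/GCM/NoPadding transformation listed above does to a single row of a row-based source such as CSV. It is not OmniShield code. The class RowCipher, its methods, and the Base64(IV || ciphertext || tag) output layout are hypothetical, and a locally generated key stands in for a key fetched from a KMS. SM4/GCM/NoPadding is not shown because the standard JDK providers do not include SM4; it typically requires a third-party provider such as Bouncy Castle.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Hypothetical helper (not part of OmniShield) that encrypts one text row
 * with the JCE transformation "AES/GCM/NoPadding".
 * Assumed output layout: Base64(IV || ciphertext || tag).
 */
public final class RowCipher {
    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_BYTES = 12;   // 96-bit IV, the recommended GCM nonce size
    private static final int TAG_BITS = 128;  // full-length authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    private RowCipher() {}

    /** Generates a local 256-bit AES key; assumption: a real deployment gets it from a KMS. */
    public static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    /** Encrypts one row and returns Base64(IV || ciphertext || tag). */
    public static String encryptRow(SecretKey key, String row) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV per row; never reuse an IV with the same key
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(row.getBytes(StandardCharsets.UTF_8)); // tag is appended to ct
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    /** Decrypts a row produced by encryptRow; throws AEADBadTagException if it was tampered with. */
    public static String decryptRow(SecretKey key, String encoded) throws GeneralSecurityException {
        byte[] in = Base64.getDecoder().decode(encoded);
        byte[] iv = Arrays.copyOfRange(in, 0, IV_BYTES);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        String row = "1,alice,42"; // one CSV row
        String enc = encryptRow(key, row);
        System.out.println(enc);
        System.out.println(decryptRow(key, enc));
    }
}
```

Because GCM authenticates each row as well as encrypting it, a modified row fails decryption with an AEADBadTagException instead of silently returning corrupted text.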
