Encrypting and Decrypting Spark SQL ORC Data Sources Using SM Algorithms
To run Spark services with the OmniShield feature, start Spark and submit tasks as follows.
- Deploy the Hadoop KMS service and create the primary key.
```shell
hadoop key create key3 -cipher 'SM4/GCM/NoPadding'
```
The procedure for starting the Hadoop KMS is the same as that in 1. If the Hadoop KMS is already running, you do not need to start it again before creating a key.
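As a quick sanity check after the step above, you can list the keys known to the configured KMS; this is a sketch that assumes the `hadoop` CLI is on the PATH of the node where the KMS is configured:

```shell
# List keys registered with the configured KMS; key3 should appear,
# and -metadata shows each key's cipher (here SM4/GCM/NoPadding).
hadoop key list -metadata
```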
- In the /opt/omnishield directory, run the following command to start Spark SQL:
```shell
spark-sql --master local \
  --conf spark.sql.extensions=com.huawei.analytics.shield.sql.DataSourceEncryptPlugin \
  --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec \
  --conf spark.sql.orc.filterPushdown=false \
  --jars omnishield-1.0-SNAPSHOT.jar,kms.jar \
  --conf spark.executor.extraClassPath=omnishield-1.0-SNAPSHOT.jar:kms.jar \
  --driver-class-path omnishield-1.0-SNAPSHOT.jar:kms.jar
```
- In the Spark SQL CLI, run the following SQL statement to create an encrypted data table:
```sql
drop table if exists otest1;

create table otest1 (name string)
options (
  hadoop.security.key.provider.path "kms://http@IP:PORT/kms",
  orc.key.provider "hadoop",
  orc.encrypt "key3:name")
stored as orc;
```
- The hadoop.security.key.provider.path parameter specifies the URI of the Hadoop KMS. Replace IP and PORT with the actual address and port of the KMS.
- The orc.encrypt parameter specifies the columns to encrypt, in the format "key:column". Here, the name column is encrypted with the key key3.
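Once the table is created, it can be used like any other ORC table; encryption on write and decryption on read are transparent to SQL. A minimal sketch (table and column names follow the example above; the inserted value is illustrative):

```sql
-- Write a row; the name column is encrypted with key3 when the ORC file is written.
insert into otest1 values ('alice');

-- Read it back; Spark decrypts transparently as long as the KMS key is accessible.
select name from otest1;
```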
- Run the following command in the Spark SQL CLI to check whether the data table is encrypted:
```sql
describe extended otest1;
```
If the command output contains orc.encrypt in Storage Properties, the table is encrypted.