Spark SQL ORC Data Sources
When the OmniShield feature is used with Spark services, start Spark as follows before submitting tasks.
- Deploy the Hadoop KMS service and create a master key:

  ```
  hadoop key create key1 -cipher 'SM4/GCM/NoPadding'
  ```
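To confirm that the key was created, you can list the keys known to the configured KMS provider. (This assumes the KMS provider URI is already set in the Hadoop configuration; otherwise pass it explicitly with `-provider`.)

```
hadoop key list -metadata
```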
- In the /opt/omnishield directory, run the following command to start Spark SQL:

  ```
  spark-sql --master local \
    --conf spark.sql.extensions=com.huawei.analytics.shield.sql.DataSourceEncryptPlugin \
    --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec \
    --jars omnishield-1.0-SNAPSHOT.jar,kms.jar \
    --conf spark.executor.extraClassPath=omnishield-1.0-SNAPSHOT.jar:kms.jar \
    --driver-class-path omnishield-1.0-SNAPSHOT.jar:kms.jar
  ```
- In the Spark SQL CLI, run the following SQL statements to create an encrypted data table:

  ```sql
  drop table if exists otest1;
  create table otest1 (name string)
  options (
    hadoop.security.key.provider.path "kms://http@IP:PORT/kms",
    orc.key.provider "hadoop",
    orc.encrypt "key1:name")
  stored as orc;
  ```
- The hadoop.security.key.provider.path parameter specifies the IP address and port of the Hadoop KMS service.
- The orc.encrypt parameter specifies the encryption key and the column to be encrypted, in the format key:column.
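As a quick functional check in the same session (which can fetch key1 from the KMS), you can write a row into the table and read it back; the value here is illustrative:

```sql
-- Write one row into the encrypted table, then read it back.
insert into otest1 values ('alice');
select * from otest1;
```

A session that cannot obtain key1 from the KMS will not be able to read the plaintext column values.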
- Run the following command in the Spark SQL CLI to check whether the data table is encrypted:

  ```sql
  describe extended otest1;
  ```
If the command output contains orc.encrypt in Storage Properties, the table is encrypted.
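Independently of Spark, the footer of the generated ORC files can be inspected with the Apache orc-tools `meta` command to confirm that encryption metadata is present at the file level. The jar name and data path below are assumptions (the default Spark warehouse layout); adjust them to your environment:

```
# Dump ORC file metadata, which includes the encryption columns and key names.
java -jar orc-tools-*-uber.jar meta spark-warehouse/otest1/part-*.orc
```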