
Spark SQL ORC Data Sources

To use the OmniShield feature with Spark services, start Spark and submit tasks as follows.

  1. Deploy the Hadoop KMS service and create the primary key.
    hadoop key create key1 -cipher 'SM4/GCM/NoPadding'
    
  2. In the /opt/omnishield directory, run the following command to start Spark SQL:
    spark-sql --master local --conf spark.sql.extensions=com.huawei.analytics.shield.sql.DataSourceEncryptPlugin --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec --jars omnishield-1.0-SNAPSHOT.jar,kms.jar --conf spark.executor.extraClassPath=omnishield-1.0-SNAPSHOT.jar:kms.jar --driver-class-path omnishield-1.0-SNAPSHOT.jar:kms.jar
    
  3. In the Spark SQL CLI, run the following SQL statements to create an encrypted data table:
    drop table if exists otest1;
    create table otest1 (name string) options ( hadoop.security.key.provider.path "kms://http@IP:PORT/kms", orc.key.provider "hadoop", orc.encrypt "key1:name") stored as orc;
    
    • The hadoop.security.key.provider.path parameter specifies the URI of the Hadoop KMS service. Replace IP and PORT with the actual IP address and port of the KMS.
    • The orc.encrypt parameter specifies the encryption key and the columns to be encrypted, in the format "keyName:columnList". Here, the name column is encrypted with key1.
  4. Run the following command in the Spark SQL CLI to check whether the data table is encrypted:
    describe extended otest1;
    

    If the command output contains orc.encrypt in Storage Properties, the table is encrypted.
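
To confirm that the primary key created in step 1 is available, you can query the KMS from the same node. A brief check, assuming the hadoop client is on the PATH and can reach the KMS:

    hadoop key list -metadata

key1 should appear in the output together with its cipher and key length.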
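
As an end-to-end check, the encrypted table can be exercised like any other table in the same Spark SQL CLI session. The following sketch uses an illustrative value; a session that can reach the KMS and is authorized for key1 reads the plaintext name column transparently, while a reader without access to key1 cannot decrypt it:

    insert into otest1 values ('alice');
    select name from otest1;
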