Encrypting and Decrypting Spark SQL Row-based Data Sources

To run Spark services with the OmniShield feature, start Spark and submit tasks as follows.

  1. Deploy the KMS service and create the primary key. For example, you can run the following command to create a primary key in the Hadoop KMS:
    hadoop key create key2
    

    The procedure for starting the Hadoop KMS is the same as that in 1. If the Hadoop KMS has been started, you do not need to start the KMS again when creating a key.
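Because the table options later refer to the key by name, it can help to confirm that the key exists before continuing. The following is a minimal sketch, assuming the hadoop CLI on this host is configured to use the same KMS as its key provider and that `hadoop key list` prints one key name per line. The helper name key_in_list is hypothetical and not part of OmniShield or Hadoop.

```shell
#!/bin/sh
# key_in_list NAME
# Reads a key listing on stdin and succeeds (exit 0) only if one line,
# after trimming surrounding whitespace, is exactly NAME.
# Hypothetical helper for illustration; not part of OmniShield or Hadoop.
key_in_list() {
    awk -v want="$1" '
        { line = $0; gsub(/^[ \t\r]+|[ \t\r]+$/, "", line) }
        # Appending "" forces a string comparison rather than a numeric one.
        (line "") == (want "") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage (assumes the hadoop CLI points at the KMS that holds the key):
#   hadoop key list | key_in_list key2 && echo "key2 exists"
```

An exact line match is used so that a key named key20 is not mistaken for key2.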

  2. In the /opt/omnishield directory, run the following command to start Spark SQL:
    spark-sql --master local \
      --conf spark.sql.extensions=com.huawei.analytics.shield.sql.DataSourceEncryptPlugin \
      --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec \
      --jars omnishield-1.0-SNAPSHOT.jar,kms.jar \
      --conf spark.executor.extraClassPath=omnishield-1.0-SNAPSHOT.jar:kms.jar \
      --driver-class-path omnishield-1.0-SNAPSHOT.jar:kms.jar
    
  3. In the Spark SQL CLI, run the following SQL statement to create an encrypted data table:
    drop table if exists test1;
    create table test1 (id int) using csv options (`encrypt` 'true',`keyname` 'key2',`kmstype` 'test.HadoopKeyManagementService',`cryptomode` 'aes/gcm/nopadding',`keylength` '128');
    
    • The encrypt parameter specifies whether the table is encrypted. Set it to true to create an encrypted table.
    • The keyname parameter specifies the primary key to use. This key must be created in the KMS in advance.
    • The kmstype parameter specifies the KMS type of the primary key. You must implement the KMS yourself based on the APIs provided by OmniShield. Set this parameter to the fully qualified class name of the KMS implementation in the kms.jar package.
    • The cryptomode parameter specifies the encryption algorithm: AES/GCM/NoPadding or SM4/GCM/NoPadding.
    • The keylength parameter specifies the length of the encryption key, in bits. For AES/GCM/NoPadding, the key length is 128 or 256; for SM4/GCM/NoPadding, the key length is 128.
    • You can create different types of data tables by specifying the data source format. For example, to create a data table in JSON format, replace using csv with using json in the statement in step 3. The supported table types are CSV, JSON, and TXT.
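The option rules above can be captured in a small script that builds the create table statement and rejects algorithm and key length combinations that are not listed. This is a minimal sketch under stated assumptions: the function name encrypted_table_ddl is hypothetical, the schema (id int) is copied from the example, and the algorithm name is lower-cased only to mirror the example statement (whether the option is case-sensitive is not stated here).

```shell
#!/bin/sh
# encrypted_table_ddl TABLE FORMAT KEYNAME KMSTYPE CRYPTOMODE KEYLENGTH
# Prints a create table statement for an encrypted table, or fails if the
# CRYPTOMODE/KEYLENGTH pair is not one of the documented combinations.
# Hypothetical helper for illustration; not part of OmniShield.
encrypted_table_ddl() {
    table=$1
    format=$2
    keyname=$3
    kmstype=$4
    keylen=$6
    # Lower-case the algorithm name to mirror the documented example.
    mode=$(printf '%s' "$5" | tr '[:upper:]' '[:lower:]')

    # Documented pairs: AES/GCM with 128 or 256, SM4/GCM with 128.
    case "$mode:$keylen" in
        aes/gcm/nopadding:128|aes/gcm/nopadding:256|sm4/gcm/nopadding:128) ;;
        *)
            echo "unsupported cryptomode/keylength: $5/$keylen" >&2
            return 1
            ;;
    esac

    printf "create table %s (id int) using %s options (\`encrypt\` 'true',\`keyname\` '%s',\`kmstype\` '%s',\`cryptomode\` '%s',\`keylength\` '%s');\n" \
        "$table" "$format" "$keyname" "$kmstype" "$mode" "$keylen"
}

# Usage: a JSON table encrypted with SM4 (table name test2 is a placeholder):
#   encrypted_table_ddl test2 json key2 test.HadoopKeyManagementService SM4/GCM/NoPadding 128
```

The printed statement can be pasted into the Spark SQL CLI. The data source format is passed through unchecked, so it is up to the caller to supply one of the supported table types.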
  4. Run the following command in the Spark SQL CLI to check whether the data table is encrypted:
    show create table test1;
    

    If the command output contains encryptdatakey in OPTIONS, the table is encrypted.
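The same check can be scripted for use outside the interactive CLI. The sketch below assumes that spark-sql is started with the same options as in step 2 and uses its -e option to run a single statement. The helper name table_is_encrypted is hypothetical, and the match is a plain text search for encryptdatakey anywhere in the output rather than a parse of the OPTIONS clause.

```shell
#!/bin/sh
# table_is_encrypted
# Reads the output of "show create table" on stdin and succeeds (exit 0)
# if the text encryptdatakey appears in it. The search is case-insensitive
# as a precaution; the exact casing in the output is an assumption.
# Hypothetical helper for illustration; not part of OmniShield.
table_is_encrypted() {
    grep -qi 'encryptdatakey'
}

# Usage (add the --conf, --jars, and --driver-class-path options from step 2):
#   spark-sql --master local -e 'show create table test1;' | table_is_encrypted \
#       && echo "test1 is encrypted"
```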