
Encrypting and Decrypting DataFrame Row-based Data Sources

When running Spark services with the OmniShield feature, submit the tasks to Spark as described in the following steps.

  1. Deploy the KMS and create a primary key. For example, you can run the following commands to start the Hadoop KMS and create a primary key:
    1. Open the Hadoop configuration file.
      cd $HADOOP_HOME/etc/hadoop
      vi core-site.xml
      
    2. Press i to enter insert mode and add the following content inside the <configuration> element:
      <property>  
        <name>hadoop.security.key.provider.path</name>
        <value>kms://http@x.x.x.x:9600/kms</value>
      </property>
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Copy the core-site.xml file to the Spark configuration directory.
      cp $HADOOP_HOME/etc/hadoop/core-site.xml $SPARK_HOME/conf/
      
    5. Start the Hadoop KMS.
      cd $HADOOP_HOME/sbin
      sh kms.sh start
      
    6. Create a key.
      hadoop key create key1
      
      • When modifying the core-site.xml file in substep 2, replace x.x.x.x with the actual IP address of the KMS host.
      • If key1 already exists, run the hadoop key delete key1 command to delete it before creating it again.
  2. Create the CSV data source file to be encrypted in the /home directory.
    1. Go to the /home directory, then create and open the simple.csv file.
      cd /home
      vi simple.csv
      
    2. Press i to enter insert mode and add the following content to the file:
      name,age,job
      user1,12,Engineer
      user2,13,Engineer
      user3,14,Developer
      user4,15,Engineer
      user5,16,Engineer
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
  3. In the /opt/omnishield directory, run the following command to encrypt the data source. After the command completes, the /home/en directory is created and the encrypted simple.csv data is stored in it.
    spark-submit --class com.huawei.analytics.shield.utils.Encrypt --master local --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec  --conf spark.shield.primaryKey.name=key1 --conf spark.shield.primaryKey.key1.kms.type=test.example.HadoopKeyManagementService --jars omnishield-1.0-SNAPSHOT.jar,kms.jar omnishield-1.0-SNAPSHOT.jar  -i file:///home/simple.csv -o file:///home/en -a AES/GCM/NOPadding -t csv -e encrypt
    
  4. In the /opt/omnishield directory, run the following command to decrypt the data. After the command completes, the /home/de directory is created and the decrypted file is stored in it.
    spark-submit --class com.huawei.analytics.shield.utils.Encrypt --master local --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec  --conf spark.shield.primaryKey.name=key1 --conf spark.shield.primaryKey.key1.kms.type=test.example.HadoopKeyManagementService --jars omnishield-1.0-SNAPSHOT.jar,kms.jar omnishield-1.0-SNAPSHOT.jar  -i file:///home/en -o file:///home/de -a AES/GCM/NOPadding -t csv -e decrypt
    
    • The spark.shield.primaryKey.name parameter specifies the name of the primary key to use (key1 in this example). This key must be created in the KMS in advance.
    • The spark.shield.primaryKey.key1.kms.type parameter specifies the KMS type of the primary key key1. You must implement the KMS yourself based on the APIs provided by OmniShield. The value of this parameter must be the fully qualified class name of that KMS implementation in the kms.jar package, for example test.example.HadoopKeyManagementService.
    • -i and -o specify the input path and output path respectively. Either can point to HDFS or to the local file system.
      • In encrypt mode, -i specifies the data source file and -o specifies the output directory for the encrypted data.
      • In decrypt mode, -i specifies the directory containing the encrypted data and -o specifies the output directory for the decrypted data.
    • -a specifies the encryption algorithm, which must be AES/GCM/NOPadding.
    • -t specifies the format of the data source file: CSV, JSON, or TXT.
    • -e specifies the working mode: encrypt or decrypt.
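The configuration checks implied by step 1 can be scripted before you submit any task. The following is a minimal sketch, not part of OmniShield: the function name check_kms_config is hypothetical, and it assumes the default locations $HADOOP_HOME/etc/hadoop and $SPARK_HOME/conf used in step 1. It confirms that the key provider property is present, that the x.x.x.x placeholder has been replaced, and that the copy in the Spark configuration directory matches the Hadoop one.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not an OmniShield tool): sanity-check the
# KMS settings from step 1. Arguments default to the directories used there.

check_kms_config() {
  local hadoop_conf="${1:-$HADOOP_HOME/etc/hadoop}"
  local spark_conf="${2:-$SPARK_HOME/conf}"
  local src="$hadoop_conf/core-site.xml"
  local dst="$spark_conf/core-site.xml"

  # The key provider property added in substep 2 must be present
  # (matched literally, as a <name> element on a single line).
  if ! grep -qF '<name>hadoop.security.key.provider.path</name>' "$src"; then
    echo "missing hadoop.security.key.provider.path in $src" >&2
    return 1
  fi

  # The x.x.x.x placeholder must have been replaced with the KMS host IP.
  if grep -qF 'kms://http@x.x.x.x' "$src"; then
    echo "placeholder x.x.x.x not replaced in $src" >&2
    return 1
  fi

  # Spark must read the same core-site.xml that Hadoop uses (substep 4).
  if ! cmp -s "$src" "$dst"; then
    echo "$dst is missing or differs from $src; copy it again" >&2
    return 1
  fi

  echo "KMS configuration OK"
}
```

After the check passes, the standard Hadoop command hadoop key list prints the keys known to the configured key provider; key1 should appear in that list before you run steps 3 and 4.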
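Step 2 creates the sample data source interactively with vi. If you prefer a non-interactive method, for example in a setup script, a here-document produces the same simple.csv. This is a sketch: the function name write_sample_csv is hypothetical, and its optional directory argument (default /home, as in step 2) is an illustrative addition.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): write the step 2 sample data without an editor.
# The first argument is the target directory; it defaults to /home.

write_sample_csv() {
  local dir="${1:-/home}"
  # Quoting the 'EOF' delimiter stops the shell from expanding anything in the data.
  cat > "$dir/simple.csv" <<'EOF'
name,age,job
user1,12,Engineer
user2,13,Engineer
user3,14,Developer
user4,15,Engineer
user5,16,Engineer
EOF
}
```

Running write_sample_csv with no argument writes /home/simple.csv, which is the input path that step 3 passes to -i.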
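Steps 3 and 4 run the same spark-submit command and differ only in -i, -o, and -e. The following sketch wraps both in one shell function. Every spark-submit option is copied verbatim from those steps; the function name omnishield_crypt, its argument order, and the DRY_RUN switch are hypothetical additions. The key name key1 stays hard-coded because it appears in two configuration properties.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical wrapper): build the step 3 / step 4 spark-submit command.
# Usage: omnishield_crypt <encrypt|decrypt> <input> <output> [format]
# Set DRY_RUN=1 to print the command instead of running it.

omnishield_crypt() {
  local mode="$1" input="$2" output="$3" format="${4:-csv}"

  # -e accepts only these two working modes.
  case "$mode" in
    encrypt|decrypt) ;;
    *) echo "mode must be encrypt or decrypt" >&2; return 1 ;;
  esac

  # Options copied from steps 3 and 4; only -i, -o, -t, and -e vary.
  local -a cmd=(
    spark-submit
    --class com.huawei.analytics.shield.utils.Encrypt
    --master local
    --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec
    --conf spark.shield.primaryKey.name=key1
    --conf spark.shield.primaryKey.key1.kms.type=test.example.HadoopKeyManagementService
    --jars omnishield-1.0-SNAPSHOT.jar,kms.jar
    omnishield-1.0-SNAPSHOT.jar
    -i "$input" -o "$output"
    -a AES/GCM/NOPadding
    -t "$format"
    -e "$mode"
  )

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the space-joined command for review.
    printf '%s\n' "${cmd[*]}"
  else
    # The jar paths are relative, so run from /opt/omnishield as in steps 3 and 4.
    (cd /opt/omnishield && "${cmd[@]}")
  fi
}
```

For example, omnishield_crypt encrypt file:///home/simple.csv file:///home/en reproduces step 3, and omnishield_crypt decrypt file:///home/en file:///home/de reproduces step 4. Prefixing either call with DRY_RUN=1 prints the command without submitting it, which is a quick way to review the options first.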