DataFrame Row-based Data Sources
To run Spark services with the OmniShield feature, start Spark and submit tasks as follows.
- Deploy the KMS service and create the primary key. For example, you can run the following command to create a primary key in the Hadoop KMS:
hadoop key create key1
- In the /opt/omnishield directory, run the following command to encrypt the data source. After the command is executed, the /home/en directory is generated, and the encrypted simple.csv file is stored in this directory.
spark-submit --class com.huawei.analytics.shield.utils.Encrypt --master local --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec --conf spark.shield.primaryKey.name=key1 --conf spark.shield.primaryKey.key1.kms.type=test.example.HadoopKeyManagementService --jars omnishield-1.0-SNAPSHOT.jar,kms.jar omnishield-1.0-SNAPSHOT.jar -i file:///home/simple.csv -o file:///home/en -a AES/GCM/NOPadding -t csv -e encrypt
- In the /opt/omnishield directory, run the following command to decrypt the data source. After the command is executed, the /home/de directory is generated, and the decrypted file is stored in this directory.
spark-submit --class com.huawei.analytics.shield.utils.Encrypt --master local --conf spark.hadoop.io.compression.codecs=com.huawei.analytics.shield.crypto.CryptoCodec --conf spark.shield.primaryKey.name=key1 --conf spark.shield.primaryKey.key1.kms.type=test.example.HadoopKeyManagementService --jars omnishield-1.0-SNAPSHOT.jar,kms.jar omnishield-1.0-SNAPSHOT.jar -i file:///home/en -o file:///home/de -a AES/GCM/NOPadding -t csv -e decrypt
- The spark.shield.primaryKey.name parameter specifies the name of the primary key to use. This key must be created in the KMS in advance.
- The spark.shield.primaryKey.key1.kms.type parameter specifies the KMS type of the primary key (key1 in this example). You must implement the KMS yourself based on the APIs provided by OmniShield, and set this parameter to the fully qualified class name of that implementation in the kms.jar package.
- -i and -o specify the input path and output path respectively, which can point to HDFS or a local file system.
- In encrypt working mode, -i specifies the data source file and -o specifies the folder where the encrypted output is written.
- In decrypt working mode, -i specifies the folder that contains the encrypted file and -o specifies the folder where the decrypted output is written.
- -a specifies the encryption algorithm, which must be AES/GCM/NOPadding.
- -t specifies the data source file format, which can be CSV, JSON, or TXT.
- -e specifies the working mode, which is either encrypt or decrypt.
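The parameter rules above can be checked before a job is submitted. The following is a minimal sketch, not part of OmniShield: the helper name build_shield_command is hypothetical, the default key name and KMS class are copied from the example commands, and lowercasing the -t value is an assumption based on the lowercase csv used in those commands. It only assembles and validates the spark-submit argument list; it does not run Spark.

```python
import shlex

# Constraints taken from the parameter descriptions above.
ALLOWED_ALGORITHM = "AES/GCM/NOPadding"
ALLOWED_FORMATS = {"csv", "json", "txt"}
ALLOWED_MODES = {"encrypt", "decrypt"}


def build_shield_command(input_path, output_path, file_format, mode,
                         key_name="key1",
                         kms_class="test.example.HadoopKeyManagementService",
                         algorithm=ALLOWED_ALGORITHM):
    """Hypothetical helper: return the spark-submit argument list for the
    OmniShield Encrypt utility, rejecting values the documentation excludes."""
    # Assumption: the tool expects the lowercase spelling used in the examples.
    file_format = file_format.lower()
    if algorithm != ALLOWED_ALGORITHM:
        raise ValueError(f"-a must be {ALLOWED_ALGORITHM}, got {algorithm!r}")
    if file_format not in ALLOWED_FORMATS:
        raise ValueError(f"-t must be one of {sorted(ALLOWED_FORMATS)}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"-e must be one of {sorted(ALLOWED_MODES)}")
    return [
        "spark-submit",
        "--class", "com.huawei.analytics.shield.utils.Encrypt",
        "--master", "local",
        "--conf", "spark.hadoop.io.compression.codecs="
                  "com.huawei.analytics.shield.crypto.CryptoCodec",
        "--conf", f"spark.shield.primaryKey.name={key_name}",
        # The KMS type key embeds the primary key name.
        "--conf", f"spark.shield.primaryKey.{key_name}.kms.type={kms_class}",
        "--jars", "omnishield-1.0-SNAPSHOT.jar,kms.jar",
        "omnishield-1.0-SNAPSHOT.jar",
        "-i", input_path,
        "-o", output_path,
        "-a", algorithm,
        "-t", file_format,
        "-e", mode,
    ]


if __name__ == "__main__":
    # Print the encrypt command from the example as one shell-safe line.
    cmd = build_shield_command("file:///home/simple.csv", "file:///home/en",
                               "csv", "encrypt")
    print(shlex.join(cmd))
```

Swapping the paths and passing decrypt as the mode reproduces the decryption command. The resulting list could be passed to subprocess.run from the /opt/omnishield directory if the jars are present there.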