Security Hardening

  • Spark security hardening

    During algorithm execution, Spark's standard persist operation caches RDDs to drives as temporary files when memory is insufficient. By default, RDDs cached to drives are not encrypted. If you have higher security requirements, set spark.io.encryption.enabled to true to encrypt them. Enabling encryption increases the algorithm execution time. In versions earlier than Spark 2.3.3, data written to drives may remain unencrypted even when spark.io.encryption.enabled is set to true, so you are advised to upgrade Spark to 2.3.3 or later.

    Solution:

    • Modify the shell script for submitting tasks and add the following configuration to spark-submit:
      --conf "spark.io.encryption.enabled=true" \
      
    • Upgrade Spark to version 2.3.3 or later.
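
    The two steps above can be combined in the submission script. The following is a minimal sketch, not part of the product: the helper names version_ge and encryption_conf are hypothetical, the Spark version is passed in by the caller, and the version comparison assumes GNU sort with the -V option is available.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative and not provided by Spark.

# Oldest Spark release on which the text above treats I/O encryption as reliable.
MIN_SPARK_VERSION="2.3.3"

# Succeeds (exit status 0) when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option orders version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the spark-submit option that enables I/O encryption.
# Fails with a message on stderr when the given Spark version is too old.
encryption_conf() {
  local spark_version="$1"
  if ! version_ge "$spark_version" "$MIN_SPARK_VERSION"; then
    echo "Spark $spark_version is older than $MIN_SPARK_VERSION; upgrade before relying on I/O encryption" >&2
    return 1
  fi
  echo '--conf spark.io.encryption.enabled=true'
}

# Example use (class and JAR names are placeholders):
#   spark-submit $(encryption_conf "2.4.8") --class com.example.Main app.jar
```

    Failing early on an old version avoids a run that appears to be encrypted but may still write unencrypted temporary files.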
  • Vulnerability fixing

    To ensure the security of the production environment and reduce the risk of attacks, enable the firewall and periodically fix the following vulnerabilities:

    • OS vulnerabilities
    • JDK vulnerabilities
    • Hadoop and Spark vulnerabilities
    • Scala vulnerabilities