Security Check and Hardening

Routine Antivirus Software Check

Periodically scan clusters and Spark components for viruses. Regular scans protect clusters from viruses, spyware, and other malicious code, reducing risks such as system breakdown and information leakage. Mainstream antivirus software can be used for the scans.

Communication Matrix

For details about the OmniData communication matrix, see Kunpeng BoostKit 23.0.RC5 OmniData Communication Matrix.

Log Control

Note the following:

  • Check whether the system can limit the size of a single log file.
  • Check whether there is a mechanism for clearing logs when the log space is used up.

Checking OmniData Logs

OmniData uses the Logback log framework. To change the logging configuration, modify the logback.xml file in the etc directory.

  • Check whether the system can limit the size of a single log file.
  • Check whether the system can limit the number of log files or the total size of log files.
  • Check whether there is a mechanism for clearing logs when the log space is used up.

The log configuration is as follows:

<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--LOG FILE NAME-->
<FileNamePattern>${LOG_HOME}/omnidata-server.%d{yyyy-MM-dd}.%i.log.gz</FileNamePattern>
<!--LOG FILE RETENTION DAYS -->
<MaxHistory>30</MaxHistory>
<!-- LOG FILE MAX SIZE -->
<MaxFileSize>20MB</MaxFileSize>
<!-- LOG FILE TOTAL SIZE CAP -->
<TotalSizeCap>2GB</TotalSizeCap>
</rollingPolicy>
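To check the three limits against a deployed configuration, the values can be read back from the file. The following is a minimal sketch, not part of the product tooling: the helper name logback_value, the OMNIDATA_LOGBACK variable, and the relative path etc/logback.xml are assumptions for illustration, and the match is case-sensitive, so the element names must be spelled as in the configuration above.

```shell
# Print the value of one rolling-policy element from a logback.xml file.
# Usage: logback_value <file> <ElementName>
# Assumes each element sits on a single line, as in the sample configuration.
logback_value() {
    sed -n "s:.*<$2>\(.*\)</$2>.*:\1:p" "$1" | head -n 1
}

# Hypothetical location; point this at the etc directory of your installation.
conf="${OMNIDATA_LOGBACK:-etc/logback.xml}"
if [ -r "$conf" ]; then
    for key in MaxFileSize MaxHistory TotalSizeCap; do
        echo "$key=$(logback_value "$conf" "$key")"
    done
fi
```

An empty value for any of the three keys suggests that the corresponding limit is not configured and should be reviewed.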

Checking HAF Logs

The log function of HAF is configured in configuration files. Logs are classified into audit logs and run logs. The configuration file of audit logs is LogAuditCfg.json, and the configuration file of run logs is LogServiceCfg.json.

You can modify the corresponding configuration in the /home/omm/omnidata-install/haf-offload/etc directory.

  • Check whether the system can limit the size of a single log file. logSize specifies the size of a log file, in MB. The value ranges from 1 to 100.
  • Check whether the system can limit the number of log files or the total size of log files. backupCount specifies the number of backup logs. The value ranges from 0 to 100.
  • Check whether there is a mechanism for clearing logs when the log space is used up. You can overwrite backup logs.

The log configuration is as follows:

{
    "autoReload": false,
    "backupCount": 10,
    "logFile": "service.log",
    "logHeaderFormat": "%time%user%level%pid%tname%function%line",
    "logLevel": "INFO",
    "logPath": "/home/omm/haf-install/haf-target/logs/haf_user",
    "logSize": 10485760
}

Buffer Overflow Prevention

To defend against buffer overflow attacks, you are advised to enable address space layout randomization (ASLR). ASLR randomizes the layout of linear areas such as the heap, stack, and shared library mappings, making it harder for attackers to predict target addresses and locate code. It can be applied to heaps, stacks, and memory mapping areas (mmap base addresses, shared libraries, and vDSO pages).

How to enable ASLR:

echo 2 >/proc/sys/kernel/randomize_va_space
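Writing to /proc changes the setting only until the next reboot, so it is worth verifying the running value and persisting it. The sketch below interprets the value of randomize_va_space; the helper name aslr_mode and the file name 90-aslr.conf are assumptions for illustration, while the meaning of the values 0, 1, and 2 follows the Linux kernel sysctl documentation.

```shell
# Map a randomize_va_space value to a short description.
aslr_mode() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial" ;;   # stack, mmap base, shared libraries, vDSO
        2) echo "full" ;;      # additionally randomizes the heap (brk)
        *) echo "unknown" ;;
    esac
}

# Report the running kernel's setting when the file is readable (Linux only).
f=/proc/sys/kernel/randomize_va_space
if [ -r "$f" ]; then
    echo "ASLR: $(aslr_mode "$(cat "$f")")"
fi

# To keep the setting across reboots (run as root; the file name is an assumption):
#   echo 'kernel.randomize_va_space = 2' > /etc/sysctl.d/90-aslr.conf
#   sysctl --system
```

A result other than "full" indicates that the echo command above has not been applied or has been reset.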

Vulnerability Fixing

To ensure the security of the production environment and reduce the risk of attacks, enable the firewall and periodically fix the following vulnerabilities:

  • OS vulnerabilities
  • JDK vulnerabilities
  • Hadoop and Spark vulnerabilities
  • ZooKeeper vulnerabilities
  • Kerberos vulnerabilities
  • OpenSSL vulnerabilities
  • Vulnerabilities in other components

    The following uses CVE-2021-37137 as an example.

    Vulnerability description:

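    In Netty 4.1.17, the Snappy frame decoder does not restrict the chunk length and may buffer skippable chunks unnecessarily, which can lead to excessive memory usage when malicious input is processed. The vulnerability ID is CVE-2021-37137.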

    In the decoupled storage-compute scenario, the system uses the hdfs-ceph (version 3.2.0) service for object storage. This service depends on aws-java-sdk-bundle-1.11.375.jar, which is affected by this vulnerability. You are advised to apply the patch in a timely manner to prevent attacks.

    Impact:

    Netty versions earlier than 4.1.68

    Handling suggestion:

    Currently, the vendor has released an upgrade patch to fix the vulnerability. For details, visit the following website:

    https://github.com/netty/netty/security/advisories/GHSA-9vjp-v76f-g363

SSH Hardening

During the deployment and installation of OmniData and HAF, you need to connect to the server through SSH. The root user has all the operation permissions. Logging in to the server as the root user may pose security risks.

You are advised to log in to the server as a common user for installation and deployment and disable root user login using SSH to improve system security. The procedure is as follows:

Check the PermitRootLogin configuration item in /etc/ssh/sshd_config.

  • If the value is no, root user login using SSH is disabled.
  • If the value is yes, change it to no.
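The check can be scripted. The following is a minimal sketch: the helper name permit_root_login is hypothetical, and the parser handles only the plain "keyword value" form, not Match blocks or Include directives. For the effective configuration, sshd -T (run as root) prints the values that sshd actually uses.

```shell
# Print the PermitRootLogin value from an sshd_config-style file.
# sshd uses the first occurrence of a keyword; keywords are case-insensitive.
# Commented-out lines are ignored; "unset" means the keyword is absent.
permit_root_login() {
    awk 'tolower($1) == "permitrootlogin" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }' "$1"
}

cfg=/etc/ssh/sshd_config
if [ -r "$cfg" ]; then
    case "$(permit_root_login "$cfg")" in
        no) echo "root login over SSH is disabled" ;;
        *)  echo "set PermitRootLogin to no in $cfg" ;;
    esac
fi
```

After editing the file, sshd must reload its configuration before the change takes effect.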

Notification of Data Disclosure Risks

The security options ock.ucache.rpc.enableAuthentication, ock.ucache.rpc.enableTLS, and ock.ucache.rpc.enableAuthorization in the ock.conf file, as well as the security configuration switch of ZooKeeper, can be disabled. However, disabling authentication and transmission encryption exposes the system to spoofing and information leakage risks.

Configuring Address Randomization and Kernel Address Space Protection in Compilation Options

To protect memory addresses while programs are running, you are advised to enable address randomization (randomize_va_space, for example, by running the echo 2 >/proc/sys/kernel/randomize_va_space command) and to enable kernel address space protection by using kernel address space layout randomization (KASLR), PAX, Supervisor Mode Access Prevention (SMAP), or Supervisor Mode Execution Prevention (SMEP) in compilation options.

Updating Keys

The OmniShuffle service needs to be restarted after keys are updated. Properly plan the key update period.

Use kmc_tool to periodically update keys.

Importing a CRL

After a certificate revocation list (CRL) file is generated, configure the CRL file path in the configuration file. The CRL takes effect after the OCKD process is restarted.

Restricting Access from IP Addresses Outside the Cluster

To prevent DoS attacks from outside the cluster, you are advised to configure the cluster firewall to restrict access from IP addresses outside the cluster.

Nodes in a big data cluster generally have multiple NICs: high-bandwidth NICs for the service network (subnet) and low-bandwidth NICs for the management network. You are advised to bind the OmniShuffle listening ports to the service network and configure the firewall on each node so that the service network accepts only packets from network segments inside the cluster. This defends against DoS attacks from outside the cluster.

This document uses the typical networking of the primary node and compute nodes (secondary 01, secondary 02, and secondary 03) as an example. Each node has two NICs: 10GE NIC A (management network segment 90.90.1.*) and 100GE NIC B (service network segment 192.168.1.*). In this case, you can take the following steps to mitigate DoS attacks from outside the service cluster.

  1. Ensure that the OmniShuffle network communication is implemented through NIC B on each node.

    Set ock.ucache.rpc.transport.devices in the ock.conf file to the device name of NIC B.

  2. Configure the following firewall policy for each node.

    Set iptables or ACL rules so that NIC B on the node accepts only packets from the service network segment 192.168.1.*.
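As an illustration of step 2, the sketch below prints (but does not apply) a pair of iptables rules for the example networking. The helper name service_nic_rules and the device name eth1 are assumptions, and 192.168.1.0/24 is the CIDR form of 192.168.1.*. Review the output against your existing rule set before running the commands as root, because rule order matters.

```shell
# Print iptables rules that accept traffic arriving on the service NIC only
# from the cluster's service subnet and drop everything else on that NIC.
# Usage: service_nic_rules <device-name-of-NIC-B> <service-subnet-in-CIDR>
service_nic_rules() {
    nic="$1"
    subnet="$2"
    echo "iptables -A INPUT -i $nic -s $subnet -j ACCEPT"
    echo "iptables -A INPUT -i $nic -j DROP"
}

# Hypothetical device name for NIC B.
service_nic_rules eth1 192.168.1.0/24
```

Printing the rules first keeps the sketch safe to run on any node; the management network on NIC A is not touched by these rules.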

Configuring a Kerberos Authentication Ticket

Authentication for both the OmniShuffle service and ZooKeeper is implemented through Kerberos. To prevent spoofing caused by replay attacks in Kerberos authentication, you are advised to set the validity period of the identity authentication ticket as short as possible.

Recommended Environment Variable Configuration

In Recommended Configuration of UCX Environment Variables, the default values of environment variables such as UCX_TCP_TX_MAX_BUFS, UCX_TCP_RX_MAX_BUFS, UCX_RC_VERBS_TX_MAX_BUFS, UCX_RC_VERBS_RX_MAX_BUFS, UCX_RC_MLX5_TX_MAX_BUFS, and UCX_RC_MLX5_RX_MAX_BUFS are -1, indicating that there is no upper limit on the memory used by the underlying communication library UCX. To prevent service unavailability caused by excessive memory usage, you are advised to set the maximum number to 131072 (that is, when the size of a single buffer is 8 KB, the maximum size of the buffer pool is 1 GB). You can set the environment variables based on the memory configuration and service traffic of your server.
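The recommended cap and the arithmetic behind it can be sketched as follows. Where these variables must be exported (for example, in which service environment file) depends on your deployment and is not specified here; the 8 KB buffer size is the figure assumed in the text above.

```shell
# Cap each UCX buffer pool named above at 131072 buffers.
max_bufs=131072
for v in UCX_TCP_TX_MAX_BUFS UCX_TCP_RX_MAX_BUFS \
         UCX_RC_VERBS_TX_MAX_BUFS UCX_RC_VERBS_RX_MAX_BUFS \
         UCX_RC_MLX5_TX_MAX_BUFS UCX_RC_MLX5_RX_MAX_BUFS; do
    export "$v=$max_bufs"
done

# Upper bound of one pool, assuming 8 KB per buffer:
# 131072 buffers x 8 KB = 1048576 KB = 1024 MB (1 GB).
buf_kb=8
pool_mb=$(( max_bufs * buf_kb / 1024 ))
echo "per-pool upper bound: ${pool_mb} MB"
```

To size the cap for a different memory budget, divide the budget per pool by the buffer size and use the result as max_bufs.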