Container Running Configuration Rules

Enabling cgroups to Control Resource Usage

To prevent denial of service (DoS) attacks caused by exhaustion of system resources, you can use specific CLI parameters to enable resource restrictions.

The base_box.sh script of Kbox uses the Docker startup parameters --cpuset-cpus and --memory to limit the CPU and memory usage. Kbox uses the Ext4 file system and does not support the --storage-opt parameter. Therefore, Kbox does not restrict the storage usage.
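As a minimal sketch of how such limits can be assembled, the following fragment builds the two startup parameters into an option string. The values 0-3 and 4g and the variable names CPUS and MEMORY are illustrative assumptions, not the values or names used by base_box.sh.

```shell
# Sketch: assemble CPU and memory limits for "docker run".
# CPUS and MEMORY are hypothetical example values, not Kbox defaults.
CPUS="0-3"      # host CPU cores the container may run on
MEMORY="4g"     # hard memory limit for the container

RUN_OPTION=""
RUN_OPTION="$RUN_OPTION --cpuset-cpus $CPUS"
RUN_OPTION="$RUN_OPTION --memory $MEMORY"

# Print the resulting command line instead of starting a container.
echo "docker run$RUN_OPTION <image>"
```

After a container has been started with these options, the applied limits can usually be read back with docker inspect through the HostConfig.CpusetCpus and HostConfig.Memory fields.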

Restricting Linux Kernel Capabilities in Containers

By default, Docker starts containers with a restricted set of Linux kernel capabilities. That is, each process is granted only the capabilities it needs instead of full root access. With Linux kernel capabilities, processes rarely need to run as root, even in most areas that traditionally require root privileges. Users who use a non-default configuration can add or remove capabilities through Docker. The fewer capabilities a container has, the more secure it is; each additional capability increases the attack surface.

The base_box.sh script of Kbox uses the following method to drop all capabilities by default. You can then add the necessary capabilities as required. For the values of {"Capability 1", ..., "Capability n"}, see the base_box.sh script.
docker run --cap-drop=ALL --cap-add={"Capability 1", ..., "Capability n"}
Run the following command to check whether only necessary capabilities are added:
docker ps --quiet --all | xargs docker inspect --format '{{ .Id }}:CapAdd={{ .HostConfig.CapAdd }} CapDrop={{ .HostConfig.CapDrop }}'
When the Android container is running, the SYS_ADMIN capability must be enabled, and the mount capability must be allowed in AppArmor. A container that has the mount capability can escape by mounting file systems such as cgroup, proc, and sysfs. You are advised to filter sensitive file systems in AppArmor or perform security hardening on the host OS and kernel.
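The check above can be partly automated. The following sketch filters the docker inspect output and prints only the containers whose CapDrop list does not contain ALL. The helper name find_caps_not_dropped is hypothetical, and the sketch assumes that Docker prints the lists in the form CapDrop=[ALL], which may differ between Docker versions.

```shell
# Reads "ID:CapAdd=[...] CapDrop=[...]" lines on stdin and prints the lines
# whose CapDrop list does not contain ALL.
# Assumption: lists are printed in Go slice form, for example [ALL] or [].
find_caps_not_dropped() {
    grep -v -E 'CapDrop=\[([^]]* )?ALL( [^]]*)?\]' || true
}

# Demonstration with two sample lines (not real container IDs).
printf '%s\n' \
    'aaa:CapAdd=[SYS_ADMIN] CapDrop=[ALL]' \
    'bbb:CapAdd=[] CapDrop=[]' | find_caps_not_dropped
```

In practice, the output of the docker inspect command shown earlier would be piped into find_caps_not_dropped instead of the sample lines.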

Preventing Containers from Acquiring Additional Privileges Through SUID or SGID Bits

Ensure that containers do not gain any additional privileges through SUID or SGID bits. SUID and SGID programs run with the privileges of the file owner or group. If such a program is exploited (for example, through a buffer overflow), the attacker can execute arbitrary code with those privileges.

You are advised to perform the following steps to delete unnecessary SUID and SGID programs:

Find unnecessary SUID and SGID programs.

find / -perm -4000 -exec ls -l {} \; 2>/dev/null
find / -perm -2000 -exec ls -l {} \; 2>/dev/null

Remove SUID and SGID file permissions. Replace filename and directory with actual ones.
sudo chmod u-s filename
sudo chmod -R g-s directory
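The two find commands can also be combined into one pass. The following is a minimal sketch under the assumption that only regular files on a single file system are of interest. The function name list_setid_files is hypothetical.

```shell
# Lists regular files below directory $1 that have the SUID or SGID bit set.
# -xdev keeps find on one file system, so /proc and /sys are not traversed
# when the scan starts at /.
list_setid_files() {
    find "$1" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null
}

# Demonstration on a scratch directory instead of the whole system.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/suid_demo"
chmod 4755 "$demo_dir/suid_demo"
list_setid_files "$demo_dir"
rm -rf "$demo_dir"
```

Review the resulting list before removing any permission bits, because some system programs require SUID or SGID to work.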

Prevention of Mounting Sensitive Host System Directories on Containers

Sensitive host directories (including the root directory, /boot, /dev, /etc, /lib, /proc, /sys, and /usr) must not be mounted as container volumes, especially when users have read and write permissions on these directories. Otherwise, sensitive files in these directories may be changed from within a container. Such changes may be unintended and may cause security issues, posing risks to the Docker host.

In the base_box.sh script of Kbox, change the mount mode of subdirectories in the /proc, /sys, and /dev directories to read-only.

Run the following command. If the command output does not match Source:(/|/boot|/dev|/etc|/lib|/proc|/sys|/usr)\s+.*RW:true, the change is successful.

docker ps --quiet --all| xargs docker inspect --format '{{ .Id }}: Volumes={{ .Mounts }}' 2>/dev/null
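The pattern match can be scripted. The following sketch applies the expression above to input that lists one mount per line in the form "Source:<path> RW:<bool>". The helper name find_sensitive_rw_mounts and the per-line format are assumptions. The format string in the comment assumes that each entry of .Mounts exposes Source and RW fields, as in the JSON output of docker inspect.

```shell
# Prints the input lines that describe a sensitive host directory mounted
# in read-write mode. Prints nothing if no such mount exists.
find_sensitive_rw_mounts() {
    grep -E 'Source:(/|/boot|/dev|/etc|/lib|/proc|/sys|/usr)[[:space:]]+.*RW:true' || true
}

# Possible way to produce the input (assumed format string, one mount per line):
#   docker ps --quiet --all | xargs docker inspect --format \
#     '{{ $id := .Id }}{{ range .Mounts }}{{ $id }} Source:{{ .Source }} RW:{{ .RW }}{{ "\n" }}{{ end }}'

# Demonstration with sample lines (c1 and c2 are placeholder IDs).
printf '%s\n' \
    'c1 Source:/etc RW:true' \
    'c1 Source:/data RW:true' \
    'c2 Source:/proc RW:false' | find_sensitive_rw_mounts
```

An empty result corresponds to the successful case described above.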

Running Only Required Software in Containers and Not Running Untrusted Applications as the Root User

Unnecessary software running in a container increases its attack surface and conflicts with the principle of keeping container images minimal and compact. Therefore, do not install or run unnecessary software in containers.

Enabling Only the Ports Required by Containers

A container exposes the ports that the Dockerfile of its image defines, or the ports that are passed as runtime parameters. The Dockerfile may change over time, and the list of exposed ports may or may not be related to the applications running in the container. Opening unnecessary ports increases the attack surface of the containers and container-based applications. Therefore, do not expose unnecessary ports.

Generally you need to perform the following operations:

When starting a container, use the --publish or -p option to specify the port required by a specific container. Example:

docker run --interactive --tty --publish 5000 --publish 5001 --publish 5002 my_container /bin/bash

The Kbox base_box.sh script uses the following method to specify the port required by a container:

local PORT
for PORT in ${PORTS[@]}; do
    RUN_OPTION+=" -p $PORT "
done
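A variant of this loop can reject invalid entries before they reach docker run. The following is a minimal sketch that assumes each entry is a plain port number. The function name build_publish_opts is hypothetical and is not part of base_box.sh.

```shell
# Builds "-p <port>" options from the arguments and fails on invalid ports.
# Assumption: every argument is a plain port number between 1 and 65535.
# The function body runs in a subshell, so its variables stay local.
build_publish_opts() (
    opts=""
    for port in "$@"; do
        case "$port" in
            ''|*[!0-9]*) echo "invalid port: $port" >&2; exit 1 ;;
        esac
        if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
            echo "port out of range: $port" >&2
            exit 1
        fi
        opts="$opts -p $port"
    done
    printf '%s\n' "${opts# }"
)

build_publish_opts 5000 5001 5002
```

The last line prints -p 5000 -p 5001 -p 5002. Validating the list up front keeps a typo from silently publishing an unintended port.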

Setting a Proper CPU Priority for Each Container

By default, CPU time is evenly allocated among containers. If required, the CPU share feature can be used to control how CPU time is divided. CPU shares are relative weights: when containers compete for CPU time, a container with a higher share receives proportionally more of it, so that containers with higher priority are better served.

The base_box.sh script of Kbox uses the startup parameter --cpu-shares to set the relative CPU share of each container.
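Because --cpu-shares values are relative weights, the CPU time a container receives under full contention is its own share divided by the sum of the shares of all busy containers. The following sketch works through that arithmetic. The function name cpu_percent and the share values 1024 and 512 are illustrative assumptions.

```shell
# Prints the approximate percentage of CPU time a container receives when
# all listed containers are busy.
# $1 = shares of the container, remaining arguments = shares of all busy
# containers (including the container itself).
cpu_percent() (
    own=$1
    shift
    total=0
    for share in "$@"; do
        total=$((total + share))
    done
    [ "$total" -gt 0 ] || exit 1
    echo $((own * 100 / total))
)

cpu_percent 1024 1024 512   # higher-priority container: prints 66
cpu_percent 512 1024 512    # lower-priority container: prints 33
```

The weights only take effect when containers actually compete for CPU time. An otherwise idle host lets a low-share container use all available CPU time.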

Binding Incoming Container Traffic to Specific Host Interfaces

By default, Docker containers can connect to the outside world, but the outside world cannot connect to containers. Each outgoing connection appears to originate from one of the IP addresses of the host. Container services should be reachable only through specific external interfaces on the host. If the host has multiple network interfaces, a container can accept connections on its exposed ports on any of them, which may be unintended and unprotected. In many deployments, only certain interfaces are exposed externally, and services such as intrusion detection, intrusion prevention, firewalls, and load balancers run on those interfaces to screen incoming public traffic. Therefore, accept incoming connections only on specific external interfaces. Here, host interfaces are the network interfaces used for communication between the host and external systems, and host ports are the TCP/UDP ports on those interfaces. A binding is usually specified as an IP address and a port.

Generally you need to perform the following operations:

Bind the container ports to a specific host interface and host port. In the following command, container port 80 is bound to host port 49153 and accepts incoming connections only on the host interface with the IP address 10.2.3.4.

docker run --detach --publish 10.2.3.4:49153:80 nginx

To implement this security hardening measure, you can modify the -p option in the Kbox base_box.sh script. The following example shows the configuration format. Configure an available IP address based on your requirements.

RUN_OPTION+=" -p XXX.XXX.XXX.XXX:$PORT "
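To confirm that no container port is still bound to all interfaces, the port bindings can be inspected. The following sketch flags bindings to 0.0.0.0 or ::. The helper name find_wildcard_bindings is hypothetical, and the sketch assumes that docker inspect prints each binding in the form {<host IP> <host port>}, which may vary between Docker versions.

```shell
# Prints the input lines that contain a port bound to all host interfaces.
# Assumption: bindings are printed as {0.0.0.0 49153} or {:: 49153}.
find_wildcard_bindings() {
    grep -E '\{(0\.0\.0\.0|::) [0-9]+\}' || true
}

# Possible way to produce the input:
#   docker ps --quiet | xargs docker inspect --format \
#     '{{ .Id }}: Ports={{ .NetworkSettings.Ports }}'

# Demonstration with sample lines (c1 and c2 are placeholder IDs).
printf '%s\n' \
    'c1: Ports=map[80/tcp:[{0.0.0.0 49153}]]' \
    'c2: Ports=map[80/tcp:[{10.2.3.4 49153}]]' | find_wildcard_bindings
```

Only the c1 line is printed, because its port is reachable through every interface of the host.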

Mounting the Root File System of Containers in Read-only Mode

The root file system of a container should be treated as a "golden image" and must not be written to. Instead, specify container volumes for write operations.

If the root file system of containers is mounted in read-only mode, compatibility issues may occur. Determine whether to do so based on your site requirements.
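If a read-only root file system is feasible for your workload, the Docker options --read-only and --tmpfs can be combined so that only selected paths remain writable. The following is a minimal sketch. The function name build_readonly_opts and the paths /tmp and /run are illustrative assumptions, and the Android container may need additional writable locations.

```shell
# Builds options for a read-only root file system plus writable tmpfs mounts.
# Each argument is a container path that must stay writable (assumed paths).
build_readonly_opts() (
    opts="--read-only"
    for dir in "$@"; do
        opts="$opts --tmpfs $dir"
    done
    printf '%s\n' "$opts"
)

build_readonly_opts /tmp /run
```

Data written to a tmpfs mount is lost when the container stops. Use a named volume (--volume) for data that must persist.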

Enabling the Default seccomp Profile File

The default seccomp profile specifies a trustlist of system calls that processes in a container are allowed to execute. Processes can use seccomp to install a filter on the system calls they make. The default seccomp profile of Docker works on a trustlist basis: it allows 311 system calls and blocks all others. The kernel exposes a large number of system calls to every userland process, but most applications use only a small subset of them at run time. Restricting the available system calls to that subset reduces the kernel surface exposed to applications, thereby improving security.

In the solution for openEuler, the seccomp profile needs to be disabled (seccomp=unconfined) when the Android container is running. Evaluate whether the seccomp profile can be disabled in your application scenario and what risks doing so introduces. If you disable the seccomp profile, you are advised to perform security hardening on the host OS and kernel.

Limiting the Number of File Handles and Fork Processes

An attacker can use a single command to start a fork bomb in a container. The fork bomb can crash the whole system, and the host must be restarted to recover. In addition, attackers can open a large number of file handles to deplete the available file handle resources, causing a denial of service (DoS). Set thresholds for fork processes and file handles according to the actual service demand of the product.

To implement this security hardening measure, you can modify the Kbox base_box.sh script. For details, see the following example. You need to set the thresholds based on the application scenario. If the thresholds are inappropriate, containers may be unavailable.

RUN_OPTION+=" --pids-limit XXX --files-limit XXX "
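To find containers that still run without a process limit, the PidsLimit setting can be inspected. The following sketch flags containers whose limit is unset. The helper name find_unlimited_pids is hypothetical, and the sketch assumes that an unset limit is printed as 0, -1, or <nil>, depending on the Docker version.

```shell
# Prints the input lines of containers that have no process limit.
# Assumption: an unset limit appears as PidsLimit=0, -1, or <nil>.
find_unlimited_pids() {
    grep -E 'PidsLimit=(0|-1|<nil>)$' || true
}

# Possible way to produce the input:
#   docker ps --quiet --all | xargs docker inspect --format \
#     '{{ .Id }}: PidsLimit={{ .HostConfig.PidsLimit }}'

# Demonstration with sample lines (c1 to c3 are placeholder IDs).
printf '%s\n' \
    'c1: PidsLimit=0' \
    'c2: PidsLimit=2048' \
    'c3: PidsLimit=<nil>' | find_unlimited_pids
```

The lines for c1 and c3 are printed. The container c2 has an explicit limit and is not reported.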

Prevention of Using the Default Bridge Interface docker0

Use a user-defined Docker container network instead of the default Docker bridge interface docker0. Until network filtering is configured, docker0 is vulnerable to ARP spoofing and MAC flooding attacks. Note: By default, Docker connects containers to docker0.

Generally you need to perform the following operations:

Delete the existing bridge. Before performing this operation, delete all containers from the server.
sudo systemctl stop docker
sudo ip link set dev docker0 down
sudo brctl delbr docker0
If the message "Command 'brctl' not found" is displayed, install bridge-utils.
yum install bridge-utils -y
Create a bridge named bridge0. The following example is for reference only. Configure the available network segment based on your site requirements to ensure that the container can connect to the network properly.
sudo brctl addbr bridge0
sudo ip addr add XXX.XXX.XXX.XXX/XX dev bridge0
sudo ip link set dev bridge0 up
For details about the available network segments, see the configuration of the default bridge interface (docker0). Run the following command to query the docker0 configuration:
ip addr show docker0
As shown in the following figure, for example, the network segment is 172.17.0.1/16.

You can run the following command to check the details about bridge0.
ip addr show bridge0

Add the configuration information of the new bridge.

Open the /etc/docker/daemon.json file.
vim /etc/docker/daemon.json

Press i to enter the insert mode and add the "bridge": "bridge0" property to the file. The format is as follows:

{
"debug": true,
"data-root":"/root/sda/docker",
"ipv6": true,
"fixed-cidr-v6": "2001:db8::/64",
"bridge": "bridge0"
}

Press Esc, type :wq!, and press Enter to save the file and exit.

If the /etc/docker/daemon.json file does not exist, create it. Add "bridge":"bridge0" in JSON format.

touch /etc/docker/daemon.json
cat >/etc/docker/daemon.json <<EOF
{
"debug":true,
"data-root":"/root/sda/docker",
"ipv6":true,
"fixed-cidr-v6":"2001:db8::/64",
"bridge":"bridge0"
}
EOF

Restart the Docker service.
Before restarting the Docker service, ensure that no other container is running. If any other container is running in the environment, clear it.
systemctl daemon-reload
systemctl restart docker
- Each time the server is restarted, repeat the step that creates bridge0 and then restart the Docker service.
- After the Docker service is restarted, the default bridge interface docker0 may be re-created. Therefore, after restarting the Docker service, run the following command to check whether docker0 exists in the current environment:
  ip addr show docker0
  
  If docker0 is still displayed, run the following commands to manually delete it:
  sudo ip link set dev docker0 down
  sudo brctl delbr docker0
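Before restarting the Docker service in the procedure above, it can help to confirm that daemon.json actually names the new bridge. The following is a minimal sketch. The function name bridge_configured is hypothetical, and the check is a simple text match, not a full JSON validation.

```shell
# Returns success if the file $1 contains the property "bridge": "<$2>".
# This is a text match only and does not verify that the file is valid JSON.
bridge_configured() {
    grep -Eq "\"bridge\"[[:space:]]*:[[:space:]]*\"$2\"" "$1"
}

if bridge_configured /etc/docker/daemon.json bridge0 2>/dev/null; then
    echo "bridge0 is configured"
else
    echo "bridge0 is not configured in /etc/docker/daemon.json"
fi
```

If the second message is displayed, revisit the step that edits /etc/docker/daemon.json before restarting the Docker service.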
