The Latency Is High When a Large Number of Containers Are Started
Symptom
During a test that concurrently schedules 100 containers on two Kubernetes worker nodes, the average container startup latency on the Kunpeng platform is much longer than that on the x86 platform. Breaking down the deployment process shows that the scheduling duration accounts for nearly 90% of the total latency, indicating congestion.
Key Process and Cause Analysis
The following script is used for the test. It concurrently creates 100 jobs and breaks the startup latency of each pod down into the Scheduled, Pulled, Created, and Started phases:

#!/bin/bash
total=100

# Clean up jobs and events left over from the previous run.
echo "start delete old pods."
kubectl get jobs | grep "yanzu-test" | awk '{print $1}' | grep -v NAME | xargs kubectl delete job > /dev/null 2>&1
kubectl delete events --all
echo "delete old pods finish"

echo "starting..."
start=$(date +%s)
echo $start
date

# Concurrently create $total jobs.
declare -i i=1
while ((i<=total))
do
    echo "start" $i
    kubectl create -f run.yaml &
    let ++i
done

# Wait until every pod has completed.
echo "wait for 30s"
sleep 30s
completed=$(kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed)
while [ -n "$completed" ]
do
    echo "waiting for complete:" $completed
    sleep 3
    completed=$(kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed)
done
echo "finish create jobs"

# Accumulate the per-phase durations and track the maximums.
declare -i sum=0 max=0 sum_sch=0 sum_cre=0 sum_sta=0 max_sch=0 max_cre=0 max_sta=0 sum_pul=0 max_pul=0
pods=$(kubectl get pods | grep "yanzu-test" | awk '{print $1}')
for pod in $pods
do
    # Total latency: container start time minus the moment the script started.
    timet=$(kubectl describe pod $pod | grep "Started:" | awk '{print $6}')
    p_timet=$(date --date $timet)
    # echo "Pod $pod start time is $p_timet"
    mils=$(date +%s --date $timet)
    used=$((mils-start))
    if ((used>max))
    then
        max=$used
    fi

    # Per-phase timestamps taken from the pod's events.
    sch_time=$(kubectl get events -o=custom-columns=LastSeen:.metadata.creationTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Scheduled" | awk '{print $1}')
    pul_time=$(kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Pulled" | awk '{print $1}')
    cre_time=$(kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Created" | awk '{print $1}')
    sta_time=$(kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Started" | awk '{print $1}')
    sch_time_t=$(date +%s --date $sch_time)
    sch_used=$((sch_time_t-start))
    pul_time_t=$(date +%s --date $pul_time)
    pul_used=$((pul_time_t-sch_time_t))
    cre_time_t=$(date +%s --date $cre_time)
    cre_used=$((cre_time_t-pul_time_t))
    sta_time_t=$(date +%s --date $sta_time)
    sta_used=$((sta_time_t-cre_time_t))
    if ((sch_used>max_sch))
    then
        max_sch=$sch_used
    fi
    if ((pul_used>max_pul))
    then
        max_pul=$pul_used
    fi
    if ((cre_used>max_cre))
    then
        max_cre=$cre_used
    fi
    if ((sta_used>max_sta))
    then
        max_sta=$sta_used
    fi
    sum=$((sum+used))
    sum_sch=$((sum_sch+sch_used))
    sum_pul=$((sum_pul+pul_used))
    sum_cre=$((sum_cre+cre_used))
    sum_sta=$((sum_sta+sta_used))
done

# Report the averages and maximums (in seconds).
echo "Avg Scheduled Time: $((sum_sch/total))"
echo "Avg Pulled Time: $((sum_pul/total))"
echo "Avg Created Time: $((sum_cre/total))"
echo "Avg Started Time: $((sum_sta/total))"
echo "Avg Time: $((sum/total))"
echo "Max Scheduled Time: $max_sch"
echo "Max Pulled Time: $max_pul"
echo "Max Created Time: $max_cre"
echo "Max Started Time: $max_sta"
echo "Max Time: $max"
echo "finish test"
The run.yaml manifest referenced by the script is as follows:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: yanzu-test-
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0
  activeDeadlineSeconds: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: my-container
        image: openjdk:8u292-jdk
        resources:
          requests:
            memory: "1Gi"
            cpu: "1000m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        command: [ "/bin/bash", "-c", "sleep 21s" ]
The procedure is as follows:
- In the while loop, execute kubectl create -f run.yaml 100 times to concurrently create 100 jobs, each of which deploys a pod that starts a container.
- Use the script to compute the latency breakdown. The scheduling duration ratio is found to be extremely high.
- Container startup duration = Container startup time - Command input time
- Pod scheduling duration = Pod scheduling success time - Command input time
- Run the kubectl logs command to check the controller-manager log. Messages containing "Throttling request" are found.
Figure 1 Controller-manager log
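The log check above can be sketched as follows. The pod name suffix and the sample log lines are illustrative assumptions, not captured from a real cluster, and the exact wording of the throttling message varies by Kubernetes version:

```shell
#!/bin/bash
# Sketch of the log check. sample.log stands in for the output of, e.g.:
#   kubectl logs -n kube-system kube-controller-manager-<node-name>
# The log lines below are illustrative only.
cat > sample.log <<'EOF'
I0101 00:00:01.000000 1 request.go:665] Throttling request took 1.04s, request: POST:https://10.0.0.1:6443/api/v1/namespaces/default/pods
I0101 00:00:02.000000 1 event.go:291] "Event occurred" object="default/yanzu-test-xxxxx" reason="SuccessfulCreate"
EOF

# Count the rate-limited requests.
grep -c "Throttling request" sample.log   # prints 1
```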
In the scheduling phase, Kubernetes concurrently creates the 100 Job and Pod objects through the controller-manager, and the scheduler then assigns the 100 pods to suitable worker nodes. Throughout this process, the controller-manager and scheduler frequently send HTTP requests to the apiserver. In each of these components, the number of API requests per second and the peak number of burst API requests per second are controlled by the kube-api-qps (50 by default) and kube-api-burst (100 by default) parameters, respectively. If these thresholds are set too small, API requests that exceed them are rate-limited and become throttling requests. As a result, responses slow down, pod creation and scheduling take longer, and the latency increases.
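As a back-of-the-envelope illustration of why the cap matters, suppose each Job triggers several write requests from the controller-manager (Job creation, Pod creation, status updates, events). The per-Job request count below is an assumption for illustration, not a measured value:

```shell
#!/bin/bash
# Rough estimate only: assume ~5 write requests per Job (an assumption,
# not a measurement) against the default 50 QPS client-side limit.
jobs=100
writes_per_job=5
qps=50   # default kube-api-qps stated above
echo "$(( jobs * writes_per_job / qps ))s of queueing at the rate limiter"
```

Even this rough model yields several seconds spent purely waiting at the client-side rate limiter, matching the observed scheduling-dominated latency.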
Conclusion and Solution
Before initializing the cluster, perform the following steps to configure related parameters:
- Run the kubeadm config command to generate an init-config.yaml configuration file.
kubeadm config print init-defaults > init-config.yaml
- Modify init-config.yaml to add the corresponding startup parameters. The following example uses kubeadm-1.23.1-0.aarch64 to show how to modify the kube-controller-manager startup parameters.
- Open the init-config.yaml file.
vim init-config.yaml
- Modify the following configuration items.
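The exact configuration items were not preserved in this document. As a hedged example, raising the limits for kube-controller-manager (and, analogously, kube-scheduler) in the ClusterConfiguration section of init-config.yaml might look like the following; the values 100 and 200 are illustrative choices, not values taken from the original document:

```yaml
# Illustrative extraArgs snippet for init-config.yaml (kubeadm v1beta3,
# which matches kubeadm 1.23). The values are examples, not mandated ones.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    kube-api-qps: "100"
    kube-api-burst: "200"
scheduler:
  extraArgs:
    kube-api-qps: "100"
    kube-api-burst: "200"
```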
- Run the kubeadm init command to initialize the cluster.
kubeadm init --config=init-config.yaml
- After the modification, run the script again to perform the test. As shown in Figure 2, the scheduling latency drops to only about 1 second, much shorter than the previous 8 seconds.
