High Latency When Starting Containers at Scale
Symptom
In a test where a K8s cluster with two worker nodes concurrently scheduled 100 containers, the average container startup latency on Kunpeng was far higher than the average latency on a comparable x86 platform. Breaking the deployment flow into phases showed that the scheduling phase accounted for nearly 90% of the total latency, indicating a bottleneck there.
Key Process and Root Cause Analysis
The script used in the test is as follows:
#!/bin/bash
# Number of jobs to create concurrently.
total=100
echo "start delete old pods."
# Clean up jobs and events left over from a previous run.
kubectl get jobs | grep "yanzu-test" | awk '{print $1}' | grep -v NAME | xargs kubectl delete job > /dev/null 2>&1
kubectl delete events --all
echo "delete old pods finish"
echo "starting..."
# Record the moment the create commands are issued (epoch seconds).
start=$(date +%s)
echo $start
date
# Create $total jobs in parallel; each job runs one pod (see run.yaml).
declare -i i=1
while ((i<=total))
do
echo "start" $i
kubectl create -f run.yaml &
let ++i
done
echo "wait for 30s"
sleep 30s
completed=`kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed`
while [ -n "$completed" ]
do
echo "waiting for complete:" $completed
sleep 3
completed=`kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed`
done
echo "finish create jobs"
# Accumulators: sum/max of end-to-end startup time, plus per-phase sums and
# maxima (sch=Scheduled, pul=Pulled, cre=Created, sta=Started).
declare -i sum=0 max=0 sum_sch=0 sum_cre=0 sum_sta=0 max_sch=0 max_cre=0 max_sta=0 sum_pul=0 max_pul=0
# Iterate over all test pods and compute the per-pod timings.
pods=`kubectl get pods | grep "yanzu-test" | awk '{print $1}'`
for pod in $pods
do
# Container start time as reported in the pod description.
timet=`kubectl describe pod $pod | grep "Started:" | awk '{print $6}'`
p_timet=$(date --date $timet)
# echo "Pod $pod start time is $p_timet"
mils=$(date +%s --date $timet)
# End-to-end latency: container start time minus command issue time.
used=$((mils-start))
if((used>max))
then
max=$used
fi
# Timestamps of the pod lifecycle events emitted by the scheduler and kubelet.
sch_time=`kubectl get events -o=custom-columns=LastSeen:.metadata.creationTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Scheduled" | awk '{print $1}'`
pul_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Pulled" | awk '{print $1}'`
cre_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Created" | awk '{print $1}'`
sta_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Started" | awk '{print $1}'`
# Phase durations: scheduling, image pull, container create, container start.
sch_time_t=$(date +%s --date $sch_time)
sch_used=$((sch_time_t-start))
pul_time_t=$(date +%s --date $pul_time)
pul_used=$((pul_time_t-sch_time_t))
cre_time_t=$(date +%s --date $cre_time)
cre_used=$((cre_time_t-pul_time_t))
sta_time_t=$(date +%s --date $sta_time)
sta_used=$((sta_time_t-cre_time_t))
# Track the per-phase maxima.
if((sch_used>max_sch))
then
max_sch=$sch_used
fi
if((pul_used>max_pul))
then
max_pul=$pul_used
fi
if((cre_used>max_cre))
then
max_cre=$cre_used
fi
if((sta_used>max_sta))
then
max_sta=$sta_used
fi
sum=$((sum+used))
sum_sch=$((sum_sch+sch_used))
sum_pul=$((sum_pul+pul_used))
sum_cre=$((sum_cre+cre_used))
sum_sta=$((sum_sta+sta_used))
done
echo "Avg Scheduled Time: $((sum_sch/100))"
echo "Avg Pulled Time: $((sum_pul/100))"
echo "Avg Created Time: $((sum_cre/100))"
echo "Avg Started Time: $((sum_sta/100))"
echo "Avg Time: $((sum/100))"
echo "Max Scheduled Time: $max_sch"
echo "Max Pulled Time: $max_pul"
echo "Max Created Time: $max_cre"
echo "Max Started Time: $max_sta"
echo "Max Time: $max"
echo "finish test"
The corresponding run.yaml file is as follows:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: yanzu-test-
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0
  activeDeadlineSeconds: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: my-container
        image: openjdk:8u292-jdk
        resources:
          requests:
            memory: "1Gi"
            cpu: "1000m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        command: [ "/bin/bash", "-c", "sleep 21s"]
The test procedure is as follows:
- A for loop executes kubectl create -f run.yaml 100 times to concurrently create 100 jobs, which deploy pods and start containers.
- The breakdown computed by the script shows that scheduling time accounts for an extremely high share of the total latency.
- Container startup time = container start time - command issue time
- Pod scheduling time = pod scheduled time - command issue time
- Viewing the controller-manager log with the kubectl logs command reveals Throttling request messages; see the check below.
Figure 1 controller-manager log
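A minimal way to surface those messages, assuming a kubeadm-deployed cluster where the controller-manager runs as a static pod labeled component=kube-controller-manager (pod and label names may differ in other deployments):
# Find the controller-manager pod (one per control-plane node).
kubectl -n kube-system get pods -l component=kube-controller-manager
# Search its log for client-side rate-limiting messages; substitute the
# pod name returned by the previous command.
kubectl -n kube-system logs kube-controller-manager-<node-name> | grep "Throttling request"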
During the scheduling phase, K8s must concurrently create the 100 job objects and their corresponding pod objects through the controller-manager, and then place the 100 pods onto suitable worker nodes through the scheduler. Throughout this process, the controller-manager and the scheduler send frequent HTTP requests to the apiserver. The per-second API request rate and the per-second API burst peak of these components are controlled by kube-api-qps (default 50) and kube-api-burst (default 100), respectively. When the thresholds are small, API requests beyond them are rate-limited and logged as Throttling request, so responses slow down, which in turn slows pod creation and scheduling and increases the overall latency. The snippet below shows how to check whether these flags are explicitly set on a node.
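A quick way to inspect the currently effective settings, assuming a kubeadm-deployed control plane whose component manifests live under /etc/kubernetes/manifests (the path is an assumption; adjust for other deployments):
# If the flags are absent from the manifests, the built-in defaults apply.
grep -E "kube-api-(qps|burst)" /etc/kubernetes/manifests/kube-controller-manager.yaml
grep -E "kube-api-(qps|burst)" /etc/kubernetes/manifests/kube-scheduler.yaml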
Conclusion, Solution, and Effect
Configure the relevant parameters before initializing the cluster. The detailed steps are as follows:
- Generate the configuration file init-config.yaml with the kubeadm config command.
kubeadm config print init-defaults > init-config.yaml
- Edit init-config.yaml and add the corresponding startup parameters. The following takes modifying the kube-controller-manager startup parameters under kubeadm-1.23.1-0.aarch64 as an example.
- Open the init-config.yaml file.
vim init-config.yaml
- Modify the configuration items as shown in the sketch below.
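The exact values used in the original test are not recorded here; the following sketch shows where the parameters go in the ClusterConfiguration section of init-config.yaml, with illustrative values. The scheduler accepts the same two flags and can be raised in the same way:
controllerManager:
  extraArgs:
    kube-api-qps: "300"    # example value, not the one from the original test
    kube-api-burst: "400"  # example value
scheduler:
  extraArgs:
    kube-api-qps: "300"
    kube-api-burst: "400"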
- Initialize the cluster with kubeadm init.
kubeadm init --config=init-config.yaml
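Once the cluster is initialized, the change can be verified on the running component (a sketch using the component label kubeadm applies to its static pods):
# The extra flags appear in the container command of the static pod.
kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | grep -E "kube-api-(qps|burst)"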
- After the modification, rerun the test script to obtain the results in Figure 2: the scheduling latency at startup is around 1s, greatly reduced from the original roughly 8s.
