High Latency When Launching Containers at Scale
Problem Description
In a test where a Kubernetes cluster with two worker nodes concurrently schedules 100 containers, the average container launch latency on Kunpeng is far higher than the average latency on a comparable x86 platform. Breaking the deployment flow into stages shows that the scheduling stage accounts for nearly 90% of the total latency, indicating a bottleneck.
Key Process and Root Cause Analysis
The script used in the test is as follows:
#!/bin/bash
total=100
echo "start delete old pods."
kubectl get jobs | grep "yanzu-test" | awk '{print $1}' | grep -v NAME | xargs kubectl delete job > /dev/null 2>&1
kubectl delete events --all
echo "delete old pods finish"
echo "starting..."
start=$(date +%s)
echo $start
date
declare -i i=1
while ((i<=$total))
do
    echo "start" $i
    kubectl create -f run.yaml &
    let ++i
done
echo "wait for 30s"
sleep 30s
completed=`kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed`
while [ -n "$completed" ]
do
    echo "waiting for complete:" $completed
    sleep 3
    completed=`kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed`
done
echo "finish create jobs"
declare -i sum=0 max=0 sum_sch=0 sum_cre=0 sum_sta=0 max_sch=0 max_cre=0 max_sta=0 sum_pul=0 max_pul=0
pods=`kubectl get pods | grep "yanzu-test" | awk '{print $1}'`
for pod in ${pods[@]}
do
    timet=`kubectl describe pod $pod | grep "Started:" | awk '{print $6}'`
    p_timet=$(date --date $timet)
    # echo "Pod $pod start time is $p_timet"
    mils=$(date +%s --date $timet)
    used=$((mils-start))
    if ((used>max))
    then
        max=used
    fi
    sch_time=`kubectl get events -o=custom-columns=LastSeen:.metadata.creationTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Scheduled" | awk '{print $1}'`
    pul_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Pulled" | awk '{print $1}'`
    cre_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Created" | awk '{print $1}'`
    sta_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Started" | awk '{print $1}'`
    sch_time_t=$(date +%s --date $sch_time)
    sch_used=$((sch_time_t-start))
    pul_time_t=$(date +%s --date $pul_time)
    pul_used=$((pul_time_t-sch_time_t))
    cre_time_t=$(date +%s --date $cre_time)
    cre_used=$((cre_time_t-pul_time_t))
    sta_time_t=$(date +%s --date $sta_time)
    sta_used=$((sta_time_t-cre_time_t))
    if ((sch_used>max_sch))
    then
        max_sch=sch_used
    fi
    if ((pul_used>max_pul))
    then
        max_pul=pul_used
    fi
    if ((cre_used>max_cre))
    then
        max_cre=cre_used
    fi
    if ((sta_used>max_sta))
    then
        max_sta=sta_used
    fi
    sum=$((sum+used))
    sum_sch=$((sum_sch+sch_used))
    sum_pul=$((sum_pul+pul_used))
    sum_cre=$((sum_cre+cre_used))
    sum_sta=$((sum_sta+sta_used))
done
echo "Avg Scheduled Time: $((sum_sch/100))"
echo "Avg Pulled Time: $((sum_pul/100))"
echo "Avg Created Time: $((sum_cre/100))"
echo "Avg Started Time: $((sum_sta/100))"
echo "Avg Time: $((sum/100))"
echo "Max Scheduled Time: $max_sch"
echo "Max Pulled Time: $max_pul"
echo "Max Created Time: $max_cre"
echo "Max Started Time: $max_sta"
echo "Max Time: $max"
echo "finish test"
The corresponding run.yaml file is as follows:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: yanzu-test-
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0
  activeDeadlineSeconds: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: my-container
        image: openjdk:8u292-jdk
        resources:
          requests:
            memory: "1Gi"
            cpu: "1000m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        command: [ "/bin/bash", "-c", "sleep 21s"]
The test procedure is as follows:
- Run kubectl create -f run.yaml 100 times in a for loop to concurrently create 100 Jobs, which deploy Pods and launch containers.
- The script computes the per-stage timings and shows that scheduling time accounts for an extremely high share of the total latency.
- Container launch time = container start time - command submission time
- Pod scheduling time = time the Pod is successfully scheduled - command submission time
- Checking the controller-manager log with the kubectl logs command reveals Throttling request messages, as shown below.
Figure 1: controller-manager log
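A minimal way to reproduce this check, assuming a kubeadm-deployed control plane where the controller-manager runs as a static pod in the kube-system namespace (the pod name suffix is the node name):
# Find the controller-manager pod name
kubectl -n kube-system get pods | grep kube-controller-manager
# Look for client-side rate-limiting messages in its log
kubectl -n kube-system logs kube-controller-manager-<node-name> | grep -i "Throttling request"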
During the scheduling stage, Kubernetes needs the controller-manager to concurrently create 100 Job objects and their corresponding Pod objects, and the scheduler then places the 100 Pods onto suitable worker nodes. Throughout this process, the controller-manager and scheduler send frequent HTTP requests to the apiserver. For these components, the number of API requests per second and the per-second API burst peak are controlled by kube-api-qps (default 50) and kube-api-burst (default 100) respectively. When these thresholds are small, API requests above the threshold are rate-limited and become Throttling requests, which slows responses, in turn slowing Pod creation and scheduling and increasing the overall latency.
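To see what limits a control-plane component is currently running with, one can check its static pod manifest for these flags (a sketch, assuming the default kubeadm manifest path; if the flags are absent, the component falls back to its built-in defaults):
grep -E "kube-api-qps|kube-api-burst" /etc/kubernetes/manifests/kube-controller-manager.yaml
grep -E "kube-api-qps|kube-api-burst" /etc/kubernetes/manifests/kube-scheduler.yaml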
Conclusion, Solution, and Effect
Configure the relevant parameters before initializing the cluster. The configuration steps are as follows:
- Generate the configuration file init-config.yaml with the kubeadm config command.
kubeadm config print init-defaults > init-config.yaml
- Edit the init-config.yaml file and add the corresponding startup parameters. The following uses kubeadm-1.23.1-0.aarch64 and the kube-controller-manager startup parameters as an example.
- Open the init-config.yaml file.
vim init-config.yaml
- Modify the configuration items as shown in the sketch below.
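The original configuration values are not reproduced here; the following is a minimal sketch of the change in the ClusterConfiguration section of init-config.yaml (kubeadm v1beta3 API). The values 500 and 800 are illustrative only and should be tuned for the actual cluster; the scheduler block is an optional analogous change.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# ... other fields generated by "kubeadm config print init-defaults" stay unchanged ...
controllerManager:
  extraArgs:
    kube-api-qps: "500"     # raise the per-second API request limit (illustrative value)
    kube-api-burst: "800"   # raise the per-second API burst peak (illustrative value)
scheduler:
  extraArgs:
    kube-api-qps: "500"
    kube-api-burst: "800"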
- Initialize the cluster with kubeadm init.
kubeadm init --config=init-config.yaml
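After initialization, one way to confirm the raised limits took effect (a sketch, assuming the control-plane components run as static pods on this node and so appear in the process list):
ps -ef | grep -E "kube-controller-manager|kube-scheduler" | grep -oE "kube-api-(qps|burst)=[0-9]+"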
- After the modification, rerun the test script. As shown in Figure 2, the scheduling latency drops to only about 1s, a large reduction from the original roughly 8s.