High Latency When Launching Containers at Scale

Problem Description

In a test that concurrently schedules 100 containers on a Kubernetes cluster with two worker nodes, the average container startup latency on Kunpeng was far higher than the average latency on the competing x86 platform. Breaking the deployment flow into phases showed that the scheduling phase accounted for nearly 90% of the total latency, indicating a blocking point.

Key Process and Root Cause Analysis

The script file used in the test is as follows:
#!/bin/bash

total=100

echo "start delete old pods."
kubectl get jobs | grep "yanzu-test" | awk '{print $1}' | grep -v NAME | xargs kubectl delete job > /dev/null 2>&1
kubectl delete events --all
echo "delete old pods finish"

echo "staring..."
start=$(date +%s)
echo $start
date
declare -i i=1
while ((i<=$total))
do
    echo "start" $i
    kubectl create -f run.yaml &
    let ++i
done


echo "wait for 30s"
sleep 30s
completed=`kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed`
while [ -n "$completed" ]
do
    echo "waiting for complete:" $completed
    sleep 3
    completed=`kubectl get pods | grep "yanzu-test" | awk '{print $3}' | grep -v Completed`
done

echo "finish create jobs"

declare -i sum=0 max=0 sum_sch=0 sum_cre=0 sum_sta=0 max_sch=0 max_cre=0 max_sta=0 sum_pul=0 max_pul=0

pods=`kubectl get pods | grep "yanzu-test" | awk '{print $1}'`
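# For each test pod, derive the per-phase durations (scheduling, image pull, create, start) from its events and track the sums and maxima.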
for pod in ${pods[@]}
do
    timet=`kubectl describe pod $pod | grep "Started:" | awk '{print $6}'`
    p_timet=$(date --date $timet)
    # echo "Pod $pod start time is $p_timet"
    mils=$(date +%s --date $timet)
    used=$((mils-start))
    if((used>max))
    then
      max=used
    fi
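    # Timestamps of the Scheduled / Pulled / Created / Started events for this pod.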
    sch_time=`kubectl get events -o=custom-columns=LastSeen:.metadata.creationTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Scheduled" | awk '{print $1}'`
    pul_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Pulled" | awk '{print $1}'`
    cre_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Created" | awk '{print $1}'`
    sta_time=`kubectl get events -o=custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason | grep "$pod" | grep "Started" | awk '{print $1}'`
    sch_time_t=$(date +%s --date $sch_time)
    sch_used=$((sch_time_t-start))
    pul_time_t=$(date +%s --date $pul_time)
    pul_used=$((pul_time_t-sch_time_t))
    cre_time_t=$(date +%s --date $cre_time)
    cre_used=$((cre_time_t-pul_time_t))
    sta_time_t=$(date +%s --date $sta_time)
    sta_used=$((sta_time_t-cre_time_t))
    if((sch_used>max_sch))
    then
      max_sch=sch_used
    fi
    if((pul_used>max_pul))
    then
      max_pul=pul_used
    fi
    if((cre_used>max_cre))
    then
      max_cre=cre_used
    fi
    if((sta_used>max_sta))
    then
      max_sta=sta_used
    fi
    sum=$((sum+used))
    sum_sch=$((sum_sch+sch_used))
    sum_pul=$((sum_pul+pul_used))
    sum_cre=$((sum_cre+cre_used))
    sum_sta=$((sum_sta+sta_used))
done

echo "Avg Scheduled Time: $((sum_sch/100))"
echo "Avg Pulled Time: $((sum_pul/100))"
echo "Avg Created Time: $((sum_cre/100))"
echo "Avg Started Time: $((sum_sta/100))"
echo "Avg Time: $((sum/100))"
echo "Max Scheduled Time: $max_sch"
echo "Max Pulled Time: $max_pul"
echo "Max Created Time: $max_cre"
echo "Max Started Time: $max_sta"
echo "Max Time: $max"

echo "finish test"
The corresponding run.yaml file is as follows:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: yanzu-test-
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0
  activeDeadlineSeconds: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: my-container
        image: openjdk:8u292-jdk
        resources:
          requests:
            memory: "1Gi"
            cpu: "1000m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        command: [ "/bin/bash", "-c", "sleep 21s"]

The specific steps are as follows:

  1. Run kubectl create -f run.yaml 100 times in a for loop to concurrently create 100 Jobs, which deploy the Pods and start the containers.
  2. Compute the timings with the script; the scheduling time accounts for an extremely high share of the overall latency.
    • Container startup latency = container start time - command issue time
    • Pod scheduling latency = Pod scheduling success time - command issue time
  3. View the controller-manager logs with the kubectl logs command; Throttling request messages are found (example commands are shown after the analysis below).
    Figure 1 controller-manager logs

    During the scheduling phase, Kubernetes relies on the controller-manager to concurrently create the 100 Job objects and their corresponding Pod objects, and then on the scheduler to place the 100 Pods onto suitable worker nodes. Throughout this process, the controller-manager and the scheduler send frequent HTTP requests to the apiserver. For these components, the number of API requests per second and the peak API request burst are controlled by kube-api-qps (default 50) and kube-api-burst (default 100) respectively. When these thresholds are small, API requests beyond the threshold are rate limited and surface as Throttling request messages; responses slow down, which in turn slows Pod creation and scheduling and increases the overall latency.
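    A minimal sketch of how these throttling messages can be located on a kubeadm-managed cluster; the component label and the pod name pattern follow the kubeadm defaults, and <control-plane-node> is a placeholder:
      # Find the exact controller-manager pod name (it is suffixed with the control-plane node name).
      kubectl -n kube-system get pods -l component=kube-controller-manager
      # Grep its logs for the client-side rate-limiting messages emitted by client-go.
      kubectl -n kube-system logs kube-controller-manager-<control-plane-node> | grep -i "Throttling request"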

Conclusion, Solution, and Effect

Configure the relevant parameters before initializing the cluster. The specific configuration steps are as follows:

  1. Generate the configuration file init-config.yaml with the kubeadm config command.
    kubeadm config print init-defaults > init-config.yaml
  2. Edit the init-config.yaml file and add the corresponding startup parameters. The following uses modifying the kube-controller-manager startup parameters under kubeadm-1.23.1-0.aarch64 as an example.
    1. Open the init-config.yaml file.
      vim init-config.yaml
    2. Modify the configuration as follows.
      Under "controllerManager", add the parameters "kube-api-burst" and "kube-api-qps" to raise the limit on the number of API requests.
      controllerManager:
        extraArgs:
          kube-api-burst: "400"
          kube-api-qps: "600"
    3. Initialize the cluster with kubeadm init.
      kubeadm init --config=init-config.yaml
  3. After the modification, rerun the test script. As shown in Figure 2, the scheduling latency at startup is only about 1s, greatly reduced from the original roughly 8s.
    Figure 2 Test results
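
As a quick check that the change took effect, one option (a sketch, assuming a kubeadm-managed cluster where the static pod carries the component=kube-controller-manager label) is to confirm that the raised limits appear in the running controller-manager pod spec:

    # The flags added via extraArgs should show up in the pod's command line.
    kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | grep -E "kube-api-(qps|burst)"

The analysis above notes that the scheduler also sends frequent requests to the apiserver; if scheduling throughput remains a bottleneck, its client QPS limits can be raised in a similar way, although the exact configuration mechanism depends on the Kubernetes version.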