Binding Pods to a Single NUMA Node

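To tune MySQL performance, the MySQL Pod on a compute node can be restricted to run on a single NUMA node. K8s provides two CPU allocation policies for this purpose: the CPU Manager policy and the Topology Manager policy.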

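To apply the single-NUMA restriction: perform steps 1 to 4 on each compute node.

To remove the single-NUMA restriction: perform steps 1 to 4 on each compute node as well, but in step 2 restore the configuration file to its original default content.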

  1. Confirm the K8s version and the actual core-binding requirements.

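    Run kubectl version to check the Kubernetes (K8s) version.
    • Kubernetes version 1.16 or later: both the CPU Manager policy and the Topology Manager policy are available.
    • Kubernetes version 1.12 or later but earlier than 1.16: only the CPU Manager policy is available.

      Core binding gives the Pod exclusive use of the bound CPUs. Confirm beforehand that exclusive allocation is really required.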
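The version rule above can be sketched as a small bash helper. This is a minimal sketch, not part of the original procedure: the function names version_ge and available_policies are hypothetical, the thresholds 1.12 and 1.16 are taken from the text above, and the comparison relies on GNU sort -V.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort) to order the two strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print which policies the text above lists for a given K8s version.
# $1 = version string such as "v1.18.3" or "1.15.12" (hypothetical input format).
available_policies() {
    local v="${1#v}"   # drop a leading "v" if present
    if version_ge "$v" 1.16; then
        echo "cpu-manager topology-manager"
    elif version_ge "$v" 1.12; then
        echo "cpu-manager"
    else
        echo "none"
    fi
}
```

The version string would be copied from the kubectl version output, for example available_policies v1.18.3.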

  2. Modify the kubelet configuration file.

    1. Open the configuration file.
      vim /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
      
    2. The default content of the file is as follows:
      # Note: This dropin only works with kubeadm and kubelet v1.11+
      [Service]
      Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
      Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
      # This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
      EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
      # This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
      # the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
      EnvironmentFile=-/etc/sysconfig/kubelet
       
      ExecStart=
      ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
      
      Press i to enter insert mode and modify the file as follows:
      # Note: This dropin only works with kubeadm and kubelet v1.11+
      [Service]
      Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
      Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
      # This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
      EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
      # This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
      # the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
      EnvironmentFile=-/etc/sysconfig/kubelet
      
      # Change 1: add two ExecStartPre lines
      ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
      ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service
       
      ExecStart=
      
      # Change 2: append --kube-reserved, --cpu-manager-policy, --feature-gates, and --topology-manager-policy to the end of the ExecStart line
      ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --kube-reserved=cpu=2,memory=250Mi --cpu-manager-policy=static --feature-gates=CPUManager=true --topology-manager-policy=single-numa-node
      
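      • CPU Manager policy: --cpu-manager-policy=static together with --feature-gates=CPUManager=true ensures that the CPU cores set in limits for mysql-1, mysql-2, and mysql-3 in the YAML file are allocated as contiguous, exclusive cores.
      • Topology Manager policy: --topology-manager-policy=single-numa-node ensures that the CPU cores set in limits for mysql-1, mysql-2, and mysql-3 in the YAML file are all taken from a single NUMA node. For this to work, the CPU count in limits must be less than or equal to the number of CPU cores in one NUMA node (run lscpu or numactl -H to check the cores per NUMA node). Otherwise, after the MySQL Pods are deployed from the master node, running watch kubectl get pod -n ns-mysql-test -o wide shows that Pod creation failed.
      • If the MySQL Pod needs more CPU cores than one NUMA node provides but no more than one processor (1P) provides (in this document 1P contains two NUMA nodes), and the NUMA nodes used must not span processors: remove --topology-manager-policy=single-numa-node, delete the limits resource restriction from the YAML file on the master node, and deploy the MySQL Pods from the master node. Then, on the compute node, use taskset -pac to manually bind the mysql process and its threads to cores 0-47 (NUMA node0 and NUMA node1) as follows: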
        1. Find the MySQL process ID.
          ps -ef | grep mysql
          
        2. Check which CPU cores MySQL is currently bound to.
          taskset -pac <MySQL process ID>
          
        3. Bind the MySQL process and its threads to cores 0-47 (NUMA node0 and NUMA node1).
          taskset -pac 0-47 <MySQL process ID>
          
        4. Check the CPU cores MySQL is bound to again to confirm the change.
          taskset -pac <MySQL process ID>
          
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
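The single-NUMA sizing rule in this step (the CPU count in limits must not exceed the cores of one NUMA node) can be checked from lscpu output before deploying. The following is a minimal bash sketch under stated assumptions: the function names are hypothetical, and it assumes lscpu prints lines of the form "NUMA node0 CPU(s): 0-23" as in the sample output in this document.

```shell
# Count the CPUs in a list such as "0-23" or "0-3,8-11".
count_cpus() {
    local total=0 part lo hi
    local IFS=,
    for part in $1; do
        lo=${part%-*}          # start of the range (or the single CPU id)
        hi=${part#*-}          # end of the range (or the single CPU id)
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

# Read lscpu output on stdin and print the core count of the smallest NUMA node.
# Prints 0 if no "NUMA nodeN CPU(s):" line is found.
min_numa_cpus() {
    sed -n 's/^NUMA node[0-9]* CPU(s): *//p' | {
        min=0
        while read -r list; do
            n=$(count_cpus "$list")
            if [ "$min" -eq 0 ] || [ "$n" -lt "$min" ]; then min=$n; fi
        done
        echo "$min"
    }
}

# Succeed if a request of $1 CPUs fits inside one NUMA node (lscpu output on stdin).
fits_single_numa() {
    local min
    min=$(min_numa_cpus)
    [ "$min" -gt 0 ] && [ "$1" -le "$min" ]
}
```

On a compute node this could be used as lscpu | fits_single_numa 16 && echo fits. Note that the check ignores the cores set aside by --kube-reserved, so a value close to the node size may still fail to schedule.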

  3. Delete the CPU manager state file cpu_manager_state.

    rm -f /var/lib/kubelet/cpu_manager_state
    

  4. Restart the kubelet service.

    systemctl daemon-reload && systemctl restart kubelet
    

    Check the kubelet status.

    systemctl status kubelet
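After the restart, the kubelet regenerates cpu_manager_state, and the policy recorded there can serve as a quick sanity check that the static policy is active. The sketch below is an assumption-laden helper, not part of the original procedure: the name state_policy is hypothetical, and it assumes the file is compact one-line JSON containing a "policyName" field with no spaces around the colon.

```shell
# Print the policyName value from cpu_manager_state JSON read on stdin.
# Assumes the compact layout "policyName":"<value>" (no spaces around the colon).
state_policy() {
    grep -o '"policyName":"[^"]*"' | cut -d'"' -f4
}
```

A possible invocation is state_policy < /var/lib/kubelet/cpu_manager_state, which would be expected to print static once the configuration in step 2 has taken effect.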
    

  5. Modify the mysql_deployment.yaml configuration file.

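    Choose CPU and memory values that suit the actual CPU and memory resources of the nodes you plan to deploy on. For example, the physical machine in this document has four NUMA nodes, each processor (1P) contains two NUMA nodes, and each NUMA node has 24 CPU cores. The CPU core counts are configured as follows: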

    ......
    spec:
      nodeSelector:
        test: "mysql-test-1"
      containers:
      - name: mysql-1
        image: mymysql/centos8-mysql-arm:8.0.19
        resources:
          limits:
            cpu: 16
            memory: 64Gi
        ......
    ---
    ......
    spec:
      nodeSelector:
        test: "mysql-test-2"
      containers:
      - name: mysql-2
        image: mymysql/centos8-mysql-arm:8.0.19
        resources:
          limits:
            cpu: 16
            memory: 64Gi
        ......
    ---
    ......
    spec:
      nodeSelector:
        test: "mysql-test-3"
      containers:
      - name: mysql-3
        image: mymysql/centos8-mysql-arm:8.0.19
        resources:
          limits:
            cpu: 16
            memory: 64Gi
        ......
    

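    For the single-numa-node policy to take effect on a Pod, CPU and memory must be set explicitly in the Pod's resources, and the limits values must equal the requests values (that is, the Pod must be a Guaranteed Pod). This document omits requests, in which case requests default to the limits values.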
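Whether a deployed Pod actually ended up in the Guaranteed class can be read from its status, for example with kubectl get pod <pod-name> -n ns-mysql-test -o jsonpath='{.status.qosClass}'. The bash sketch below turns that value into a pass/fail check. The function name is hypothetical, and the integer test is deliberately conservative: it accepts only plain whole numbers such as 16 and rejects millicore notation, on the understanding that the static CPU Manager policy grants exclusive cores only to Guaranteed Pods with whole-number CPU requests.

```shell
# Succeed if a container qualifies for exclusive CPUs under the static policy.
# $1 = QoS class reported for the Pod, $2 = CPU value from resources.limits.
exclusive_cpu_eligible() {
    [ "$1" = "Guaranteed" ] || return 1
    case "$2" in
        ''|*[!0-9]*) return 1 ;;   # empty, fractional, or millicore values
        *) return 0 ;;             # plain whole number such as 16
    esac
}
```

For the configuration in this document, exclusive_cpu_eligible Guaranteed 16 succeeds, while a Burstable Pod or a request such as 1500m fails the check.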

  6. Redeploy mysql_deployment.yaml on the master node.

    kubectl delete -f ./mysql_deployment.yaml
    kubectl create -f ./mysql_deployment.yaml
    

  7. Check how the Pods map onto the NUMA nodes.

    docker ps -a | grep mysql
    
    bcc93653c574        48858e629fa6           "/entrypoint.sh mysq…"   31 minutes ago      Up 31 minutes                                 k8s_mysql-2_mysql-2_ns-mysql-test_605956f2-1e13-49c6-a197-6220915130bc_0
    ea9afa2c2104        k8s.gcr.io/pause:3.2   "/pause"                 32 minutes ago      Up 32 minutes                                 k8s_POD_mysql-2_ns-mysql-test_605956f2-1e13-49c6-a197-6220915130bc_0
    
    docker inspect bcc93653c574 | grep Cpuset
    
                "CpusetCpus": "2-17",
                "CpusetMems": "",
    
    lscpu
    
    Architecture:          aarch64
    Byte Order:            Little Endian
    CPU(s):                96
    On-line CPU(s) list:   0-95
    Thread(s) per core:    1
    Core(s) per socket:    48
    Socket(s):             2
    NUMA node(s):          4
    Model:                 0
    CPU max MHz:           2600.0000
    CPU min MHz:           200.0000
    BogoMIPS:              200.00
    L1d cache:             64K
    L1i cache:             64K
    L2 cache:              512K
    L3 cache:              49152K
    NUMA node0 CPU(s):     0-23
    NUMA node1 CPU(s):     24-47
    NUMA node2 CPU(s):     48-71
    NUMA node3 CPU(s):     72-95
    Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop
    

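    The inspected container (mysql-2 in the docker ps output above) is confined to CPUs 2-17, all of which belong to NUMA node0 (CPUs 0-23), so it is restricted to NUMA node0.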
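The manual comparison of CpusetCpus against the NUMA ranges printed by lscpu can be sketched as a small bash helper. This is an illustrative sketch with hypothetical function names; it assumes lscpu lines of the form "NUMA node0 CPU(s): 0-23" and a cpuset written as ranges or single CPU ids separated by commas.

```shell
# Succeed if two CPU lists such as "2-17" and "0-3,8-11" share at least one CPU.
ranges_overlap() {
    local a b IFS=,
    for a in $1; do
        for b in $2; do
            # Two closed ranges overlap when each starts before the other ends.
            if [ "${a%-*}" -le "${b#*-}" ] && [ "${b%-*}" -le "${a#*-}" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Print the id of every NUMA node that overlaps the given cpuset.
# $1 = cpuset such as "2-17"; lscpu output is read from stdin.
numa_nodes_for_cpuset() {
    local cpuset=$1
    sed -n 's/^NUMA node\([0-9]*\) CPU(s): *\(.*\)$/\1 \2/p' |
    while read -r node list; do
        if ranges_overlap "$cpuset" "$list"; then echo "$node"; fi
    done
}
```

Running lscpu | numa_nodes_for_cpuset 2-17 on the machine shown above would print a single node id; more than one line of output means the container spans NUMA nodes.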