
openEuler Core Isolation Configuration

Introduction to openEuler Core Isolation

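In HPC scenarios, threads synchronize frequently, so the performance impact of OS background noise grows as the number of nodes increases. OS background noise includes background daemons, peripheral interrupts, and kernel background threads.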

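The enhanced core isolation feature divides the system CPUs into housekeeping CPUs and non-housekeeping CPUs. OS background noise is concentrated on the housekeeping CPUs, while the non-housekeeping CPUs run only the compute workload. Reducing background-noise interference while the workload runs improves its performance. Non-housekeeping CPUs are specified with the boot parameters nohz_full and isolcpus, and adding the enhanced_isolcpus parameter further eliminates noise from disk I/O.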
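As an illustration of how the CPU lists for these boot parameters can be derived, the following is a minimal sketch rather than part of the official procedure. It assumes a topology of 16 NUMA nodes with 38 consecutively numbered cores each, with the last core of every node reserved as the housekeeping CPU. The variable and function names (NUMA_NODES, CORES_PER_NUMA, housekeeping_cpus, isolated_cpus) are hypothetical.

```shell
#!/bin/bash
# Assumed topology (hypothetical defaults): 16 NUMA nodes with 38 cores each,
# cores numbered consecutively within each node. Requires at least 2 cores per node.
NUMA_NODES=${NUMA_NODES:-16}
CORES_PER_NUMA=${CORES_PER_NUMA:-38}

# Print the housekeeping CPUs: the last core of every NUMA node.
housekeeping_cpus() {
    local list="" node
    for ((node = 0; node < NUMA_NODES; node++)); do
        list+="$(( (node + 1) * CORES_PER_NUMA - 1 )),"
    done
    echo "${list%,}"
}

# Print the non-housekeeping CPUs as one range per NUMA node.
isolated_cpus() {
    local list="" node first
    for ((node = 0; node < NUMA_NODES; node++)); do
        first=$(( node * CORES_PER_NUMA ))
        list+="${first}-$(( first + CORES_PER_NUMA - 2 )),"
    done
    echo "${list%,}"
}

hk=$(housekeeping_cpus)
iso=$(isolated_cpus)
# Assemble the CPU-list part of the kernel command line.
echo "irqaffinity=${hk} nohz_full=${iso} isolcpus=nohz,domain,managed_irq,${iso} rcu_nocbs=${iso}"
```

On a machine with a different layout, NUMA_NODES and CORES_PER_NUMA can be overridden in the environment. Check the generated lists against the output of lscpu before using them.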

Core Isolation Configuration Procedure

  1. Modify the GRUB boot entry.

    vim /boot/efi/EFI/openEuler/grub.cfg

    Find the line containing the keyword "/vmlinuz" and append the following to the end of that line:

    irqaffinity=37,75,113,151,189,227,265,303,341,379,417,455,493,531,569,607 nohz_full=0-36,38-74,76-112,114-150,152-188,190-226,228-264,266-302,304-340,342-378,380-416,418-454,456-492,494-530,532-568,570-606 isolcpus=nohz,domain,managed_irq,0-36,38-74,76-112,114-150,152-188,190-226,228-264,266-302,304-340,342-378,380-416,418-454,456-492,494-530,532-568,570-606 rcu_nocbs=0-36,38-74,76-112,114-150,152-188,190-226,228-264,266-302,304-340,342-378,380-416,418-454,456-492,494-530,532-568,570-606 disable_sdei_nmi_watchdog enhanced_isolcpus

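    On the current Kunpeng 920 Professional Edition, each node has 2 CPUs, each CPU has 8 NUMA nodes, and each NUMA node has 38 cores, for a total of 608 cores per machine. The recommended core isolation parameters configure the last core of each NUMA node as a housekeeping CPU. The housekeeping CPUs are given priority through "irqaffinity", so that kernel threads and interrupts are scheduled onto the housekeeping CPUs first, reducing the system's impact on the non-housekeeping CPUs.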

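    In addition, once core isolation is configured, a program that binds MPI processes to cores sequentially will place some compute processes on housekeeping CPUs, which hurts the program's overall performance. With this configuration, MPI can be given a rankfile that explicitly specifies the core each process is bound to.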

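  2. Reboot the node so that the core isolation parameters take effect.
  3. After the reboot, run cat /proc/cmdline to confirm that the parameters were added.
  4. Run the initialization script to bind system interrupts and services to the housekeeping CPUs and apply the performance settings.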
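    #!/bin/bash

    # Stop and mask irqbalance, the service that automatically redistributes hardware interrupts across CPUs, so that interrupts stay bound to the housekeeping CPUs
    systemctl stop irqbalance && systemctl mask irqbalance

    # Allow real-time tasks to use all of the CPU time
    echo -1 > /proc/sys/kernel/sched_rt_runtime_us

    # Do not let the khugepaged background thread defragment memory when collapsing pages
    echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

    # Never defragment memory at allocation time
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # Enable transparent huge pages system-wide
    echo always > /sys/kernel/mm/transparent_hugepage/enabled

    # Disable automatic NUMA balancing
    echo 0 > /proc/sys/kernel/numa_balancing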
     
    # Bind the rcu_sched, kswapd, kcompactd, rcuog and rcuos kernel threads to the housekeeping CPUs
    hk_cpus=37,75,113,151,189,227,265,303,341,379,417,455,493,531,569,607
    for name in rcu_sched kswapd kcompactd rcuog rcuos
    do
        ps -aux | grep "$name" | grep -v 'grep' | awk '{print $2}' | while read -r pid
        do
            taskset -pc "$hk_cpus" "$pid"
        done
    done
     
    # Set the cpufreq governor of every core to performance
    for core_id in $(seq 0 607)
    do
        echo performance > /sys/devices/system/cpu/cpufreq/policy${core_id}/scaling_governor
    done
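
After the reboot and the initialization script, the boot parameters and the resulting CPU sets can be checked in one pass. The sketch below is an illustrative helper, not part of the official procedure. The function name missing_params is hypothetical, and the parameter names it looks for are the ones added to the kernel command line in step 1. The sysfs files isolated and nohz_full under /sys/devices/system/cpu are standard kernel interfaces and are printed only if they are readable.

```shell
#!/bin/bash
# Print every expected parameter that is absent from a kernel command line.
# $1: kernel command line string; remaining arguments: parameter names to look for.
missing_params() {
    local cmdline=" $1 " name
    shift
    for name in "$@"; do
        case "$cmdline" in
            *" ${name}="*|*" ${name} "*) ;;   # present as key=value or as a bare flag
            *) echo "$name" ;;
        esac
    done
}

# Check the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    missing=$(missing_params "$(cat /proc/cmdline)" \
        irqaffinity nohz_full isolcpus rcu_nocbs disable_sdei_nmi_watchdog enhanced_isolcpus)
    if [ -n "$missing" ]; then
        echo "Missing boot parameters:" $missing
    else
        echo "All expected boot parameters are present"
    fi
fi

# The kernel also reports the effective CPU lists through sysfs.
for f in /sys/devices/system/cpu/isolated /sys/devices/system/cpu/nohz_full; do
    if [ -r "$f" ]; then
        echo "$(basename "$f"): $(cat "$f")"
    fi
done
```

The affinity of an individual kernel thread can be checked in the same spirit with taskset -pc followed by its PID, which prints the current CPU list without changing it.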