鲲鹏社区首页
中文
注册
开发者
PyTorch使能HPCKit进行编译安装

PyTorch使能HPCKit进行编译安装

HPC

发表于 2025/09/23

0

众所周知,PyTorch除了可以直接下载软件包进行安装外,为获取最优性能往往使用源码进行编译安装,其性能强依赖于编译器、数学库等软件的性能。在鲲鹏平台上,使用HPCKit进行编译安装,通过内置毕昇编译器、鲲鹏数学库(Kunpeng Math Library,KML)加速PyTorch,以获得在鲲鹏处理器上的最优性能。

1、环境信息

硬件环境:鲲鹏服务器

操作系统 :openEuler 22.03 LTS SP4/SP3/SP2、openEuler 20.03 LTS SP3、麒麟V10 SP2/HPC版、麒麟信安V3.5.2 /V3.5.3等操作系统

HPCKit版本:25.1.RC1及以上

2、前置环境配置

2.1 HPCKit安装

HPCKit安装包下载链接:https://www.hikunpeng.com/developer/hpc/hpckit-download

安装过程参考HPCKit文档:https://www.hikunpeng.com/document/detail/zh/kunpenghpcs/hpckit/instg/KunpengHPCKit_install_002.html

2.2、PyTorch源码下载

在github上下载Pytorch:

git clone -b v2.5.0 --depth=1 --recursive https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive

3、Pytorch链接KML步骤

3.1 PyTorch使能毕昇编译器

毕昇编译器升级4.0.0以后对代码要求变严格,原版的Pytorch脚本中部分代码书写不规范,新版LLVM会导致报错,因此需要将非ASCII字符清理掉,并避免编译告警。

在tools/setup_helpers/cmake.py文件的def get_cmake_cache_variables(self) -> dict[str, CMakeValue]函数中添加以下代码:

with open(self._cmake_cache_file, 'rb') as f:
    content = f.read()
    content = bytes(i for i in content if i < 128)
with open(self._cmake_cache_file, 'wb') as f:
    f.write(content) 

在tools/setup_helpers/env.py的def __init__(self, cmake_build_type_env: str | None = None) -> None函数里,针对if os.path.isfile(cmake_cache_txt)分支判断逻辑下添加以下代码:

with open(cmake_cache_txt, 'rb') as f:
    content = f.read()
    content = bytes(i for i in content if i < 128)
with open(cmake_cache_txt, 'wb') as f: 
    f.write(content) 

3.2、Pytorch使能KML

3.2.1、kblas替换openblas

1. 在cmake/Modules/新建FindKML.cmake文件,文件内容如下:

SET(KML_BLAS_INCLUDE_SEARCH_PATHS $ENV{KMLROOT}/include) 
SET(KML_BLAS_LIB_SEARCH_PATHS $ENV{KMLROOT}/lib/sme/kblas/multi/)
 
FIND_PATH(KML_INCLUDE_DIR NAMES kblas.h PATHS ${KML_BLAS_INCLUDE_SEARCH_PATHS})
FIND_LIBRARY(KML_LIB NAMES kblas PATHS ${KML_BLAS_LIB_SEARCH_PATHS})

SET(KML_FOUND ON) 
# Check include files
IF(NOT KML_INCLUDE_DIR) 
    SET(KML_FOUND OFF)
    MESSAGE(STATUS "Could not find KBLAS include. Turning KBLAS_FOUND off")
ENDIF() 

# Check libraries
IF(NOT KML_LIB)
    SET(KML_FOUND OFF)
    MESSAGE(STATUS "Could not find KBLAS lib. Turning KBLAS_FOUND off")
ENDIF()

IF(KML_FOUND)
    IF(NOT KML_FIND_QUIETLY)
        MESSAGE(STATUS "Found KBLAS libraries: ${KML_LIB}")
        MESSAGE(STATUS "Found KBLAS include: ${KML_INCLUDE_DIR}")
    ENDIF(NOT KML_FIND_QUIETLY)
ELSE(KML_FOUND)
    IF(KML_FIND_REQUIRED)
        MESSAGE(FATAL_ERROR "Could not find KBLAS")
    ENDIF(KML_FIND_REQUIRED)
ENDIF(KML_FOUND)

MARK_AS_ADVANCED(KML_INCLUDE_DIR KML_LIB KBLAS) 

2. 在cmake/Dependencies.cmake文件中添加BLAS对KML引用的索引,在if (BLAS STREQUAL “Eigen”)分支判断逻辑下添加以下代码:

ELSEIF(BLAS STREQUAL "KML")
    FIND_PACKAGE(KML REQUIRED)
    INCLUDE_DIRECTORIES(SYSTEM ${KML_INCLUDE_DIR})
    LIST(APPEND Caffe2_DEPENDENCY_LIBS ${KML_LIB})
    SET(BLAS_INFO "KML")
    SET(BLAS_FOUND 1)
    SET(BLAS_LIBRARIES ${KML_LIB})

3.2.2、klapack替换lapack

在cmake/Modules/FindLAPACK.cmake中添加新KML后端,添加以下代码:

IF((NOT LAPACK_INFO) AND (BLAS_INFO STREQUAL "kml"))
    SET(_kml_include $ENV{KML_LAPACK_INCLUDE})
    SET(_kml_lib $ENV{KML_LAPACK_LIB})
    FIND_PATH(LAPACK_INCLUDE_DIR NAMES lapacke.h HINTS ${_kml_include} )
    FIND_LIBRARY(LAPACK_LIBRARY NAMES klapack_full km HINTS ${_kml_lib} ${_kml_lib}/../noarch/ )
ENDIF()

IF(LAPACK_INCLUDE_DIR AND LAPACK_LIBRARY) 
    SET(LAPACK_LIBRARIES ${LAPACK_LIBRARY}) 
    SET(LAPACK_INFO "kml")
    SET(LAPACK_FOUND TRUE) 
    MESSAGE(STATUS "Found kml LAPACK at ${_kml_lib}") 
ENDIF() 

4、Pytorch编译步骤

4.1、加载HPCKit环境

假设HPCKit安装路径为/path-to-HPCKit/,执行下述命令进行加载环境变量:

source /path-to-HPCKit/HPCKit/latest/modulefiles/kml/bisheng/env/setvars.sh
source /path-to-HPCKit/HPCKit/latest/modulefiles/compiler/bisheng/env/setvars.sh

4.2、下载PyTorch依赖项

pip install -r requirements.txt

4.3、编译PyTorch 编译指令如下:

SVE_VECBITS=512 DEBUG=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=1 \
USE_QNNPACK=0 USE_XNNPACK=0 USE_HBM=0 BLAS=OpenBLAS USE_LAPACK=1 \
KML_LAPACK_LIB=/path/to/HPCKit/latest/kml/bisheng/lib/sve512/ \
KML_LAPACK_INCLUDE=/path/to/HPCKit/latest/kml/bisheng/include python \
./setup.py develop

5、Pytorch性能测试

5.1、新增测试文件linear_test.py,代码如下

import torch
import time

def measure_cpu_matmul_time(size: int = 9728, warmup: int = 3, repeats: int = 10):
    # 在 CPU 上测量 size x size 矩阵乘法的耗时(秒)。
    # 先进行若干次预热(不计入时间),然后重复多次取平均。
    # 在 CPU 上生成随机张量
    A = torch.randn(size, size)
    B = torch.randn(size, size)
    # 预热
    for _ in range(warmup):
        _ = torch.mm(A, B)
    # 测量
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        _ = torch.mm(A, B)
        t1 = time.perf_counter()
        times.append(t1 - t0)
        avg_time = sum(times) / len(times)
        print(f"[CPU] 矩阵 {size}×{size} 相乘平均耗时:{avg_time:.6f} 秒 (over {repeats} runs, warmup={warmup})")

if __name__ == "__main__":
    measure_cpu_matmul_time()

5.2、性能测试流程:

开启大页内存:

for i in $(seq 16 31); do echo 2044 > /sys/devices/system/node/node${i}/hugepages/hugepages-2048kB/nr_hugepages; done 

设置KML后端:

export KML_BLAS_THREAD_TYPE=OMP 

限制线程数:

export OMP_NUM_THREADS=38 

执行测试程序:

taskset -c 0-37 numactl -m 16 python linear_test.py 

执行结果:

[CPU] 矩阵8192×8192 相乘平均耗时: 0.2189695秒 (over 10 runs, warmup=3)

至此,Pytorch使能HPCKit编译安装就完成了,大家可以对比软件包安装的版本进行性能对比测试。

本页内容