
Software Upgrade

After a new version of the algorithm library is released, you can upgrade to it by following the procedure below.

  1. Obtain the latest software package (version 1.3.0) as described in the instructions for obtaining the code.
  2. Log in to the server as an authorized user of the big data component and replace the algorithm library packages in the /home/test/boostkit/lib/ directory of the original client with the latest versions.
    rm -f /home/test/boostkit/lib/boostkit-*
    cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-core/target/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar /home/test/boostkit/lib
    cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-accelerator/target/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar /home/test/boostkit/lib
    cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-xgboost/jvm-packages/boostkit-xgboost4j/target/boostkit-xgboost4j_2.11-1.3.0.jar /home/test/boostkit/lib
    cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-xgboost/jvm-packages/boostkit-xgboost4j-spark/target/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar /home/test/boostkit/lib
    cp /opt/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar /home/test/boostkit/lib
    cp /opt/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar /home/test/boostkit/lib
    cp /opt/libboostkit_xgboost_kernel.so /home/test/boostkit/lib
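
    After the copy commands complete, you can confirm that only the 1.3.0 packages remain on the client. The following check is a minimal sketch that assumes the same /home/test/boostkit/lib/ path used above.
    # List the BoostKit packages now present in the client lib directory;
    # every boostkit-* JAR listed should carry the 1.3.0 version string.
    ls -l /home/test/boostkit/lib/ | grep -E 'boostkit'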
    
  3. In the job submission shell script, replace the algorithm library package names with the names of the latest packages, then start the Spark job in yarn-client mode. The updated shell script looks like the following example.
    #!/bin/bash
    
    spark-submit \
    --class com.bigdata.ml.RFMain \
    --master yarn \
    --deploy-mode client \
    --driver-cores 36 \
    --driver-memory 50g \
    --jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    --conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    --driver-class-path "lib/ml-test.jar:lib/fastutil-8.3.1.jar:lib/snakeyaml-1.17.jar:lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    ./ml-test.jar 
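
    Assuming the script above is saved on the client as, for example, submit_client.sh (a hypothetical name) under /home/test/boostkit/, it can be launched and the resulting YARN application checked as follows.
    cd /home/test/boostkit/
    bash submit_client.sh
    # Confirm that the job has been accepted by YARN.
    yarn application -list -appStates RUNNING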
    

    Place the job submission shell script in the /home/test/boostkit/ directory on the client, the same directory as the test JAR package, then start the Spark job in yarn-cluster mode. The updated shell script looks like the following example:

    #!/bin/bash
    
    spark-submit \
    --class com.bigdata.ml.RFMain \
    --master yarn \
    --deploy-mode cluster \
    --driver-cores 36 \
    --driver-memory 50g \
    --jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar" \
    --driver-class-path "ml-test.jar:fastutil-8.3.1.jar:snakeyaml-1.17.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    --conf "spark.yarn.cluster.driver.extraClassPath=ml-test.jar:snakeyaml-1.17.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar" \
    --conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    ./ml-test.jar 
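
    In yarn-cluster mode the driver runs inside a YARN container, so its output is collected from YARN rather than printed to the local terminal. The command below is a sketch; <application_id> stands for the application ID reported by spark-submit.
    # Retrieve the aggregated driver and executor logs once the job has finished.
    yarn logs -applicationId <application_id> | less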
    

    Because XGBoost contains some C++ code, the submission parameters for the XGBoost algorithm differ slightly from those of the other algorithms. The preceding scripts can be used to submit jobs for every algorithm except XGBoost.
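
    The XGBoost-specific part of the submission boils down to shipping the native kernel library and putting ./lib on the library path of the driver and executors. The options below are taken from the scripts that follow and are listed here only as a summary.
    # Extra options that appear only in the XGBoost submission scripts:
    #   --files=lib/libboostkit_xgboost_kernel.so                              ships the native shared library with the job
    #   --conf spark.executorEnv.LD_LIBRARY_PATH="./lib/:${LD_LIBRARY_PATH}"   lets executors locate the library
    #   --conf spark.executor.extraLibraryPath="./lib"                         adds ./lib to the executor library path
    #   --conf spark.driver.extraLibraryPath="./lib"                           adds ./lib to the driver library path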

    To run the XGBoost algorithm and start the Spark job in yarn-client mode, the updated shell script looks like the following example:

    #!/bin/bash
    
    spark-submit \
    --class com.bigdata.ml.XGBTRunner \
    --master yarn \
    --deploy-mode client \
    --driver-cores 36 \
    --driver-memory 50g \
    --jars "lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar,lib/boostkit-xgboost4j_2.11-1.3.0.jar" \
    --conf "spark.executor.extraClassPath=boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
    --driver-class-path "lib/ml-test.jar:lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:lib/boostkit-xgboost4j_2.11-1.3.0.jar" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="./lib/:${LD_LIBRARY_PATH}" \
    --conf spark.executor.extraLibraryPath="./lib" \
    --conf spark.driver.extraLibraryPath="./lib" \
    --files=lib/libboostkit_xgboost_kernel.so  \
    ./ml-test.jar 
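
    Because the XGBoost job loads libboostkit_xgboost_kernel.so at run time, it can help to confirm that the file shipped with --files is an AArch64 shared object before submitting. This is a minimal check assuming the library was copied to /home/test/boostkit/lib/ as in step 2.
    # The output should report an ELF 64-bit shared object for ARM aarch64.
    file /home/test/boostkit/lib/libboostkit_xgboost_kernel.so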
    

    To run the XGBoost algorithm and start the Spark job in yarn-cluster mode, the updated shell script looks like the following example:

    #!/bin/bash
    
    spark-submit \
    --class com.bigdata.ml.XGBTRunner \
    --master yarn \
    --deploy-mode cluster \
    --driver-cores 36 \
    --driver-memory 50g \
    --jars "lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar,lib/boostkit-xgboost4j_2.11-1.3.0.jar" \
    --conf "spark.executor.extraClassPath=boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
    --driver-class-path "ml-test.jar:boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
    --conf "spark.yarn.cluster.driver.extraClassPath=ml-test.jar:boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="./lib/:${LD_LIBRARY_PATH}" \
    --conf spark.executor.extraLibraryPath="./lib" \
    --conf spark.driver.extraLibraryPath="./lib" \
    --files=lib/libboostkit_xgboost_kernel.so  \
    ./ml-test.jar 
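
    If the XGBoost job fails in yarn-cluster mode, a quick way to check whether the native kernel library was found is to search the aggregated logs. Again, <application_id> is a placeholder for the ID reported by spark-submit.
    # Look for native-library loading errors in the collected YARN logs.
    yarn logs -applicationId <application_id> | grep -i -E 'UnsatisfiedLinkError|libboostkit'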
    

    The meaning of each statement in the scripts is described in Table 2.