Software Update

Perform the following steps to update the algorithm library:

Obtain the latest software packages (version 1.3.0 is used as an example in this section). For details, see Obtaining Code.

Log in to the client server as an authorized user of the big data component, and replace the algorithm library packages in the /home/test/boostkit/lib/ directory with the latest ones.

rm -f /home/test/boostkit/lib/boostkit-*
cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-core/target/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar /home/test/boostkit/lib
cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-accelerator/target/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar /home/test/boostkit/lib
cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-xgboost/jvm-packages/boostkit-xgboost4j/target/boostkit-xgboost4j_2.11-1.3.0.jar /home/test/boostkit/lib
cp /opt/Spark-ml-algo-lib-1.3.0-spark2.3.2/ml-xgboost/jvm-packages/boostkit-xgboost4j-spark/target/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar /home/test/boostkit/lib
cp /opt/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar /home/test/boostkit/lib
cp /opt/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar /home/test/boostkit/lib
cp /opt/libboostkit_xgboost_kernel.so /home/test/boostkit/lib
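Before editing the submission script, it can help to confirm that every package copied above actually landed in the lib directory. The following is a minimal sketch and is not part of the BoostKit tooling; the function name check_lib_packages and its default directory are assumptions based on the paths used above, and the file list must be adjusted for other versions.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with BoostKit): check that the 1.3.0
# packages copied above exist in the client lib directory.
# Returns 0 if all files are present, 1 otherwise.
check_lib_packages() {
  local lib_dir="${1:-/home/test/boostkit/lib}"
  local missing=0
  local f
  for f in \
    boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar \
    boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar \
    boostkit-xgboost4j_2.11-1.3.0.jar \
    boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar \
    boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar \
    boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar \
    libboostkit_xgboost_kernel.so; do
    if [ ! -f "${lib_dir}/${f}" ]; then
      # Report each missing file on stderr and keep checking the rest.
      echo "missing: ${lib_dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, running check_lib_packages /home/test/boostkit/lib after the copy commands prints one line per missing file and returns a non-zero status if anything is absent.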

In the task submission shell script, replace the algorithm library package names with those of the latest packages. The following is an example of a modified script that starts the Spark job in yarn-client mode:

#!/bin/bash

spark-submit \
--class com.bigdata.ml.RFMain \
--master yarn \
--deploy-mode client \
--driver-cores 36 \
--driver-memory 50g \
--jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
--conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
--driver-class-path "lib/ml-test.jar:lib/fastutil-8.3.1.jar:lib/snakeyaml-1.17.jar:lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
./ml-test.jar 
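The same JAR names appear twice in the script: in --jars as comma-separated local paths with the lib/ prefix, and in spark.executor.extraClassPath as colon-separated bare file names (the bare names rely on YARN placing the shipped JARs in each container's working directory). One way to keep the two lists in sync is to generate both from a single array. This is a sketch; the variable names JARS, JARS_ARG, EXECUTOR_CP and the join_by function are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: keep one list of JAR file names and derive both
# spark-submit values from it so they cannot drift apart.
JARS=(
  fastutil-8.3.1.jar
  boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar
  boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar
  boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar
)

# Join the remaining arguments using the first argument as the separator.
join_by() {
  local sep="$1"
  shift
  local out="$1"
  shift
  local item
  for item in "$@"; do
    out="${out}${sep}${item}"
  done
  printf '%s' "${out}"
}

# --jars: local paths (each name prefixed with lib/), comma-separated.
JARS_ARG=$(join_by , "${JARS[@]/#/lib/}")
# spark.executor.extraClassPath: bare names, colon-separated.
EXECUTOR_CP=$(join_by : "${JARS[@]}")
```

The spark-submit call would then use --jars "${JARS_ARG}" and --conf "spark.executor.extraClassPath=${EXECUTOR_CP}", so adding or upgrading a package means editing only the JARS array.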

To start the Spark job in yarn-cluster mode, place the task submission script in the /home/test/boostkit/ directory, where the test JAR file is stored. The following is an example of the modified script:

#!/bin/bash

spark-submit \
--class com.bigdata.ml.RFMain \
--master yarn \
--deploy-mode cluster \
--driver-cores 36 \
--driver-memory 50g \
--jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar" \
--driver-class-path "ml-test.jar:fastutil-8.3.1.jar:snakeyaml-1.17.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
--conf "spark.yarn.cluster.driver.extraClassPath=ml-test.jar:snakeyaml-1.17.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar" \
--conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
./ml-test.jar 
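Both scripts use paths relative to the current directory (lib/... and ./ml-test.jar), so they only work when started from /home/test/boostkit/. A minimal sketch of a preamble that removes this dependency on the caller's working directory, assuming the script is stored next to ml-test.jar and lib/ as described above; the function name enter_base_dir is hypothetical.

```shell
#!/bin/bash
# Hypothetical preamble: switch to the directory that holds ml-test.jar and
# lib/ so that relative paths passed to spark-submit resolve correctly.
# Defaults to the directory containing this script.
enter_base_dir() {
  local base_dir="${1:-$(dirname "${BASH_SOURCE[0]}")}"
  cd "${base_dir}" || return 1
  if [ ! -f ./ml-test.jar ] || [ ! -d ./lib ]; then
    echo "ml-test.jar or lib/ not found in $(pwd)" >&2
    return 1
  fi
}
```

Calling enter_base_dir || exit 1 just before spark-submit makes the script fail early with a clear message instead of letting spark-submit report a missing file.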

The XGBoost algorithm contains C++ code, so its submission parameters differ slightly from those of the other algorithms: it needs its own JAR packages and the native library libboostkit_xgboost_kernel.so. The preceding scripts can be used to submit jobs for all algorithms except XGBoost.
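If one wrapper script is used for all algorithms, it could pick the package set from the main class. The sketch below illustrates this under stated assumptions: the function names jars_for_class and needs_native_lib, and the rule that a main class ending in XGBTRunner means an XGBoost job, are hypothetical; the JAR names are the 1.3.0 packages copied to lib/ earlier.

```shell
#!/bin/bash
# Hypothetical helper: print the comma-separated --jars value for a main class.
# XGBoost jobs use the xgboost4j packages; all other algorithms use the ML packages.
jars_for_class() {
  case "$1" in
    *XGBTRunner)
      echo "lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar,lib/boostkit-xgboost4j_2.11-1.3.0.jar"
      ;;
    *)
      echo "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar"
      ;;
  esac
}

# Only XGBoost jobs need libboostkit_xgboost_kernel.so shipped with --files
# and the extra library path settings.
needs_native_lib() {
  case "$1" in
    *XGBTRunner) return 0 ;;
    *) return 1 ;;
  esac
}
```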

The following is an example of a modified script that runs the XGBoost algorithm in yarn-client mode:

#!/bin/bash

spark-submit \
--class com.bigdata.ml.XGBTRunner \
--master yarn \
--deploy-mode client \
--driver-cores 36 \
--driver-memory 50g \
--jars "lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar,lib/boostkit-xgboost4j_2.11-1.3.0.jar" \
--conf "spark.executor.extraClassPath=boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
--driver-class-path "lib/ml-test.jar:lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:lib/boostkit-xgboost4j_2.11-1.3.0.jar" \
--conf spark.executorEnv.LD_LIBRARY_PATH="./lib/:${LD_LIBRARY_PATH}" \
--conf spark.executor.extraLibraryPath="./lib" \
--conf spark.driver.extraLibraryPath="./lib" \
--files=lib/libboostkit_xgboost_kernel.so  \
./ml-test.jar 
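Because the spark-submit options span many lines, a mistyped JAR name in --jars is easy to miss. A small pre-flight check that every comma-separated entry exists locally can catch this before submission. This is a sketch; the function name check_local_files is hypothetical, and it is meant to run from /home/test/boostkit/ so that the lib/ paths resolve.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that every entry of a comma-separated
# --jars (or --files) value is an existing local file.
# Returns 0 if all entries exist, 1 otherwise.
check_local_files() {
  local -a paths
  local path
  local missing=0
  # Split on commas into an array (no word splitting or globbing on entries).
  IFS=',' read -r -a paths <<< "$1"
  for path in "${paths[@]}"; do
    if [ ! -f "${path}" ]; then
      echo "not found: ${path}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, passing the exact --jars string of the XGBoost script to check_local_files and exiting on failure reports any JAR name that does not match a file in lib/.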

The following is an example of a modified script that runs the XGBoost algorithm in yarn-cluster mode:

#!/bin/bash

spark-submit \
--class com.bigdata.ml.XGBTRunner \
--master yarn \
--deploy-mode cluster \
--driver-cores 36 \
--driver-memory 50g \
--jars "lib/boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar,lib/boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar,lib/boostkit-xgboost4j_2.11-1.3.0.jar" \
--conf "spark.executor.extraClassPath=boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
--driver-class-path "ml-test.jar:boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
--conf "spark.yarn.cluster.driver.extraClassPath=ml-test.jar:boostkit-xgboost4j-kernel-2.11-1.3.0-spark2.3.2-aarch64.jar:boostkit-xgboost4j-spark2.3.2_2.11-1.3.0.jar:boostkit-xgboost4j_2.11-1.3.0.jar" \
--conf spark.executorEnv.LD_LIBRARY_PATH="./lib/:${LD_LIBRARY_PATH}" \
--conf spark.executor.extraLibraryPath="./lib" \
--conf spark.driver.extraLibraryPath="./lib" \
--files=lib/libboostkit_xgboost_kernel.so  \
./ml-test.jar 

The table at kunpengbdsspark_06_0013.html#EN-US_TOPIC_0000001501041117__table286171124012 describes the statements in the scripts.
