
DTB

Case No.: 4.2.22

Test Objective

Decision Tree Bucket (DTB) algorithm performance test

Test Networking

Figure 1 shows the test networking.

Prerequisites

  1. The cluster has been deployed based on the test network diagram.
  2. The kal-test sample package for the algorithm has been obtained. For details about the sample project directory structure, see the README file in the package. The auxiliary code in the package is required during the test.
  3. The dataset used by the algorithm has been uploaded to the specified HDFS directory. For details, see Test Dataset.
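Prerequisite 3 can be completed with the HDFS shell. The local path, dataset file name, and target directory below are examples only; use the directory given in Test Dataset. The sketch skips silently when no HDFS client or dataset file is present:

```shell
# Upload the dataset to HDFS (example paths -- substitute your own).
DATASET=/home/test/dataset/HIGGS.csv   # hypothetical local dataset file
HDFS_DIR=/tmp/ml/dataset               # hypothetical target HDFS directory

if command -v hdfs >/dev/null 2>&1 && [ -f "$DATASET" ]; then
  # Create the target directory and upload (overwrite if already present).
  hdfs dfs -mkdir -p "$HDFS_DIR"
  hdfs dfs -put -f "$DATASET" "$HDFS_DIR/"
else
  echo "skip: hdfs client or dataset not found; run this on a cluster node"
fi
```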

Test Procedure

  1. Save the kal-test folder to a specified directory, for example, /home/test/boostkit/.

    If the directory does not exist, create it first.

    mkdir -p /home/test/boostkit/
  2. Compile and install the software. For details, see Software Compiling and Software Deployment in the Kunpeng BoostKit for Big Data Machine Learning Algorithm Library Feature Guide. Save the obtained boostkit-ml-kernel-scala_version-kal_version-spark_version-aarch64.jar, boostkit-ml-acc_scala_version-kal_version-spark_version.jar, and boostkit-ml-core_scala_version-kal_version-spark_version.jar files to the /home/test/boostkit/kal-test/lib directory.
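A single glob copy covers all three jars from step 2. This sandboxed sketch uses temporary directories and placeholder version strings so it can run anywhere; on the real cluster, copy from your build output directory into /home/test/boostkit/kal-test/lib instead:

```shell
# Sandboxed sketch of the copy in step 2. The version strings are
# placeholders -- use the file names produced by your own build.
src=$(mktemp -d)   # stands in for the build output directory
lib=$(mktemp -d)   # stands in for /home/test/boostkit/kal-test/lib
touch "$src/boostkit-ml-kernel-2.12-1.0-spark3.1-aarch64.jar" \
      "$src/boostkit-ml-acc_2.12-1.0-spark3.1.jar" \
      "$src/boostkit-ml-core_2.12-1.0-spark3.1.jar"

# One glob matches all three library jars.
cp "$src"/boostkit-ml-*.jar "$lib"/
ls "$lib"
rm -rf "$src" "$lib"
```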
  3. Go to the /home/test/boostkit/kal-test directory.
    cd /home/test/boostkit/kal-test
  4. View the node names in /etc/hosts. In this example, the compute nodes are agent1, agent2, and agent3.
    cat /etc/hosts
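On such a cluster, /etc/hosts typically maps one management (server) node and the compute nodes. The addresses and host names below are illustrative only; use the entries actually listed on your cluster:

```
192.168.1.100  server1
192.168.1.101  agent1
192.168.1.102  agent2
192.168.1.103  agent3
```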

  5. Based on the compute node names obtained in step 4, rename the compute nodes in bin/ml/dtb_run.sh as follows.
    1. Open the bin/ml/dtb_run.sh file.
      vim bin/ml/dtb_run.sh
    2. Press i to enter insert mode. Change the compute node names in the file to agent1, agent2, and agent3 (the names obtained in step 4). If the number of compute nodes is not 3, add or delete entries accordingly.

    3. Press Esc, type :wq!, and press Enter to save the file and exit.
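If you prefer a non-interactive edit, the vim steps above can also be done with sed. This sandboxed sketch assumes the node names appear in a shell array named nodes, which is hypothetical; check how they actually appear in your copy of bin/ml/dtb_run.sh before editing:

```shell
# Create a throwaway stand-in for bin/ml/dtb_run.sh with placeholder names.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
nodes=(node1 node2 node3)
EOF

# Replace the placeholder names with the names found in /etc/hosts (step 4).
sed -i 's/node1 node2 node3/agent1 agent2 agent3/' "$tmp"
cat "$tmp"
rm -f "$tmp"
```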
  6. Create a path for saving the results.
    mkdir logs report
  7. Run the test script. For example, test the algorithm's performance on the HIGGS dataset using the fit function interface.
    sh bin/ml/dtb_run.sh higgs fit save no 2>&1 | tee -a logs/dtb_higgs_fit.log
  8. After the execution is complete, you can view data such as the execution duration and result path in the /home/test/boostkit/kal-test/report/Algorithm name_File write time.yml file. In the command output, costTime indicates the algorithm execution duration and bucketedResPath indicates the HDFS path for saving the result.
    cat report/Algorithm name_File write time.yml
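To check the two fields without reading the whole report, a grep like the following works. The file name and field values here are fabricated for the demo; point the grep at the real report/Algorithm name_File write time.yml produced by your run:

```shell
# Create a throwaway report with the two fields named in step 8.
report=$(mktemp)
cat > "$report" <<'EOF'
costTime: 123.45
bucketedResPath: hdfs:///tmp/ml/dtb/result
EOF

# Print only the execution duration and the HDFS result path.
grep -E '^(costTime|bucketedResPath):' "$report"
rm -f "$report"
```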

Expected Result

  1. The script is executed successfully.
  2. The report/Algorithm name_File write time.yml file is generated and contains the result information.

Test Result

  

Remarks

  1. If the directory name or location differs in your environment, modify it in the script.
  2. The optimal Spark submission parameters vary between clusters, so you need to tune them for your environment. Model parameters can be modified in conf/ml/dtb/dtb.yml, and Spark running parameters in conf/ml/dtb/dtb_spark.properties.
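For reference, tuning in dtb_spark.properties typically revolves around standard Spark submission keys such as the following. The values are illustrative only and must be tuned per cluster, and the exact keys in the shipped file may differ; consult the file delivered with kal-test:

```
spark.executor.instances=12
spark.executor.cores=4
spark.executor.memory=20g
spark.driver.memory=16g
```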