SVM

Case No.

4.2.1

Test Objective

Support vector machine (SVM) performance test

Test Networking

Figure 1 shows the test network.

Prerequisites

The cluster has been deployed based on the test network diagram.
The kal-test test tool package for the algorithm has been obtained. For details about the directory structure of the sample project, see the README file. This test framework is used throughout the test.
The dataset used by the algorithm has been uploaded to the specified HDFS directory. For details, see Test Dataset.

Test Procedure

Go to the /home/test/boostkit/kal-test directory.

    cd /home/test/boostkit/kal-test

View the node names in /etc/hosts. As shown in the following figure, the compute nodes are agent1, agent2, and agent3.

    cat /etc/hosts
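
On a long hosts file it can be easier to filter the entries than to read them by eye. The following is a minimal sketch, assuming the compute nodes follow the agent1, agent2, ... naming used in this example. The function name list_agents is hypothetical and is not part of kal-test.

```shell
# List host names of the form agent<N> found in a hosts file (default: /etc/hosts).
# Assumption: compute nodes are named agent1, agent2, ... as in this test case.
list_agents() {
    hosts_file="${1:-/etc/hosts}"
    # Drop comments, then print every name (fields 2..NF) that matches agent<N>.
    sed 's/#.*//' "$hosts_file" |
        awk '{ for (i = 2; i <= NF; i++) if ($i ~ /^agent[0-9]+$/) print $i }' |
        sort -u
}

list_agents /etc/hosts
```

If your cluster uses a different naming scheme, adjust the pattern in the awk expression accordingly.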

Based on the compute node names obtained in step 2, rename the compute nodes in bin/ml/svm_run.sh as follows.

Open the bin/ml/svm_run.sh file.

    vi bin/ml/svm_run.sh

Press i to enter insert mode. Rename the compute nodes marked in the red box to agent1, agent2, and agent3. If the number of compute nodes is not 3, add or delete rows accordingly.
Press Esc, type :wq!, and press Enter to save the file and exit.

Create directories for saving the logs and reports.

    mkdir logs report

Execute the test script. For example, to test the algorithm performance on the D10M4096 dataset:

    sh bin/ml/svm_run.sh D10M4096 fit no no 2>&1 | tee -a logs/svm_10M4096_fit.log

After the execution is complete, you can view data such as the execution duration and result path in the /home/test/boostkit/kal-test/report/Algorithm name_File write time.yml file, where Algorithm name and File write time stand for the actual algorithm name and the time at which the file was written. In the command output, costTime indicates the algorithm execution duration and saveDataPath indicates the HDFS path where the result is saved.
```
cat report/Algorithm name_File write time.yml
```
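
To read only the two fields mentioned above, the report can be filtered instead of printed in full. The following is a minimal sketch, assuming costTime and saveDataPath each appear on their own line as "key: value". The function name show_report_fields is hypothetical, and the newest file under report/ is assumed to be the one just written.

```shell
# Print the costTime and saveDataPath lines from a report file.
# Assumption: each key sits on its own line in "key: value" form.
show_report_fields() {
    grep -E '^[[:space:]]*(costTime|saveDataPath):' "$1"
}

# Pick the most recently modified report; "|| true" keeps going if none exists yet.
latest=$(ls -t report/*.yml 2>/dev/null | head -n 1) || true
if [ -n "$latest" ]; then
    show_report_fields "$latest"
else
    echo "no report found under report/" >&2
fi
```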

Expected Result

The script is executed successfully.
The report/Algorithm name_File write time.yml file is generated and the file contains the result information.
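
The second expected result can be checked from the command line. The following is a minimal sketch, assuming that any non-empty .yml file under report/ counts as a generated report. The function name has_report is hypothetical, and the check does not validate the file contents.

```shell
# Return 0 if the given directory holds at least one non-empty .yml file.
has_report() {
    dir="${1:-report}"
    for f in "$dir"/*.yml; do
        # -s is true only for an existing file whose size is greater than zero.
        [ -s "$f" ] && return 0
    done
    return 1
}

if has_report report; then
    echo "report check: OK"
else
    echo "report check: no non-empty .yml file in report/"
fi
```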

Test Result

Remarks

If the directory name or location is different, modify it in the script.
The optimal Spark submission parameters may vary between clusters, so you need to tune them for your own cluster. You can modify the model parameters in conf/ml/svm/svm.yml and the Spark running parameters in conf/ml/svm/svm_spark.properties.
