SVM

Case No.

4.2.1

Test Objective

Support vector machine (SVM) performance test

Test Networking

Figure 1 shows the test network.

Prerequisites

  1. The cluster has been deployed based on the test network diagram.
  2. The kal-test test tool package for the algorithm has been obtained. For details about the sample project directory structure, see the README file. This test framework is used throughout the test procedure.
  3. The dataset used by the algorithm has been uploaded to the specified HDFS directory. For details, see Test Dataset.

Test Procedure

  1. Go to the /home/test/boostkit/kal-test directory.
    cd /home/test/boostkit/kal-test
    
  2. View the node names in /etc/hosts. As shown in the following figure, the compute nodes are agent1, agent2, and agent3.
    cat /etc/hosts
    

  3. Based on the compute node names obtained in step 2, update the compute node names in bin/ml/svm_run.sh as follows.
    1. Open the bin/ml/svm_run.sh file.
      vi bin/ml/svm_run.sh
      
    2. Press i to enter insert mode. Change the compute node names in the red box to agent1, agent2, and agent3. If the number of compute nodes is not 3, add or delete rows accordingly.

    3. Press Esc, type :wq!, and press Enter to save the file and exit.
  4. Create a path for saving the results.
    mkdir logs report
    
  5. Execute the test script. For example, to test the algorithm performance on the D10M4096 dataset, run the following command:
    sh bin/ml/svm_run.sh D10M4096 fit no no 2>&1 | tee -a logs/svm_10M4096_fit.log
    
  6. After the execution is complete, view data such as the execution duration and result path in the /home/test/boostkit/kal-test/report/<Algorithm name>_<File write time>.yml file. In the command output, costTime indicates the algorithm execution duration and saveDataPath indicates the HDFS path where the result is saved.
    cat report/<Algorithm name>_<File write time>.yml
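
The lookup in step 6 can also be scripted. The following is a minimal sketch and not part of the kal-test package: report_value is a hypothetical helper, and it assumes that the report stores each field on its own line in the form key: value, which may not match the actual report layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of kal-test): print the value of a
# "key: value" line in a report file.
# Assumption: each field sits on its own line, e.g. "costTime: <value>".
report_value() {
    key="$1"
    file="$2"
    # Strip optional indentation, the key, the colon, and any following
    # spaces, then print what remains. Only the first match is returned.
    sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$file" | head -n 1
}
```

If the layout assumption holds, calling report_value with costTime and the report file prints the execution duration, and calling it with saveDataPath prints the HDFS result path.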

Expected Result

  1. The script is executed successfully.
  2. The report/<Algorithm name>_<File write time>.yml file is generated and contains the result information.
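
The second expected result can be checked from the shell. The sketch below is an illustration under stated assumptions rather than part of the test framework: has_report is a hypothetical helper that only checks whether a directory contains at least one non-empty .yml file. For the first expected result, note that the test command pipes the script into tee, so by default the exit status of that pipeline is the status of tee, not of svm_run.sh. Inspect the log to confirm that the script itself succeeded.

```shell
#!/bin/sh
# Hypothetical helper (not part of kal-test): return 0 if the given
# directory holds at least one non-empty .yml file, otherwise return 1.
has_report() {
    dir="$1"
    for f in "$dir"/*.yml; do
        # If nothing matches, the pattern stays literal and -s fails.
        if [ -s "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

Run from the kal-test directory, has_report report succeeds only when a non-empty report file is present. It does not validate the contents of the file.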

Test Result

  

Remarks

  1. If your directory names or locations differ from those used in this document, modify them in the script.
  2. The optimal Spark submission parameters may vary between clusters, so you need to tune them for your cluster. You can modify model parameters in conf/ml/svm/svm.yml and Spark running parameters in conf/ml/svm/svm_spark.properties.
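
When tuning parameters, it can help to record the settings that were active for each run. The following is a minimal sketch, assuming that the properties file uses # for comment lines. active_settings is a hypothetical helper and not part of kal-test.

```shell
#!/bin/sh
# Hypothetical helper (not part of kal-test): print the active lines of
# a properties file, skipping blank lines and lines whose first
# non-space character is "#".
active_settings() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}
```

For example, running active_settings on conf/ml/svm/svm_spark.properties before and after an edit and comparing the two outputs shows which parameters changed between runs.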