Classification and Regression

SVM

This section describes how the SVM algorithm parameters affect model performance. The default configuration file directory is $KAL_TEST/conf/ml/svm, where $KAL_TEST/conf/ is the kal-test tool deployment directory.

Parameter

Description

Suggestion

numPartitions

Number of Spark partitions. If the value is too large, many tasks are generated and the scheduling time increases. If the value is too small, some nodes may not be allocated any tasks and each partition processes more data, which increases the memory usage of each agent node.

Perform a grid search over 0.5 to 1.5 times the total number of cores, where the total number of cores is executor_cores multiplied by num_executor.

maxIter

Maximum number of iterations. If the value is too large, training takes too long and the model may overfit, which reduces accuracy. If the value is too small, the model cannot converge to the optimal value and accuracy is low.

Search within the range [50, 150]. The default value, 100, is recommended. Reduce the number of iterations for datasets with few features.

inertiaCoefficient

Weight of the historical direction information in the momentum calculation. This is a newly added parameter, set through spark.boostkit.LinearSVC.inertiaCoefficient. The value is a positive double-precision real number and is used to optimize accuracy.

The default value is 0.5.
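
The search ranges suggested above can be sketched as a small candidate generator. This is a minimal illustration, not part of the kal-test tool: the function names, the number of grid steps, and the maxIter step size of 25 are assumptions chosen for the example. Only the ranges (0.5 to 1.5 times the total cores, [50, 150] for maxIter) and the spark.boostkit.LinearSVC.inertiaCoefficient property name come from the table.

```python
from itertools import product


def partition_candidates(executor_cores, num_executors, steps=5):
    """Evenly spaced numPartitions candidates in [0.5, 1.5] x total cores.

    `steps` is an assumed grid resolution, not a documented setting.
    """
    total_cores = executor_cores * num_executors
    if steps < 2:
        return [total_cores]
    low, high = 0.5 * total_cores, 1.5 * total_cores
    width = (high - low) / (steps - 1)
    # Round to whole partitions, keep at least 1, and drop duplicates.
    return sorted({max(1, round(low + i * width)) for i in range(steps)})


def max_iter_candidates(low=50, high=150, step=25):
    """maxIter candidates within the suggested range [50, 150]."""
    return list(range(low, high + 1, step))


def inertia_conf(value=0.5):
    """Build spark-submit arguments that set the inertia coefficient."""
    if value <= 0:
        raise ValueError("inertiaCoefficient must be a positive real number")
    return ["--conf", f"spark.boostkit.LinearSVC.inertiaCoefficient={value}"]


# Example cluster: 12 executors with 8 cores each (hypothetical values).
partitions = partition_candidates(executor_cores=8, num_executors=12)
iterations = max_iter_candidates()
grid = list(product(partitions, iterations))

print(partitions)      # [48, 72, 96, 120, 144]
print(iterations)      # [50, 75, 100, 125, 150]
print(len(grid))       # 25
print(inertia_conf())  # ['--conf', 'spark.boostkit.LinearSVC.inertiaCoefficient=0.5']
```

Each (numPartitions, maxIter) pair in grid would then be written to the SVM configuration file and evaluated in a separate run. Keeping the grid coarse keeps the number of training runs manageable.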