Clustering

DBSCAN

This section describes how the DBSCAN algorithm parameters affect model performance. The default configuration files are in $KAL_TEST/conf/ml/dbscan, where $KAL_TEST is the kal-test tool deployment directory.

Parameter	Description	Suggestion
numPartitions	Number of Spark partitions.	Set numPartitions to the number of executors. To improve performance, you can reduce the number of executors and allocate more resources (cores and memory) to each executor.
epsilon	Neighborhood radius: the maximum distance between two points for one to be considered a neighbor of the other.	A floating-point value greater than 0.0.
minPoints	Minimum number of neighbors (points within epsilon) that a point must have to be treated as a core point of a cluster.	A positive integer.
sampleRate	Sampling rate of the input data. The space of the full input data is partitioned based on the sampled data.	The value range is (0.0, 1.0]. The default value is 1.0, meaning that the full input data is used.
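To make the roles of epsilon and minPoints concrete, the following is a minimal single-machine sketch of textbook brute-force DBSCAN. It is not the kal-test or Spark implementation, and the names dbscan, region_query, and example_points are hypothetical. Following the table's wording, min_points counts neighbors excluding the point itself; some libraries (for example, scikit-learn's min_samples) count the point itself, so the same number behaves slightly differently there.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Point], idx: int, epsilon: float) -> List[int]:
    """Indices of all points within epsilon of points[idx], excluding idx itself."""
    p = points[idx]
    return [j for j, q in enumerate(points) if j != idx and math.dist(p, q) <= epsilon]


def dbscan(points: Sequence[Point], epsilon: float, min_points: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be greater than 0.0")
    if min_points < 1:
        raise ValueError("min_points must be a positive integer")
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, epsilon)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, epsilon)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is also a core point; keep growing
        cluster_id += 1
    return labels  # type: ignore[return-value]


# Two tight groups and one isolated point.
example_points: List[Point] = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 10.5), (10.5, 10.0),
    (50.0, 50.0),
]
print(dbscan(example_points, epsilon=1.0, min_points=2))  # [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan(example_points, epsilon=1.0, min_points=3))  # [0, 0, 0, 0, -1, -1, -1, -1]
```

With min_points=2, the two groups form clusters 0 and 1 and the isolated point is noise. Raising min_points to 3 leaves the three-point group without any core point, so it also becomes noise. In general, a smaller epsilon or a larger minPoints produces more noise and smaller clusters.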
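The following sketch illustrates one way a sampling rate can drive space partitioning: sample the input, derive split points from the sample, and then route every point of the full data to a partition. It assumes one-dimensional quantile strips on the first coordinate, and it replicates points that lie within epsilon of a strip boundary so that no neighborhood is cut off. This documentation does not specify how kal-test actually partitions the space. The names sample_points, split_boundaries, and assign are hypothetical.

```python
import bisect
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sample_points(points: Sequence[Point], sample_rate: float, seed: int = 0) -> List[Point]:
    """Bernoulli sampling: keep each point independently with probability sample_rate."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sampleRate must be in (0.0, 1.0]")
    if sample_rate == 1.0:
        return list(points)  # default: use the full input data
    rng = random.Random(seed)
    return [p for p in points if rng.random() < sample_rate]


def split_boundaries(sample: Sequence[Point], num_partitions: int) -> List[float]:
    """Pick num_partitions - 1 cut points on the first coordinate from sample quantiles."""
    xs = sorted(p[0] for p in sample)
    if not xs:
        raise ValueError("sample is empty; increase sampleRate")
    return [xs[(k * len(xs)) // num_partitions] for k in range(1, num_partitions)]


def assign(points: Sequence[Point], boundaries: List[float],
           epsilon: float) -> Dict[int, List[Point]]:
    """Route each full-data point to every strip that its epsilon-interval overlaps."""
    parts: Dict[int, List[Point]] = {k: [] for k in range(len(boundaries) + 1)}
    for p in points:
        x = p[0]
        lo = bisect.bisect_right(boundaries, x - epsilon)  # strip containing x - epsilon
        hi = bisect.bisect_right(boundaries, x + epsilon)  # strip containing x + epsilon
        for k in range(lo, hi + 1):
            parts[k].append(p)  # points near a cut are copied to both sides
    return parts


data = [(float(i), 0.0) for i in range(100)]
cuts = split_boundaries(sample_points(data, sample_rate=1.0), num_partitions=4)
print(cuts)  # [25.0, 50.0, 75.0]
strips = assign(data, cuts, epsilon=1.0)
print({k: len(v) for k, v in strips.items()})
```

A lower sampleRate makes the boundaries cheaper to compute but less representative of the full data, which can leave the partitions unevenly loaded.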
