DBSCAN

Model API Type	Function API
ML API	def fitPredict(dataset: Dataset[_]): DataFrame

Function
Takes sample data in Dataset format, calls the fitPredict API, and outputs the clustering result.

Input/Output

Input: sample data in Dataset format. The fields are as follows:

Parameter	Type	Default Value	Description
featuresCol	Vector	features	Feature vector of each sample
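
The sample usage in this section passes a trainData Dataset but does not show how it is built. The following is a minimal sketch that assumes Spark 3.x with spark-mllib on the classpath. It builds a DataFrame (a Dataset[Row]) with a single features column of org.apache.spark.ml.linalg vectors, matching the default featuresCol. The object name BuildDbscanInput and the sample points are illustrative only. This section does not say whether the library also accepts sparse vectors.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: builds a tiny input DataFrame for DBSCAN.
object BuildDbscanInput {
  // One row per sample; the only column is "features" (the default featuresCol).
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val points: Seq[Vector] = Seq(
      Vectors.dense(0.0, 0.0),
      Vectors.dense(0.1, 0.0),
      Vectors.dense(0.0, 0.1),
      Vectors.dense(5.0, 5.0)
    )
    // Wrap each vector in Tuple1 so Spark can derive an encoder for a one-column DataFrame.
    points.map(Tuple1.apply).toDF("features")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dbscan-input-sketch")
      .master("local[*]")
      .getOrCreate()
    val trainData = build(spark)
    trainData.printSchema()
    spark.stop()
  }
}
```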

Output: clustering result. The fields are as follows:

Parameter	Type	Default Value	Description
predictionCol	Int	prediction	Category of the sample: -1 = noise sample, 0 = core sample, 1 = border sample
labelCol	Int	label	Cluster ID of the sample: -1 (default) for noise samples; greater than or equal to 0 for core and border samples
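
The following sketch shows what the two output fields mean. It is a minimal single-machine version of the textbook DBSCAN algorithm in plain Scala, not the library's distributed implementation. It only mirrors the documented encoding: prediction is -1 for noise, 0 for core, and 1 for border, and label is the cluster ID, with -1 for noise. The names DbscanSketch, Assignment, and run are hypothetical. The sketch makes three further assumptions: distance is Euclidean, a sample's epsilon-neighborhood includes the sample itself, and sampleRate is not modeled.

```scala
import scala.collection.mutable

object DbscanSketch {
  // Per-sample result mirroring the documented fields (hypothetical type).
  final case class Assignment(prediction: Int, label: Int)

  // Euclidean distance (an assumption; the library's metric is not stated here).
  private def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)

  def run(points: IndexedSeq[Array[Double]], epsilon: Double, minPoints: Int): IndexedSeq[Assignment] = {
    val n = points.length
    // Epsilon-neighborhood of every sample, including the sample itself.
    val neighbors: IndexedSeq[IndexedSeq[Int]] =
      points.indices.map(i => points.indices.filter(j => dist(points(i), points(j)) <= epsilon))
    val isCore = neighbors.map(_.size >= minPoints)
    val label = Array.fill(n)(-1) // -1 until a sample joins a cluster
    var nextCluster = 0
    // Start a new cluster at each core sample that is not yet assigned.
    for (i <- 0 until n if isCore(i) && label(i) == -1) {
      val queue = mutable.Queue(i)
      label(i) = nextCluster
      // Breadth-first expansion: only core samples extend the cluster.
      while (queue.nonEmpty) {
        val p = queue.dequeue()
        if (isCore(p)) {
          for (q <- neighbors(p) if label(q) == -1) {
            label(q) = nextCluster
            queue.enqueue(q)
          }
        }
      }
      nextCluster += 1
    }
    // Core -> 0, assigned non-core (border) -> 1, unassigned -> -1 (noise).
    points.indices.map { i =>
      val prediction = if (isCore(i)) 0 else if (label(i) >= 0) 1 else -1
      Assignment(prediction, label(i))
    }
  }

  def main(args: Array[String]): Unit = {
    val pts = IndexedSeq(
      Array(0.0, 0.0), Array(0.1, 0.0), Array(0.0, 0.1), // dense group
      Array(0.25, 0.0),                                  // close to the group
      Array(5.0, 5.0)                                    // isolated
    )
    run(pts, epsilon = 0.2, minPoints = 3).foreach(println)
  }
}
```

With epsilon = 0.2 and minPoints = 3, the three tightly grouped samples are core samples of cluster 0. The sample (0.25, 0.0) is a border sample of cluster 0 because it lies within 0.2 of the core sample (0.1, 0.0) but has only two neighbors itself. The sample (5.0, 5.0) is noise.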

Sample usage

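// trainData: a Dataset whose "features" column holds the feature vectors
val dbscan = new DBSCAN()
  .setEpsilon(0.2)    // neighborhood radius
  .setMinPoints(3)    // minimum number of points within epsilon for a core sample
  .setSampleRate(1.0) // sampling rate of the input data (1.0 = all data)
// Returns the clustering result with the prediction and label fields
val result = dbscan.fitPredict(trainData)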
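
After fitPredict returns, the result can be inspected with ordinary DataFrame operations. The sketch below assumes the default output column names prediction and label with Int values, as listed in the output table. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helpers for summarizing a DBSCAN clustering result.
object SummarizeDbscanResult {
  // Number of samples in each cluster, excluding noise (label == -1).
  def clusterSizes(result: DataFrame): Map[Int, Long] =
    result.filter(col("label") =!= -1)
      .groupBy("label")
      .count()
      .collect()
      .map(r => r.getInt(0) -> r.getLong(1))
      .toMap

  // Number of samples classified as noise (prediction == -1).
  def noiseCount(result: DataFrame): Long =
    result.filter(col("prediction") === -1).count()
}
```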
