DBSCAN

Model API Type    Function API
ML API            def fitPredict(dataset: Dataset[_]): DataFrame

ML API

  • Function

    Pass sample data as a Dataset to the fitPredict API, which returns the clustering result as a DataFrame.

  • Input/Output
    1. Package name: org.apache.spark.ml.clustering
    2. Class name: DBSCAN
    3. Method name: fitPredict
    4. Input: training sample data (Dataset[_]). The following are mandatory fields.

      Parameter     Type     Default Value   Description
      featuresCol   Vector   features        Feature vector

    5. Algorithm parameters
      def setMinPoints(value: Int): DBSCAN.this.type
      def setEpsilon(value: Double): DBSCAN.this.type
      def setSampleRate(value: Double): DBSCAN.this.type
      1. epsilon is the maximum distance between two neighboring points for them to belong to the same cluster. Its value must be greater than 0.0.
      2. minPoints is the minimum number of neighbors that a given point must have. Its value must be greater than 1.
      3. sampleRate is the sampling rate of the input data. The sampled data is used to partition the space of the full input data. The value range is (0.0, 1.0]. The default value is 1.0, meaning that the full input data is used.

        Code interface example:

        val model = new DBSCAN()
          .setEpsilon(params.epsilon)
          .setMinPoints(params.minPoints)
          .setSampleRate(params.sampleRate)
        
    6. Output: clustering result. The fields are as follows:

      Parameter       Type   Default Value   Description
      predictionCol   Int    prediction      Category of the sample.
                                             -1: noise sample
                                             0: core sample
                                             1: border sample
      labelCol        Int    label           Cluster ID of the sample.
                                             For noise samples, the cluster ID is -1 by default.
                                             For core/border samples, the cluster ID is greater than or equal to 0.
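
    The two output fields can be read together. The following is a minimal sketch in plain Scala that maps the documented prediction codes to readable names and checks that a (prediction, label) pair agrees with the output description. SampleKind, DbscanOutput, kindOf, and isConsistent are hypothetical names introduced for illustration. They are not part of the DBSCAN API.

```scala
// Hypothetical helper types for illustration; not part of the DBSCAN API.
sealed trait SampleKind
case object Noise extends SampleKind
case object Core extends SampleKind
case object Border extends SampleKind

object DbscanOutput {
  // Maps the documented prediction codes (-1, 0, 1) to a readable kind.
  def kindOf(prediction: Int): Option[SampleKind] = prediction match {
    case -1 => Some(Noise)
    case 0  => Some(Core)
    case 1  => Some(Border)
    case _  => None // code not described in the output fields
  }

  // Checks a (prediction, label) pair against the output description:
  // noise samples carry label -1, core and border samples carry a label >= 0.
  def isConsistent(prediction: Int, label: Int): Boolean =
    kindOf(prediction) match {
      case Some(Noise) => label == -1
      case Some(_)     => label >= 0
      case None        => false
    }
}
```

    A check such as DbscanOutput.isConsistent(prediction, label) could be applied to rows collected from the result when validating a run on a small data set.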

  • Sample usage
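    // trainData is a Dataset whose feature column (default name: features) holds Vector values.
    val dbscan = new DBSCAN()
      .setEpsilon(0.2)
      .setMinPoints(3)
      .setSampleRate(1.0)
    // result contains the prediction and label fields described under Output.
    val result = dbscan.fitPredict(trainData)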
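
    The documented value ranges for epsilon, minPoints, and sampleRate can be checked before a job is submitted. The following is a minimal sketch in plain Scala. DbscanParams and violations are hypothetical names introduced for illustration, and how the library itself reacts to out-of-range values is not described here.

```scala
// Hypothetical helper for illustration; not part of the DBSCAN API.
object DbscanParams {
  // Returns one message per violated constraint.
  // An empty list means all three values are within the documented ranges.
  def violations(epsilon: Double, minPoints: Int, sampleRate: Double): List[String] =
    List(
      if (epsilon > 0.0) None
      else Some(s"epsilon must be greater than 0.0, got $epsilon"),
      if (minPoints > 1) None
      else Some(s"minPoints must be greater than 1, got $minPoints"),
      if (sampleRate > 0.0 && sampleRate <= 1.0) None
      else Some(s"sampleRate must be in (0.0, 1.0], got $sampleRate")
    ).flatten
}
```

    With the values from the sample usage, DbscanParams.violations(0.2, 3, 1.0) returns an empty list, so the setters can be called with them as shown.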