KNN

Model API Type | Function API
ML API         | def fit(dataset: Dataset[_]): KNNModel
               | def transform(dataset: Dataset[_]): DataFrame

ML API

  • Function

    These APIs take the training and test sample features as datasets and output, for each test sample, its k nearest neighbors among the training samples.

  • Input and output
    1. Package name: org.apache.spark.ml.neighbors
    2. Class name: KNN
    3. Method name: fit/transform
    4. Input: a training-sample Dataset[_] (passed to fit) and a test-sample Dataset[_] (passed to transform)

      Param name | Type(s)    | Description
      dataset    | Dataset[_] | DataFrame that contains the sample features
      k          | Int        | Number of nearest neighbors

    5. Algorithm parameters
      1. fit parameters

        Param name                             | Type(s)       | Default             | Description
        setFeaturesCol(value: String)          | String        | features            | Feature column name of the training dataset
        setAuxiliaryCols(value: Array[String]) | Array[String] | Array.empty[String] | Names of additional training-dataset columns (for example, an ID column) to include in the output

      2. transform parameters

        Param name                     | Type(s) | Default   | Description
        setFeaturesCol(value: String)  | String  | features  | Feature column name of the test dataset
        setNeighborsCol(value: String) | String  | neighbors | Name of the output column that holds the neighbors' additional columns
        setDistanceCol(value: String)  | String  | distances | Name of the output column that holds the neighbor distances
        setK(value: Int)               | Int     | 1         | Number of nearest neighbors
        setTestBatchSize(value: Int)   | Int     | 1024      | Batch size used for the neighbor search

    6. Output: the k nearest neighbors of each test sample, including their distances and the additional columns of the corresponding training samples

      Param name | Type(s)    | Description
      dataset    | Dataset[_] | DataFrame with the distances to, and the additional columns of, the k nearest neighbors
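To make the output semantics concrete, here is a minimal brute-force sketch in plain Scala. It does not use Spark and is not the library's implementation. It assumes Euclidean distance, which this section does not specify, and all names in it (BruteForceKnn, TrainRow, Result, search) are hypothetical. For each test sample it computes the distance to every training sample, keeps the k smallest, and returns the neighbors' id values (playing the role of an auxiliary column) together with their distances, nearest first. Test samples are processed in groups of testBatchSize, in the spirit of setTestBatchSize.

```scala
// Hypothetical brute-force k-nearest-neighbor search (plain Scala, no Spark).
// Assumption: Euclidean distance; the library's metric and search strategy may differ.
object BruteForceKnn {
  // A training sample: an auxiliary value (like the "id" column) plus its features.
  final case class TrainRow(id: Long, features: Array[Double])

  // One result per test sample: neighbor ids and distances, nearest first.
  final case class Result(neighbors: Seq[Long], distances: Seq[Double])

  // Euclidean distance between two feature vectors of equal length.
  def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    math.sqrt(sum)
  }

  // Searches test samples in batches of testBatchSize. Batching only changes
  // how the work is grouped; it does not change the result.
  def search(
      train: Seq[TrainRow],
      test: Seq[Array[Double]],
      k: Int = 1,
      testBatchSize: Int = 1024): Seq[Result] = {
    require(k > 0, "k must be positive")
    require(testBatchSize > 0, "testBatchSize must be positive")
    test.grouped(testBatchSize).flatMap { batch =>
      batch.map { x =>
        // Distance to every training sample, sorted ascending, first k kept.
        val nearest = train
          .map(row => (row.id, euclidean(x, row.features)))
          .sortBy(_._2)
          .take(k)
        Result(nearest.map(_._1), nearest.map(_._2))
      }
    }.toVector
  }
}
```

A brute-force search evaluates one distance per (training sample, test sample) pair, so its cost grows with the product of the two dataset sizes. A batch size bounds how many test samples are searched at once.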

Sample usage

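import org.apache.spark.ml.neighbors.KNN

// featuresCol, neighborsCol, distanceCol, k, testBatchSize, trainDataDF and
// testDataDF are assumed to be defined by the caller.

// Fit on the training DataFrame; "id" is carried into the output as an additional column.
val model = new KNN()
    .setFeaturesCol(featuresCol)
    .setAuxiliaryCols(Array("id"))
    .fit(trainDataDF)
// For each test sample, find its k nearest training samples, searching in batches.
val testResults = model
    .setFeaturesCol(featuresCol)
    .setNeighborsCol(neighborsCol)
    .setDistanceCol(distanceCol)
    .setK(k)
    .setTestBatchSize(testBatchSize)
    .transform(testDataDF)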
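The defaults in the fit and transform parameter tables decide what happens when a setter in the sample usage is left out. The sketch below is a hypothetical plain-Scala parameter holder (KnnParams is not a class in org.apache.spark.ml.neighbors) that records those documented defaults and mimics the chained setter style by returning updated copies. The positivity checks on k and testBatchSize are assumptions, not documented constraints.

```scala
// Hypothetical parameter holder mirroring the documented defaults.
// It is NOT part of org.apache.spark.ml.neighbors.
final case class KnnParams(
    featuresCol: String = "features",
    auxiliaryCols: Seq[String] = Seq.empty,
    neighborsCol: String = "neighbors",
    distanceCol: String = "distances",
    k: Int = 1,
    testBatchSize: Int = 1024) {
  // Assumed constraints; the documentation does not state valid ranges.
  require(k > 0, "k must be positive")
  require(testBatchSize > 0, "testBatchSize must be positive")

  // Builder-style setters: each returns an updated copy, so calls can be chained.
  def setFeaturesCol(value: String): KnnParams = copy(featuresCol = value)
  def setAuxiliaryCols(value: Array[String]): KnnParams = copy(auxiliaryCols = value.toList)
  def setNeighborsCol(value: String): KnnParams = copy(neighborsCol = value)
  def setDistanceCol(value: String): KnnParams = copy(distanceCol = value)
  def setK(value: Int): KnnParams = copy(k = value)
  def setTestBatchSize(value: Int): KnnParams = copy(testBatchSize = value)
}
```

For example, KnnParams().setAuxiliaryCols(Array("id")).k is 1, which matches the documented default: if setK(k) were omitted from the sample usage, each test sample would get a single neighbor.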