KNN
| Model API Type | Function API |
|---|---|
| ML API | def fit(dataset: Dataset[_]): KNNModel |
| | def transform(dataset: Dataset[_]): DataFrame |
ML API
- Function
This API takes sample features in Dataset form as input and outputs the k nearest neighbors of each test sample.
- Input and output
- Package name: org.apache.spark.ml.neighbors
- Class name: KNN
- Method name: fit/transform
- Input: training sample Dataset[_] and test sample Dataset[_]
| Param name | Type(s) | Description |
|---|---|---|
| dataset | Dataset[_] | DataFrame that contains the sample features |
| k | Int | Number of nearest neighbors |
- Algorithm parameters
- fit parameters
| Param name | Type(s) | Default | Description |
|---|---|---|---|
| setFeaturesCol(value: String) | String | features | Feature column name of the training dataset |
| setAuxiliaryCols(value: Array[String]) | Array[String] | Array.empty[String] | Additional column names of the training dataset |
- transform parameters
| Param name | Type(s) | Default | Description |
|---|---|---|---|
| setFeaturesCol(value: String) | String | features | Feature column name of the test dataset |
| setNeighborsCol(value: String) | String | neighbors | Name of the column that holds the additional columns of the neighbors |
| setDistanceCol(value: String) | String | distances | Name of the column that holds the neighbor distances |
| setK(value: Int) | Int | 1 | Number of nearest neighbors |
| setTestBatchSize(value: Int) | Int | 1024 | Search batch size |
An example is provided as follows:
val model = new KNN()
  .setFeaturesCol(featuresCol)
  .setAuxiliaryCols(Array("id"))
  .fit(trainDataDF)
- Output: the k nearest neighbors of each test sample, including their distances and the additional columns of the matching training samples

| Param name | Type(s) | Description |
|---|---|---|
| dataset | Dataset[_] | DataFrame that contains the distances to the k nearest neighbors and their additional columns |
Sample usage
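import org.apache.spark.ml.neighbors.KNN

// Fit the model on the training data and keep the "id" column as an additional column.
val model = new KNN()
  .setFeaturesCol(featuresCol)
  .setAuxiliaryCols(Array("id"))
  .fit(trainDataDF)

// Search the k nearest neighbors of every test sample, testBatchSize samples at a time.
val testResults = model
  .setFeaturesCol(featuresCol)
  .setNeighborsCol(neighborsCol)
  .setDistanceCol(distanceCol)
  .setK(k)
  .setTestBatchSize(testBatchSize)
  .transform(testDataDF)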
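
To make the meaning of the output concrete, the following is a minimal brute-force sketch in plain Scala of what "the k nearest neighbors of a test sample, with distances and an additional column" looks like. It is not the library's implementation and does not use Spark. The names `BruteForceKnnSketch`, `TrainSample`, `Neighbor`, `euclidean`, and `nearest` are hypothetical, and the use of Euclidean distance is an assumption, since the distance metric is not stated here.

```scala
// Illustrative brute-force k-nearest-neighbor search in plain Scala.
// NOT the library's implementation; Euclidean distance is an assumption.
object BruteForceKnnSketch {
  // A training sample: a feature vector plus one additional value (here an "id").
  final case class TrainSample(id: Long, features: Array[Double])

  // One result entry: the neighbor's additional value and its distance to the query.
  final case class Neighbor(id: Long, distance: Double)

  // Euclidean distance between two vectors of equal length.
  def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same length")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  // Return the k training samples closest to `query`, nearest first.
  def nearest(train: Seq[TrainSample], query: Array[Double], k: Int): Seq[Neighbor] =
    train
      .map(s => Neighbor(s.id, euclidean(s.features, query)))
      .sortBy(_.distance)
      .take(k)

  def main(args: Array[String]): Unit = {
    val train = Seq(
      TrainSample(1L, Array(0.0, 0.0)),
      TrainSample(2L, Array(3.0, 4.0)),
      TrainSample(3L, Array(1.0, 0.0))
    )
    // With k = 2 and a query at the origin, this prints:
    //   id=1 distance=0.0
    //   id=3 distance=1.0
    nearest(train, Array(0.0, 0.0), k = 2)
      .foreach(n => println(s"id=${n.id} distance=${n.distance}"))
  }
}
```

In this sketch the `id` field plays the role of an additional column passed through `setAuxiliaryCols`, and `k` corresponds to `setK`. The comparison against every training sample is only for clarity and would not scale to large datasets.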