KMeans is provided as an ML API.
| Model interface category | Function interface |
| --- | --- |
| ML API | def fit(dataset: Dataset[_]): KMeansModel |
| | def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[KMeansModel] |
| | def fit(dataset: Dataset[_], paramMap: ParamMap): KMeansModel |
| | def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): KMeansModel |
Input columns:

| Param name | Type(s) | Default | Description |
| --- | --- | --- | --- |
| featuresCol | Vector | "features" | Feature vector |
| Algorithm parameters |
| --- |
| def setFeaturesCol(value: String): KMeans.this.type |
| def setPredictionCol(value: String): KMeans.this.type |
| def setK(value: Int): KMeans.this.type |
| def setInitMode(value: String): KMeans.this.type |
| def setInitSteps(value: Int): KMeans.this.type |
| def setMaxIter(value: Int): KMeans.this.type |
| def setThreshold(value: Double): KMeans.this.type |
| def setTol(value: Double): KMeans.this.type |
| def setSeed(value: Long): KMeans.this.type |
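Because each setter returns `KMeans.this.type`, the parameters above can be configured with fluent chaining. The sketch below uses the upstream Spark ML `KMeans`, which exposes the same setters except `setThreshold` (specific to this implementation and omitted here); the parameter values are illustrative only:

```scala
import org.apache.spark.ml.clustering.KMeans

// Configure an estimator by chaining the setters listed above;
// every call returns the estimator itself, so the chain reads top-down.
val kmeans = new KMeans()
  .setFeaturesCol("features")
  .setPredictionCol("prediction")
  .setK(3)                  // number of clusters
  .setInitMode("k-means||") // "random" or "k-means||"
  .setInitSteps(2)          // steps of the k-means|| initialization
  .setMaxIter(20)           // maximum number of iterations
  .setTol(1e-4)             // convergence tolerance
  .setSeed(42L)             // seed for reproducible initialization
```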
| Parameter name | Meaning | Value type |
| --- | --- | --- |
| sampleRate | Fraction of the full dataset used in each iteration | 0~1 [Double] |
| optMethod | Switch that triggers sampling of the training data | default/allData [String] |
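As a hedged sketch, these tuning parameters could be supplied through the generic `ParamMap` machinery shown in the fit example below; note that the member names `mlKmeans.sampleRate` and `mlKmeans.optMethod` are assumptions inferred from the table above, not confirmed API:

```scala
import org.apache.spark.ml.param.ParamMap

// Assumption: MlKMeans exposes the tuning parameters above as Spark ML Params
// named `sampleRate` and `optMethod` (hypothetical member names).
val mlKmeans = new MlKMeans()
val tuning = ParamMap(mlKmeans.sampleRate -> 0.8) // use 80% of the data per iteration
  .put(mlKmeans.optMethod, "default")             // "default" samples; "allData" uses the full set
val model = mlKmeans.fit(trainingData, tuning)
```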
Example of setting parameters and calling the fit interfaces:
```scala
import org.apache.spark.ml.param.{ParamMap, ParamPair}

// initSteps, maxIter, tol and trainingData are assumed to be defined elsewhere.
val kmeans = new MlKMeans()

// Parameters for def fit(dataset: Dataset[_], paramMap: ParamMap)
val paramMap = ParamMap(kmeans.initSteps -> initSteps)
  .put(kmeans.maxIter, maxIter)

// Parameters for def fit(dataset: Dataset[_], paramMaps: Array[ParamMap])
val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
for (i <- 0 until 2) { // fill paramMaps; `0 to 2` would overrun the size-2 array
  paramMaps(i) = ParamMap(kmeans.initSteps -> initSteps)
    .put(kmeans.maxIter, maxIter)
}

// Parameters for def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*)
val initStepsParamPair = ParamPair(kmeans.initSteps, initSteps)
val maxIterParamPair = ParamPair(kmeans.maxIter, maxIter)
val tolParamPair = ParamPair(kmeans.tol, tol)

// Call each fit interface
model = kmeans.fit(trainingData)
model = kmeans.fit(trainingData, paramMap)
models = kmeans.fit(trainingData, paramMaps)
model = kmeans.fit(trainingData, initStepsParamPair, maxIterParamPair, tolParamPair)
```
Output columns:

| Param name | Type(s) | Default | Description |
| --- | --- | --- | --- |
| predictionCol | Int | "prediction" | Predicted cluster center |
```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator

// Loads data.
val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

// Trains a k-means model.
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)

// Make predictions.
val predictions = model.transform(dataset)

// Evaluate clustering by computing the Silhouette score.
val evaluator = new ClusteringEvaluator()
val silhouette = evaluator.evaluate(predictions)
println(s"Silhouette with squared euclidean distance = $silhouette")

// Shows the result.
println("Cluster Centers: ")
model.clusterCenters.foreach(println)
```