PCA
The principal component analysis (PCA) algorithm provides the following ML APIs.
| Model API Type | Function API |
|---|---|
| ML API | `def fit(dataset: Dataset[_]): PCAModel` |
| | `def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[PCAModel]` |
| | `def fit(dataset: Dataset[_], paramMap: ParamMap): PCAModel` |
| | `def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): PCAModel` |
ML API
- Input and output
- Package name: org.apache.spark.ml.feature
- Class name: PCA
- Method name: fit
- Input: matrix (Dataset[_]) and the number of principal components
| Parameter | Value Type | Description |
|---|---|---|
| dataset | Dataset[Vector] | Matrix, stored by row |
| k | Int | Number of principal components |
- Algorithm parameters
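| Setter | Parameter | Value Type | Default Value | Description |
|---|---|---|---|---|
| setK(value: Int) | k | Int | - | Number of required principal components. The value range is [1, n], where n is the number of columns of the input matrix. |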
An example is provided as follows:
```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.param.{ParamMap, ParamPair}

// Assumed to be defined by the caller: trainingData (a Dataset with a Vector
// column named "matrix") and params.k (the number of principal components).
val pca = new PCA()
  .setInputCol("matrix")
  .setK(params.k)

// Parameter for fit(dataset: Dataset[_], paramMap: ParamMap).
val paramMap = ParamMap(pca.k -> params.k)
  .put(pca.inputCol, "matrix")

// Parameters for fit(dataset: Dataset[_], paramMaps: Array[ParamMap]).
val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
for (i <- 0 until 2) { // Assign a value to each entry (indices 0 and 1).
  paramMaps(i) = ParamMap(pca.k -> params.k)
    .put(pca.inputCol, "matrix")
}

// Parameter for fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*).
val kParamPair = ParamPair(pca.k, params.k)

// Call the fit APIs.
val model1 = pca.fit(trainingData)
val model2 = pca.fit(trainingData, paramMap)
val models = pca.fit(trainingData, paramMaps)
val model3 = pca.fit(trainingData, kParamPair)
```
- Output: PCAModel, including the principal components and the corresponding weights
| Parameter | Value Type | Description |
|---|---|---|
| pc | DenseMatrix | Principal component matrix. Each column is a principal component vector. |
| explainedVariance | DenseVector | Weights of the principal components. Each dimension corresponds to a principal component. |
- Example
```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.Vectors

// `spark` is an existing SparkSession (for example, the one provided by spark-shell).
val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
)
val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")

// Fit the estimator; the result is a PCAModel.
val pca = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(3)
  .fit(df)

// Project the input rows onto the three principal components.
val result = pca.transform(df).select("pcaFeatures")
```
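The output fields described above (pc and explainedVariance) can be read directly from the fitted model. The following is a minimal sketch, assuming the open-source Apache Spark ML PCA class and a local SparkSession. The application name and the local[*] master are illustrative assumptions, not part of the original example, and the script-style layout is meant for spark-shell or a Scala script.

```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Assumption: a local session is acceptable. In spark-shell, getOrCreate()
// returns the session that already exists.
val spark = SparkSession.builder()
  .appName("PcaModelInspection") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Same 3 x 5 input matrix as in the example above, stored by row.
val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
)
val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")

val model = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(3)
  .fit(df)

// pc has one row per input feature and one column per principal component.
println(s"pc dimensions: ${model.pc.numRows} x ${model.pc.numCols}")

// explainedVariance holds one weight per principal component.
println(s"explainedVariance: ${model.explainedVariance}")

// Each transformed row is a vector of length k.
model.transform(df).select("pcaFeatures").show(false)
```

With five input features and k set to 3, pc is expected to be a 5 x 3 matrix and explainedVariance a vector of length 3.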