SVD
The SVD algorithm provides two types of model APIs: RowMatrix SVD API and IndexedRowMatrix SVD API.
| Model API Type | Function API |
|---|---|
| MLlib RowMatrix API | def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix] |
| MLlib IndexedRowMatrix API | def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9): SingularValueDecomposition[IndexedRowMatrix, Matrix] |
MLlib RowMatrix API
- Input and output
- Package name: package org.apache.spark.mllib.linalg.distributed
- Class name: RowMatrix
- Method name: computeSVD
- Input: matrix (RowMatrix)
| Parameter | Value Type | Description |
|---|---|---|
| rows | RDD[Vector] | Matrix, stored by row |
| nRows | Long | Number of rows |
| nCols | Int | Number of columns |
- Algorithm parameters
| Parameter | Value Type | Default Value | Description |
|---|---|---|---|
| k | Int | - | Number of leading singular values to compute. The value range is [1, n], where n is the number of columns. |
| computeU | Boolean | false | Whether to compute the left singular matrix U. |
| rCond | Double | 1e-9 | Reciprocal condition number. Singular values smaller than rCond*s[0], where s[0] is the largest singular value, are treated as 0. |
An example is provided as follows:

```scala
// Row matrix instance
val matrix = new RowMatrix(trainingData, params.numRows, params.numCols)
// Call the computeSVD API of the row matrix.
val svd = matrix.computeSVD(params.k, computeU = true)
```
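The rCond rule described in the parameter table can be sketched in plain Scala. This is a minimal illustration and not the Spark implementation: the object name `RCondSketch` and the function `keepSignificant` are hypothetical, and the input is assumed to be sorted in descending order so that the first entry is the largest singular value.

```scala
object RCondSketch {
  // Keeps at most k leading singular values and stops at the first one
  // that falls below rCond * s(0). Assumes s is sorted in descending order.
  def keepSignificant(s: Seq[Double], k: Int, rCond: Double = 1e-9): Seq[Double] =
    if (s.isEmpty) s
    else {
      val threshold = rCond * s.head
      s.take(k).takeWhile(_ >= threshold)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical singular values; the last one is below 1e-9 * 10.0.
    val kept = keepSignificant(Seq(10.0, 1.0, 1e-12), k = 3)
    println(kept.mkString(", "))
  }
}
```

Under this rule, fewer than k singular values may remain when the matrix is numerically rank-deficient.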
- Output: SVD decomposition result SingularValueDecomposition[RowMatrix, Matrix]. SingularValueDecomposition is a case class that contains three variables U, s, and V.
| Parameter | Value Type | Description |
|---|---|---|
| U | RowMatrix | Left singular matrix with the size of m x k. |
| s | Vector | Singular value vector with the length of k. |
| V | Matrix | Right singular matrix with the size of n x k. |
- Example
```scala
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.SingularValueDecomposition
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))

val rows = sc.parallelize(data)

val mat: RowMatrix = new RowMatrix(rows)

// Compute the top 5 singular values and corresponding singular vectors.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(5, computeU = true)
val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
val s: Vector = svd.s     // The singular values are stored in a local dense vector.
val V: Matrix = svd.V     // The V factor is a local dense matrix.
```
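The sizes listed for U (m x k), s (length k), and V (n x k) fit together as the product U * diag(s) * V^T, which approximates the m x n input matrix. The following is a minimal plain-Scala sketch of that product using nested arrays instead of Spark types. The object name `SvdShapes` and the sample factors are hypothetical and chosen only to make the shapes visible.

```scala
object SvdShapes {
  type Mat = Array[Array[Double]]

  // Computes U * diag(s) * V^T for U (m x k), s (length k), and V (n x k).
  // Entry (i, c) of the result is the sum over j of U(i)(j) * s(j) * V(c)(j).
  def reconstruct(u: Mat, s: Array[Double], v: Mat): Mat =
    u.map { uRow =>
      v.map { vRow =>
        s.indices.map(j => uRow(j) * s(j) * vRow(j)).sum
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical factors for m = 3, n = 2, k = 2.
    val u = Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(0.0, 0.0))
    val s = Array(3.0, 2.0)
    val v = Array(Array(1.0, 0.0), Array(0.0, 1.0))
    // The result has m = 3 rows and n = 2 columns.
    reconstruct(u, s, v).foreach(row => println(row.mkString(" ")))
  }
}
```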
MLlib IndexedRowMatrix API
- Input and output
- Package name: package org.apache.spark.mllib.linalg.distributed
- Class name: IndexedRowMatrix
- Method name: computeSVD
- Input: matrix (IndexedRowMatrix)
| Parameter | Value Type | Description |
|---|---|---|
| rows | RDD[IndexedRow] | Matrix, stored by row. Each row is an IndexedRow(index: Long, vector: Vector). |
| nRows | Long | Number of rows |
| nCols | Int | Number of columns |
- Algorithm parameters
| Parameter | Value Type | Default Value | Description |
|---|---|---|---|
| k | Int | - | Number of leading singular values to compute. The value range is [1, n], where n is the number of columns. |
| computeU | Boolean | false | Whether to compute the left singular matrix U. |
| rCond | Double | 1e-9 | Reciprocal condition number. Singular values smaller than rCond*s[0], where s[0] is the largest singular value, are treated as 0. |
An example is provided as follows:

```scala
// Indexed row matrix instance
val indexedMatrix = new IndexedRowMatrix(trainingData, params.numRows, params.numCols)
// Call the computeSVD API of the indexed row matrix.
val svd = indexedMatrix.computeSVD(params.k, computeU = true)
```
- Output: SVD decomposition result (SingularValueDecomposition[IndexedRowMatrix, Matrix])
| Parameter | Value Type | Description |
|---|---|---|
| U | IndexedRowMatrix | Left singular matrix with the size of m x k. |
| s | Vector | Singular value vector with the length of k. |
| V | Matrix | Right singular matrix with the size of n x k. |
- Example
```scala
val svdRes = distMatrix.computeSVD(k)
```
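The call above leaves `distMatrix` undefined and keeps computeU at its default of false, so U is not computed. The following is a fuller sketch under stated assumptions: it needs the Spark MLlib and SQL libraries on the classpath, the local master is used for illustration only, and the object name `IndexedSvdSketch` and the 3 x 3 sample matrix are hypothetical.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.sql.SparkSession

object IndexedSvdSketch {
  // Returns the top-k singular values and the sorted row indices carried by U.
  def run(k: Int): (Array[Double], Array[Long]) = {
    // Local master chosen for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("IndexedSvdSketch")
      .getOrCreate()
    try {
      // Hypothetical 3 x 3 diagonal input with singular values 3, 2, and 1.
      val rows = spark.sparkContext.parallelize(Seq(
        IndexedRow(0L, Vectors.dense(3.0, 0.0, 0.0)),
        IndexedRow(1L, Vectors.dense(0.0, 2.0, 0.0)),
        IndexedRow(2L, Vectors.dense(0.0, 0.0, 1.0))))
      val distMatrix = new IndexedRowMatrix(rows)

      // computeU = true so that svdRes.U is populated.
      val svdRes = distMatrix.computeSVD(k, computeU = true)
      val indices = svdRes.U.rows.map(_.index).collect().sorted
      (svdRes.s.toArray, indices)
    } finally {
      spark.stop()
    }
  }
}
```

Each row of U keeps the index of the input row it came from, so the decomposition result can be matched back to the original records.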