SVD

The SVD algorithm is available through two MLlib APIs: RowMatrix.computeSVD and IndexedRowMatrix.computeSVD.

Model API Type               Function API
MLlib RowMatrix API          def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix]
MLlib IndexedRowMatrix API   def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9): SingularValueDecomposition[IndexedRowMatrix, Matrix]

MLlib RowMatrix API

  • Function

    Takes a matrix in RDD[Vector] form and returns its singular value decomposition.

  • Input and output
    1. Package name: package org.apache.spark.mllib.linalg.distributed
    2. Class name: RowMatrix
    3. Method name: computeSVD
    4. Input: matrix (RowMatrix)

      Parameter   Type          Description
      rows        RDD[Vector]   Matrix, which is stored by row
      nRows       Long          Number of rows
      nCols       Int           Number of columns

    5. Algorithm parameters

      Parameter   Type      Default Value   Description
      k           Int       -               Number of singular values to compute. The value ranges from 1 to n, where n is the number of columns.
      computeU    Boolean   false           Whether to compute the left singular matrix U
      rCond       Double    1e-9            Reciprocal condition number. Singular values smaller than rCond * s[0], where s[0] is the largest singular value, are treated as 0.

      An example is provided as follows:

      val matrix = new RowMatrix(trainingData, params.numRows, params.numCols) // Row matrix instance
      
      // Call the computeSVD API of the row matrix.
      val svd = matrix.computeSVD(params.k, computeU = true)
      
    6. Output: the SVD result, of type SingularValueDecomposition[RowMatrix, Matrix]. SingularValueDecomposition is a case class with three fields: U, s, and V.

      Parameter   Type        Description
      U           RowMatrix   Left singular matrix. The matrix size is m x k.
      s           Vector      Singular value vector. The vector length is k.
      V           Matrix      Right singular matrix. The matrix size is n x k.

  • Sample usage
    import org.apache.spark.mllib.linalg.Matrix
    import org.apache.spark.mllib.linalg.SingularValueDecomposition
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    
    val data = Array(
      Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
      Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
      Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))
    
    val rows = sc.parallelize(data)
    
    val mat: RowMatrix = new RowMatrix(rows)
    
    // Compute the top 5 singular values and corresponding singular vectors.
    val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(5, computeU = true)
    val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
    val s: Vector = svd.s     // The singular values are stored in a local dense vector.
    val V: Matrix = svd.V     // The V factor is a local dense matrix.
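
The rCond rule from the algorithm parameters can be illustrated without Spark. The following is a minimal sketch in plain Scala, not Spark's implementation: keptSingularValues is a hypothetical helper, and it assumes the singular values are already sorted in descending order. It counts how many of the first k values are at least rCond * s[0]; the rest are the ones treated as 0.

```scala
// Hypothetical helper (not part of Spark): counts the singular values that
// survive the rCond cutoff. Assumes sigmas is sorted in descending order.
def keptSingularValues(sigmas: Seq[Double], k: Int, rCond: Double = 1e-9): Int = {
  require(k > 0, "k must be positive")
  if (sigmas.isEmpty) 0
  else {
    // Values below rCond * (largest singular value) are treated as 0.
    val threshold = rCond * sigmas.head
    sigmas.take(k).takeWhile(_ >= threshold).length
  }
}

// Illustrative values only: two significant singular values, two negligible ones.
val sigmas = Seq(10.0, 2.0, 1e-12, 0.0)
val kept = keptSingularValues(sigmas, k = 4)
println(s"Singular values kept with rCond = 1e-9: $kept")
```

With the default rCond, the cutoff here is 1e-8, so only the first two values are kept. Passing a smaller rCond lowers the cutoff and keeps more values.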
    

MLlib IndexedRowMatrix API

  • Function

    Takes a matrix in RDD[IndexedRow] form and returns its singular value decomposition.

  • Input and output
    1. Package name: package org.apache.spark.mllib.linalg.distributed
    2. Class name: IndexedRowMatrix
    3. Method name: computeSVD
    4. Input: matrix (IndexedRowMatrix)

      Parameter   Type              Description
      rows        RDD[IndexedRow]   Matrix, which is stored by row. Each row is an IndexedRow(index: Long, vector: Vector).
      nRows       Long              Number of rows
      nCols       Int               Number of columns

    5. Algorithm parameters

      Parameter   Type      Default Value   Description
      k           Int       -               Number of singular values to compute. The value ranges from 1 to n, where n is the number of columns.
      computeU    Boolean   false           Whether to compute the left singular matrix U
      rCond       Double    1e-9            Reciprocal condition number. Singular values smaller than rCond * s[0], where s[0] is the largest singular value, are treated as 0.

      An example is provided as follows:

      val indexedMatrix = new IndexedRowMatrix(trainingData, params.numRows, params.numCols) // Indexed row matrix instance
      
      // Call the computeSVD API of the indexed row matrix.
      val svd = indexedMatrix.computeSVD(params.k, computeU = true)
      
    6. Output: the SVD result, of type SingularValueDecomposition[IndexedRowMatrix, Matrix]

      Parameter   Type               Description
      U           IndexedRowMatrix   Left singular matrix. The matrix size is m x k.
      s           Vector             Singular value vector. The vector length is k.
      V           Matrix             Right singular matrix. The matrix size is n x k.

  • Sample usage
    // distMatrix is an existing IndexedRowMatrix; k is the number of singular values to compute.
    val svdRes = distMatrix.computeSVD(k)
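
For comparison with the RowMatrix sample, a fuller IndexedRowMatrix example might look like the following. This is a sketch under stated assumptions: the input data, the local[*] master, the application name, and k = 2 are illustrative choices that do not come from the original sample. It builds the matrix from explicitly indexed rows and reads back the three factors.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Assumption: run locally. getOrCreate reuses an active context (for example in spark-shell).
val conf = new SparkConf().setAppName("IndexedRowMatrixSVD").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Illustrative 3 x 5 matrix; each row carries an explicit Long index.
val indexedRows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 0.0, 7.0, 0.0, 0.0)),
  IndexedRow(1L, Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0)),
  IndexedRow(2L, Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))))

val distMatrix = new IndexedRowMatrix(indexedRows)

// Compute the top 2 singular values and the corresponding singular vectors.
val k = 2
val svdRes: SingularValueDecomposition[IndexedRowMatrix, Matrix] =
  distMatrix.computeSVD(k, computeU = true)

val U: IndexedRowMatrix = svdRes.U  // The U factor is an IndexedRowMatrix.
val s: Vector = svdRes.s            // The singular values are stored in a local vector.
val V: Matrix = svdRes.V            // The V factor is a local matrix.

println(s"Singular values: $s")
```

Because computeU defaults to false, the call in the one-line sample above does not compute U; pass computeU = true when the left singular matrix is needed.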