SVD

The SVD algorithm provides two APIs: one for RowMatrix and one for IndexedRowMatrix.

Model API Type             | Function API
MLlib RowMatrix API        | def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix]
MLlib IndexedRowMatrix API | def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9): SingularValueDecomposition[IndexedRowMatrix, Matrix]
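Both signatures share rCond (default 1e-9): singular values smaller than rCond * s[0], where s[0] is the largest singular value, are treated as 0. The following plain-Scala sketch illustrates that rule without Spark. It assumes the singular values are already sorted in descending order, and keepSingularValues is a hypothetical helper written for illustration, not part of Spark or this API.

```scala
// Hypothetical helper, not a Spark API: applies the rCond rule to singular
// values that are already sorted in descending order.
def keepSingularValues(sortedDesc: Seq[Double], k: Int, rCond: Double = 1e-9): Seq[Double] = {
  require(k >= 1, "k must be at least 1")
  require(sortedDesc.nonEmpty, "at least one singular value is required")
  // Threshold is rCond * s[0], where s[0] is the largest singular value.
  val threshold = rCond * sortedDesc.head
  // Keep at most k values and stop at the first one below the threshold.
  sortedDesc.take(k).takeWhile(_ >= threshold)
}

// Made-up singular values: the last one is numerically zero relative to s[0] = 10.0.
val sampleValues = Seq(10.0, 2.5, 1e-12)
println(keepSingularValues(sampleValues, k = 3))             // List(10.0, 2.5)
println(keepSingularValues(sampleValues, k = 1))             // List(10.0)
println(keepSingularValues(sampleValues, k = 3, rCond = 0.5)) // List(10.0)
```

This is why the returned s can be shorter than the requested k: in the first call, 1e-12 falls below the threshold 1e-9 * 10.0 = 1e-8 and is dropped.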

MLlib RowMatrix API

  • Function

    This API takes a matrix stored by row as an RDD[Vector] and returns its singular value decomposition (SVD).

  • Input and output
    1. Package name: org.apache.spark.mllib.linalg.distributed
    2. Class name: RowMatrix
    3. Method name: computeSVD
    4. Input: matrix (RowMatrix)

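      Param name | Type(s)     | Description
      rows       | RDD[Vector] | Matrix, stored by row
      nRows      | Long        | Number of rows. If omitted or non-positive, it is taken from the number of records in rows.
      nCols      | Int         | Number of columns. If omitted or non-positive, it is taken from the size of the first row.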

    5. Algorithm parameters

      Param name | Type(s) | Default | Description
      k          | Int     | -       | Number of leading singular values to compute. Value range: 1 to n, where n is the number of columns.
      computeU   | Boolean | false   | Whether to compute the left singular matrix U
      rCond      | Double  | 1e-9    | Reciprocal condition number. Singular values smaller than rCond * s[0], where s[0] is the largest singular value, are treated as 0.

      An example is provided as follows:

      // trainingData is an RDD[Vector]; params holds numRows, numCols, and k (both defined elsewhere).
      val matrix = new RowMatrix(trainingData, params.numRows, params.numCols) // Row matrix instance

      // Call the computeSVD API of the row matrix.
      val svd = matrix.computeSVD(params.k, computeU = true)
    6. Output: SVD result of type SingularValueDecomposition[RowMatrix, Matrix]. SingularValueDecomposition is a case class with three fields: U, s, and V.

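      Param name | Type(s)   | Description
      U          | RowMatrix | Left singular matrix, of size m x k. It is null if computeU is false.
      s          | Vector    | Singular values in descending order, of length k.
      V          | Matrix    | Right singular matrix, of size n x k.

      Here m and n are the numbers of rows and columns of the input matrix. The result can contain fewer than k singular values (and correspondingly fewer columns in U and V), for example when some singular values are smaller than rCond * s[0].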

  • Sample usage
    import org.apache.spark.mllib.linalg.Matrix
    import org.apache.spark.mllib.linalg.SingularValueDecomposition
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // sc is an existing SparkContext (for example, the one provided by spark-shell).
    val data = Array(
      Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
      Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
      Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))
    val rows = sc.parallelize(data)

    // nRows and nCols are omitted, so they are computed from the data (3 x 5).
    val mat: RowMatrix = new RowMatrix(rows)

    // Request the top 5 singular values (k <= nCols = 5). The matrix has rank at most 3, so any
    // singular values beyond the third are zero in exact arithmetic and may be dropped via rCond.
    val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(5, computeU = true)
    val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
    val s: Vector = svd.s     // The singular values are stored in a local dense vector.
    val V: Matrix = svd.V     // The V factor is a local dense matrix.
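The U, s, and V factors in the output table satisfy A = U * diag(s) * V^T up to rounding when no singular values are dropped. The following self-contained sketch checks that relationship on a small matrix. It assumes spark-mllib on the classpath, a local SparkContext (master local[2]), and a made-up 4 x 3 input; it rebuilds A on the driver, which is only reasonable for tiny matrices. It also shows that, in Spark's implementation, U is null when computeU keeps its default value false.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Local SparkContext for a standalone run; in spark-shell, drop this line and use the existing sc.
val sc = new SparkContext(new SparkConf().setAppName("RowMatrixSVDCheck").setMaster("local[2]"))

// Illustrative 4 x 3 matrix A (made-up values) with full column rank.
val data: Seq[Vector] = Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0),
  Vectors.dense(7.0, 8.0, 10.0),
  Vectors.dense(1.0, 0.0, 1.0))
val mat = new RowMatrix(sc.parallelize(data))

// k = nCols = 3 keeps every singular value, so U * diag(s) * V^T should reproduce A.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(3, computeU = true)
val U: RowMatrix = svd.U // m x k, distributed
val s: Vector = svd.s    // singular values, largest first
val V: Matrix = svd.V    // n x k, local

// Rebuild A locally: A(i, c) = sum over j of U(i, j) * s(j) * V(c, j).
// collect() returns the rows of U in the same order as the input rows.
val rebuilt: Array[Array[Double]] = U.rows.collect().map { u =>
  Array.tabulate(V.numRows) { c => (0 until s.size).map(j => u(j) * s(j) * V(c, j)).sum }
}

// Largest absolute difference between A and its reconstruction.
val maxError: Double = data.zip(rebuilt).map { case (orig, rec) =>
  orig.toArray.zip(rec).map { case (a, b) => math.abs(a - b) }.max
}.max

// With the default computeU = false, U is not computed and is null.
val noU = mat.computeSVD(3)

println(s"singular values: $s")
println(s"max reconstruction error: $maxError")
sc.stop()
```

Requesting k smaller than the rank instead gives a truncated factorization, so the same product would only approximate A.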

MLlib IndexedRowMatrix API

  • Function

    This API takes a matrix stored by row as an RDD[IndexedRow], where each row carries a Long index, and returns its singular value decomposition (SVD).

  • Input and output
    1. Package name: org.apache.spark.mllib.linalg.distributed
    2. Class name: IndexedRowMatrix
    3. Method name: computeSVD
    4. Input: matrix (IndexedRowMatrix)

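      Param name | Type(s)         | Description
      rows       | RDD[IndexedRow] | Matrix, stored by row. Each row is an IndexedRow(index: Long, vector: Vector).
      nRows      | Long            | Number of rows. If omitted or non-positive, it is computed as the largest row index plus 1.
      nCols      | Int             | Number of columns. If omitted or non-positive, it is taken from the size of the first row.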

    5. Algorithm parameters

      Param name | Type(s) | Default | Description
      k          | Int     | -       | Number of leading singular values to compute. Value range: 1 to n, where n is the number of columns.
      computeU   | Boolean | false   | Whether to compute the left singular matrix U
      rCond      | Double  | 1e-9    | Reciprocal condition number. Singular values smaller than rCond * s[0], where s[0] is the largest singular value, are treated as 0.

      An example is provided as follows:

      // trainingData is an RDD[IndexedRow]; params holds numRows, numCols, and k (both defined elsewhere).
      val indexedMatrix = new IndexedRowMatrix(trainingData, params.numRows, params.numCols) // Indexed row matrix instance

      // Call the computeSVD API of the indexed row matrix.
      val svd = indexedMatrix.computeSVD(params.k, computeU = true)
    6. Output: SVD result of type SingularValueDecomposition[IndexedRowMatrix, Matrix]. SingularValueDecomposition is a case class with three fields: U, s, and V.

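      Param name | Type(s)          | Description
      U          | IndexedRowMatrix | Left singular matrix, of size m x k. Each row keeps the index of the corresponding input row. It is null if computeU is false.
      s          | Vector           | Singular values in descending order, of length k.
      V          | Matrix           | Right singular matrix, of size n x k.

      Here m and n are the numbers of rows and columns of the input matrix. The result can contain fewer than k singular values (and correspondingly fewer columns in U and V), for example when some singular values are smaller than rCond * s[0].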

  • Sample usage
    // distMatrix is an existing IndexedRowMatrix; k is the number of singular values to compute.
    val svdRes = distMatrix.computeSVD(k)
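The one-line call above assumes that distMatrix and k already exist. The following self-contained sketch fills in those assumptions with spark-mllib on the classpath, a local SparkContext (master local[2]), and a made-up 4 x 3 matrix whose row indices (0, 2, 5, 9) are deliberately non-contiguous; names such as indexedRows and uByIndex are illustrative only. It shows that the rows of U keep the indices of the input rows, that V is a local n x k matrix, and that U is null when computeU is left at its default of false.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Local SparkContext for a standalone run; in spark-shell, drop this line and use the existing sc.
val sc = new SparkContext(new SparkConf().setAppName("IndexedRowMatrixSVD").setMaster("local[2]"))

// Each row carries an explicit Long index; the indices need not be contiguous.
val indexedRows = Seq(
  IndexedRow(0L, Vectors.dense(1.0, 2.0, 3.0)),
  IndexedRow(2L, Vectors.dense(4.0, 5.0, 6.0)),
  IndexedRow(5L, Vectors.dense(7.0, 8.0, 10.0)),
  IndexedRow(9L, Vectors.dense(1.0, 0.0, 1.0)))
val distMatrix = new IndexedRowMatrix(sc.parallelize(indexedRows))

// Keep the top k = 2 of the 3 possible singular values.
val k = 2
val svd: SingularValueDecomposition[IndexedRowMatrix, Matrix] =
  distMatrix.computeSVD(k, computeU = true)
val U: IndexedRowMatrix = svd.U // distributed; each row keeps its original index
val s: Vector = svd.s           // local vector, largest singular value first
val V: Matrix = svd.V           // local n x k matrix (here 3 x 2)

// Collect U keyed by row index so its rows can be matched to the input rows.
val uByIndex: Map[Long, Vector] = U.rows.map(r => (r.index, r.vector)).collect().toMap

// With the default computeU = false, only s and V are computed; U is null.
val sAndVOnly = distMatrix.computeSVD(k)

println(s"singular values: $s")
println(s"U row indices: ${uByIndex.keys.toSeq.sorted.mkString(", ")}")
println(s"V is ${V.numRows} x ${V.numCols}")
sc.stop()
```

Keying U by index rather than by position is what makes IndexedRowMatrix useful when the row order of an RDD is not meaningful.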