SimRank

Model API Type           Function API
computeSimilarity API    def computeSimilarity(dataset: Dataset[_]): SimRankSimilarity

computeSimilarity API

  • Function description

    Computes the similarity between users and the similarity between products.

  • Input and output
    1. Package name: package org.apache.spark.ml.recommendation
    2. Class name: SimRank
    3. Method name: computeSimilarity
    4. Input: a Dataset[_] that stores user search and product usage statistics. It must contain at least two columns: a user column and a product column.

      Parameter    Value Type    Description
      dataset      Dataset[_]    Dataset that stores user search and product usage statistics

    5. Parameters optimized based on native algorithms

      Parameter    Value Type    Description
      damp         Double        Attenuation factor
      numIter      Int           Number of iterations
      userCol      String        User column
      itemCol      String        Product column

      The recommended settings are damp = 0.6 and numIter = 5. A small attenuation factor may lower precision, while a large one may slow convergence. Too few iterations may also lower precision, and too many make the computation take longer.

      Code API example:

      val simrank = new SimRank().setDamp(0.6).setNumIter(5).setUserCol("user").setItemCol("item")
      val simrankSimilarity = simrank.computeSimilarity(df)
      val userSim = simrankSimilarity.userSimilarity
      val itemSim = simrankSimilarity.itemSimilarity
    6. Output: SimRankSimilarity, a case class with two fields that hold the similarity between users and the similarity between products.

      Parameter         Value Type    Description
      userSimilarity    DataFrame     Similarity between users
      itemSimilarity    DataFrame     Similarity between products

  • Example
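    import org.apache.spark.ml.recommendation.SimRank

    // df is assumed to be an existing Dataset with "user" and "item" columns.
    val simrank = new SimRank().setDamp(0.6).setNumIter(5).setUserCol("user").setItemCol("item")
    val simrankSimilarity = simrank.computeSimilarity(df)
    val userSim = simrankSimilarity.userSimilarity  // DataFrame: similarity between users
    val itemSim = simrankSimilarity.itemSimilarity  // DataFrame: similarity between products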
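
To make the roles of damp and numIter concrete, the following is a minimal single-machine sketch of the textbook bipartite SimRank iteration in plain Scala. It is an illustration only and is not the implementation behind computeSimilarity. The object name SimRankSketch, the helper names, and the in-memory Map representation are all assumptions made for this example.

```scala
// Hypothetical illustration: textbook bipartite SimRank on a small in-memory graph.
// Not the library's implementation; all names here are made up for the sketch.
object SimRankSketch {
  type Sim = Map[(String, String), Double]

  // Starting point: every node is fully similar to itself and to nothing else.
  private def initialSim(nodes: Seq[String]): Sim =
    (for (a <- nodes; b <- nodes) yield (a, b) -> (if (a == b) 1.0 else 0.0)).toMap

  // One update for one side of the graph: the similarity of two distinct nodes is
  // the average similarity of their neighbours on the other side, scaled by damp.
  private def update(
      nodes: Seq[String],
      neighbours: Map[String, Set[String]],
      otherSim: Sim,
      damp: Double): Sim = {
    val pairs = for (a <- nodes; b <- nodes) yield {
      val score =
        if (a == b) 1.0
        else {
          val (na, nb) = (neighbours(a), neighbours(b))
          // toSeq keeps repeated values, which a Set would silently drop.
          val total = (for (x <- na.toSeq; y <- nb.toSeq) yield otherSim((x, y))).sum
          damp * total / (na.size * nb.size)
        }
      (a, b) -> score
    }
    pairs.toMap
  }

  // edges are (user, item) pairs; returns (user similarity, item similarity).
  def simRank(edges: Seq[(String, String)], damp: Double, numIter: Int): (Sim, Sim) = {
    val userItems = edges.groupBy(_._1).map { case (u, es) => u -> es.map(_._2).toSet }
    val itemUsers = edges.groupBy(_._2).map { case (i, es) => i -> es.map(_._1).toSet }
    val users = userItems.keys.toSeq.sorted
    val items = itemUsers.keys.toSeq.sorted

    var userSim = initialSim(users)
    var itemSim = initialSim(items)
    for (_ <- 1 to numIter) {
      // Both sides are computed from the previous iteration's values.
      val nextUser = update(users, userItems, itemSim, damp)
      val nextItem = update(items, itemUsers, userSim, damp)
      userSim = nextUser
      itemSim = nextItem
    }
    (userSim, itemSim)
  }
}
```

For two users who both interact with items a and b, SimRankSketch.simRank(Seq(("u1","a"),("u1","b"),("u2","a"),("u2","b")), damp = 0.6, numIter = 1) gives a similarity of 0.3 for the pair (u1, u2), and a second iteration raises it to 0.39. Each extra iteration lets similarity propagate one step further through the graph, scaled by damp, which is why both parameters affect precision and running time.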