SimRank
| Model API Type | Function API |
|---|---|
| computeSimilarity API | def computeSimilarity(dataset: Dataset[_]): SimRankSimilarity |
- Input and output
- Package name: org.apache.spark.ml.recommendation
- Class name: SimRank
- Method name: computeSimilarity
- Input: Dataset[_] that stores user search and product usage statistics. It must contain at least two columns: a user column and a product column.
| Parameter | Value Type | Description |
|---|---|---|
| dataset | Dataset[_] | Dataset that stores user search and product usage statistics |
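As an illustration of the expected input shape, the following minimal sketch builds a two-column DataFrame with standard Spark calls. The object name `SimRankInputSketch`, the toy rows, and the column names `user` and `item` are assumptions made for this example; the column names only need to match the values later passed to `setUserCol` and `setItemCol`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SimRankInputSketch {
  // Builds a small (user, item) interaction table.
  // The rows are hypothetical toy data, not taken from any real dataset.
  def buildInput(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("u1", "a"), ("u1", "b"),
      ("u2", "a"), ("u2", "b"),
      ("u3", "c")
    ).toDF("user", "item")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; a cluster job would
    // normally get its master from spark-submit instead.
    val spark = SparkSession.builder()
      .appName("simrank-input-sketch")
      .master("local[*]")
      .getOrCreate()
    val df = buildInput(spark)
    df.printSchema()
    spark.stop()
  }
}
```

The resulting `df` has one row per user-product interaction, which is the shape described for the `dataset` parameter.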
- Parameters optimized based on native algorithms
| Parameter | Value Type | Description |
|---|---|---|
| damp | Double | Attenuation factor |
| numIter | Int | Number of iterations |
| userCol | String | User column |
| itemCol | String | Product column |
The recommended settings are damp = 0.6 and numIter = 5. A small attenuation factor may reduce precision, and a large one may slow convergence. Too few iterations may also reduce precision, while too many iterations increase the computation time.
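To make the roles of damp and numIter concrete, here is a small in-memory sketch of the textbook bipartite SimRank recurrence in plain Scala. It is not the library's implementation, which this page does not describe. It assumes a single attenuation factor for both the user side and the product side, and the object name `SimRankSketch`, the function `simRank`, and the toy edge list are hypothetical.

```scala
object SimRankSketch {
  type Sim = Map[(String, String), Double]

  // Hypothetical toy data: (user, item) interaction pairs.
  val edges: Seq[(String, String)] = Seq(
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "c")
  )

  /** Bipartite SimRank on an edge list; returns (userSim, itemSim). */
  def simRank(edges: Seq[(String, String)], damp: Double, numIter: Int): (Sim, Sim) = {
    // Neighbour lists: items used by each user, and users of each item.
    val itemsOf: Map[String, Seq[String]] =
      edges.groupBy(_._1).map { case (u, es) => u -> es.map(_._2).distinct }
    val usersOf: Map[String, Seq[String]] =
      edges.groupBy(_._2).map { case (i, es) => i -> es.map(_._1).distinct }

    // Start from the identity: a node is similar only to itself.
    def identity(nodes: Iterable[String]): Sim =
      (for (a <- nodes; b <- nodes) yield (a, b) -> (if (a == b) 1.0 else 0.0)).toMap

    // The similarity of a pair is the damped average similarity
    // of all pairs of their neighbours on the other side.
    def update(neigh: Map[String, Seq[String]], other: Sim): Sim =
      (for (a <- neigh.keys; b <- neigh.keys) yield {
        val s =
          if (a == b) 1.0
          else {
            val total = (for (x <- neigh(a); y <- neigh(b)) yield other((x, y))).sum
            damp * total / (neigh(a).size * neigh(b).size)
          }
        (a, b) -> s
      }).toMap

    var userSim = identity(itemsOf.keys)
    var itemSim = identity(usersOf.keys)
    for (_ <- 1 to numIter) {
      // Both sides are updated from the previous iteration's scores.
      val nextUser = update(itemsOf, itemSim)
      val nextItem = update(usersOf, userSim)
      userSim = nextUser
      itemSim = nextItem
    }
    (userSim, itemSim)
  }
}
```

On this toy graph with damp = 0.6, users u1 and u2 share both items, so their score is 0.3 after one iteration and 0.39 after two, approaching 0.3 / 0.7 ≈ 0.43 as numIter grows. User u3 shares nothing with the others and stays at 0. This shows why too few iterations leave the scores short of their converged values.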
Code API example:

    val simrank = new SimRank()
      .setDamp(0.6)
      .setNumIter(5)
      .setUserCol("user")
      .setItemCol("item")
    val simrankSimilarity = simrank.computeSimilarity(df)
    val userSim = simrankSimilarity.userSimilarity
    val itemSim = simrankSimilarity.itemSimilarity

- Output: SimRankSimilarity, including the similarity between users and that between products. It is a case class that contains two variables.
| Parameter | Value Type | Description |
|---|---|---|
| userSimilarity | DataFrame | Similarity between users |
| itemSimilarity | DataFrame | Similarity between products |
- Example
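
    import org.apache.spark.ml.recommendation.SimRank

    // df is a Dataset with a "user" column and an "item" column.
    val simrank = new SimRank()
      .setDamp(0.6)     // attenuation factor
      .setNumIter(5)    // number of iterations
      .setUserCol("user")
      .setItemCol("item")
    val simrankSimilarity = simrank.computeSimilarity(df)
    val userSim = simrankSimilarity.userSimilarity  // similarity between users
    val itemSim = simrankSimilarity.itemSimilarity  // similarity between products
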