PrefixSpan
The PrefixSpan algorithm uses MLlib APIs.
| Model API Type | Function API |
|---|---|
| MLlib API | def run[Item, Itemset <: Iterable[Item], Sequence <: Iterable[Itemset]](data: JavaRDD[Sequence]): PrefixSpanModel[Item] |
| | def run[Item](data: RDD[Array[Array[Item]]])(implicit arg0: ClassTag[Item]): PrefixSpanModel[Item] |
MLlib API
- Function
Import sequence data in RDD format, set the minimum support level and maximum length of a frequent sequential pattern, and call the run API to output all frequent sequences that meet the conditions.
- Input and output
- Package name: org.apache.spark.mllib.fpm
- Class name: PrefixSpan
- Method name: run
- Input: full sequence data (JavaRDD[Sequence]/RDD[Array[Array[Item]]])
- Algorithm parameters
| Algorithm Parameter | Description |
|---|---|
| MaxLocalProjDBSize | Maximum number of items allowed in a prefix-projected database before local processing |
| MaxPatternLength | Maximum length of a frequent sequential pattern |
| MinSupport | Minimum support level of a frequent sequential pattern |
- Added algorithm parameters
| Parameter | spark conf Parameter Name | Description | Type |
|---|---|---|---|
| localTimeout | spark.sophon.ml.ps.localTimeout | Timeout interval for local processing, in seconds | Integer type. The value must be greater than or equal to 0. The default value is 300. |
| filterCandidates | spark.sophon.ml.ps.filterCandidates | Whether to filter the prefix candidate set | Boolean type. The default value is false. |
| projDBStep | spark.sophon.ml.ps.projDBStep | (Advanced parameter) Adjustment steps of the projection data volume. Retain the default value. | Double type. The default value is 10. |
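The added parameters are listed by their spark conf names, so one way to supply them is through a SparkConf object before the SparkContext is created. The following is a minimal sketch under that assumption. The object name PrefixSpanConfSketch and the application name are hypothetical, the values are the defaults from the table, and the keys are written without the stray space that appears in the extracted table.

```scala
import org.apache.spark.SparkConf

object PrefixSpanConfSketch {
  // Builds a SparkConf carrying the three added PrefixSpan parameters.
  // Assumption: the algorithm reads these keys from the Spark configuration.
  def build(): SparkConf = new SparkConf()
    .setAppName("PrefixSpanExample")                      // hypothetical application name
    .set("spark.sophon.ml.ps.localTimeout", "300")        // seconds, must be >= 0
    .set("spark.sophon.ml.ps.filterCandidates", "false")  // Boolean
    .set("spark.sophon.ml.ps.projDBStep", "10")           // advanced, keep the default
}
```

The same keys can also be passed on the command line with the standard spark-submit --conf key=value option.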
An example is provided as follows:
    val prefixSpan = new PrefixSpan()
      .setMinSupport(params.minSupport)
      .setMaxPatternLength(params.maxPatternLength)
      .setMaxLocalProjDBSize(params.maxLocalProjDBSize)
    val model = prefixSpan.run(sequences)

- Output: frequent sequence model (PrefixSpanModel[Item])
- Sample usage
    import org.apache.spark.mllib.fpm.PrefixSpan

    val sequences = sc.parallelize(Seq(
      Array(Array(1, 2), Array(3)),
      Array(Array(1), Array(3, 2), Array(1, 2)),
      Array(Array(1, 2), Array(5)),
      Array(Array(6))
    ), 2).cache()

    val prefixSpan = new PrefixSpan()
      .setMinSupport(0.5)
      .setMaxPatternLength(5)
    val model = prefixSpan.run(sequences)

    model.freqSequences.collect().foreach { freqSequence =>
      println(
        s"${freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]")}," +
          s" ${freqSequence.freq}")
    }

- Sample result

    [[2]], 3
    [[3]], 2
    [[1]], 3
    [[2, 1]], 3
    [[1], [3]], 2
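The frequencies in the sample result can be checked by hand without Spark. The following is a minimal sketch in plain Scala, not the PrefixSpan algorithm itself: it counts, by brute force, how many of the four sample sequences contain a given pattern. The names SupportCheck, containsPattern, support and minCount are hypothetical, and the threshold is assumed to be the minimum support multiplied by the number of sequences, rounded up.

```scala
object SupportCheck {
  type Sequence = Array[Array[Int]]

  // The four input sequences from the sample usage.
  val sequences: Seq[Sequence] = Seq(
    Array(Array(1, 2), Array(3)),
    Array(Array(1), Array(3, 2), Array(1, 2)),
    Array(Array(1, 2), Array(5)),
    Array(Array(6))
  )

  // True if the itemsets of `pattern` can be matched, in order, to distinct
  // itemsets of `seq`, each pattern itemset being a subset of its match.
  def containsPattern(seq: Sequence, pattern: Sequence): Boolean = {
    var next = 0 // index of the next pattern itemset still to be matched
    for (itemset <- seq) {
      if (next < pattern.length && pattern(next).forall(x => itemset.contains(x))) {
        next += 1
      }
    }
    next == pattern.length
  }

  // Number of input sequences that contain `pattern`.
  def support(pattern: Sequence): Int =
    sequences.count(seq => containsPattern(seq, pattern))

  // Assumed threshold: minimum support times the sequence count, rounded up.
  def minCount(minSupport: Double): Long =
    math.ceil(minSupport * sequences.size).toLong

  def main(args: Array[String]): Unit = {
    val patterns: Seq[Sequence] = Seq(
      Array(Array(2)), Array(Array(3)), Array(Array(1)),
      Array(Array(2, 1)), Array(Array(1), Array(3)), Array(Array(5))
    )
    val threshold = minCount(0.5)
    for (p <- patterns) {
      val text = p.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]")
      println(s"$text, ${support(p)}, frequent = ${support(p) >= threshold}")
    }
  }
}
```

With a minimum support of 0.5 and four sequences, the assumed threshold is 2, so a pattern such as [[5]], which occurs in only one sequence, is not reported.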