PrefixSpan

The PrefixSpan algorithm uses MLlib APIs.

Model API Type: MLlib API

Function API:

def run[Item, Itemset <: Iterable[Item], Sequence <: Iterable[Itemset]](data: JavaRDD[Sequence]): PrefixSpanModel[Item]

def run[Item](data: RDD[Array[Array[Item]]])(implicit arg0: ClassTag[Item]): PrefixSpanModel[Item]

MLlib API

  • Function

    Pass in the sequence data as an RDD, set the minimum support level and the maximum length of a frequent sequential pattern, and call the run API to output all frequent sequences that meet these conditions.

  • Input and output
    1. Package name: package org.apache.spark.mllib.fpm
    2. Class name: PrefixSpan
    3. Method name: run
    4. Input: full sequence data (JavaRDD[Sequence]/RDD[Array[Array[Item]]])
    5. Algorithm parameters

      Algorithm Parameter    Description
      MaxLocalProjDBSize     Maximum number of items allowed in a prefix-projected database before local processing
      MaxPatternLength       Maximum length of a frequent sequential pattern
      MinSupport             Minimum support level of a frequent sequential pattern

    6. Added algorithm parameters

      Parameter: localTimeout
      spark conf Parameter Name: spark.sophon.ml.ps.localTimeout
      Description: Timeout interval for local processing, in seconds
      Type: Integer type. The value must be greater than or equal to 0. The default value is 300.

      Parameter: filterCandidates
      spark conf Parameter Name: spark.sophon.ml.ps.filterCandidates
      Description: Whether to filter the prefix candidate set
      Type: Boolean type. The default value is false.

      Parameter: projDBStep
      spark conf Parameter Name: spark.sophon.ml.ps.projDBStep
      Description: (Advanced parameter) Adjustment steps of the projection data volume. Retain the default value.
      Type: Double type. The default value is 10.

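      The following example sets the algorithm parameters listed in item 5 (params holds the user-supplied values and sequences is the input RDD):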

      val prefixSpan = new PrefixSpan()
          .setMinSupport(params.minSupport)
          .setMaxPatternLength(params.maxPatternLength)
          .setMaxLocalProjDBSize(params.maxLocalProjDBSize)
      val model = prefixSpan.run(sequences)
    7. Output: frequent sequence model (PrefixSpanModel[Item])
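The added parameters in item 6 are listed under Spark configuration names, so one plausible way to set them is on the SparkConf before the context is created. The following is a minimal sketch under that assumption, not a confirmed procedure: the keys and default values come from the table above, while the application name is hypothetical and the master is expected to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Keys are taken from the "Added algorithm parameters" table; the values shown
// are the documented defaults. How the library reads them is an assumption.
val conf = new SparkConf()
  .setAppName("PrefixSpanExample") // hypothetical application name
  .set("spark.sophon.ml.ps.localTimeout", "300")      // seconds, must be >= 0
  .set("spark.sophon.ml.ps.filterCandidates", "false") // Boolean
  .set("spark.sophon.ml.ps.projDBStep", "10")          // advanced; keep the default
val sc = new SparkContext(conf)
```

The standard parameters (MinSupport, MaxPatternLength, MaxLocalProjDBSize) are still set through the PrefixSpan setters.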
  • Sample usage
    import org.apache.spark.mllib.fpm.PrefixSpan
    
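    // Four input sequences; each sequence is an array of itemsets.
    // Assumes an existing SparkContext named sc (for example, in spark-shell).
    val sequences = sc.parallelize(Seq(
      Array(Array(1, 2), Array(3)),
      Array(Array(1), Array(3, 2), Array(1, 2)),
      Array(Array(1, 2), Array(5)),
      Array(Array(6))
    ), 2).cache()
    // A pattern is frequent if it occurs in at least 50% of the sequences.
    val prefixSpan = new PrefixSpan()
      .setMinSupport(0.5)
      .setMaxPatternLength(5)
    val model = prefixSpan.run(sequences)
    // Print each frequent sequence followed by its frequency.
    model.freqSequences.collect().foreach { freqSequence =>
      println(
        s"${freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]")}," +
          s" ${freqSequence.freq}")
    }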
  • Sample result
    [[2]], 3
    [[3]], 2
    [[1]], 3
    [[2, 1]], 3
    [[1], [3]], 2
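
To see where the frequencies in the sample result come from, the supports can be recounted by brute force without Spark. The sketch below is only an illustration of what "support" means for sequences of itemsets; it is not the PrefixSpan algorithm. The names SupportCheck, contains, and support are hypothetical, and the threshold is assumed to be the ceiling of MinSupport times the number of sequences.

```scala
object SupportCheck {
  type Itemset = Set[Int]
  type Sequence = Seq[Itemset]

  // True if `pattern` occurs in `sequence`: each pattern itemset must be a
  // subset of a later-or-equal sequence itemset, preserving order.
  // Greedy earliest matching is sufficient for this containment test.
  def contains(sequence: Sequence, pattern: Sequence): Boolean = {
    var i = 0
    sequence.foreach { itemset =>
      if (i < pattern.length && pattern(i).subsetOf(itemset)) i += 1
    }
    i == pattern.length
  }

  // Number of sequences in the database that contain the pattern.
  def support(db: Seq[Sequence], pattern: Sequence): Int =
    db.count(s => contains(s, pattern))

  // The same four sequences as in the sample usage.
  val db: Seq[Sequence] = Seq(
    Seq(Set(1, 2), Set(3)),
    Seq(Set(1), Set(3, 2), Set(1, 2)),
    Seq(Set(1, 2), Set(5)),
    Seq(Set(6))
  )

  // Assumed threshold: ceil(0.5 * 4) = 2 sequences.
  val minCount: Long = math.ceil(0.5 * db.size).toLong

  def main(args: Array[String]): Unit = {
    val patterns: Seq[Sequence] = Seq(
      Seq(Set(2)), Seq(Set(3)), Seq(Set(1)), Seq(Set(1, 2)), Seq(Set(1), Set(3))
    )
    patterns.foreach { p =>
      val text = p.map(_.toSeq.sorted.mkString("[", ", ", "]")).mkString("[", ", ", "]")
      val count = support(db, p)
      println(text + ", " + count + ", frequent: " + (count >= minCount))
    }
  }
}
```

Each of the five patterns reaches the threshold of 2, while a pattern such as [[5]] occurs in only one sequence and is therefore absent from the result.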