Logistic Regression

The Logistic Regression algorithm provides ML classification APIs.

Model API Type: ML classification API

Function API:

  def fit(dataset: Dataset[_]): LogisticRegressionModel
  def fit(dataset: Dataset[_], paramMap: ParamMap): LogisticRegressionModel
  def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): LogisticRegressionModel
  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[LogisticRegressionModel]

ML Classification API

  • Function description

    Pass the training sample data as a Dataset to the fit API. The call returns a trained LogisticRegressionModel.

  • Input and output
    1. Package name: package org.apache.spark.ml.classification
    2. Class name: LogisticRegression
    3. Method name: fit
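    4. Input: training sample data (Dataset[_]). The following fields are mandatory.

      Parameter   | Value Type | Default Value | Description
      ------------|------------|---------------|------------
      labelCol    | Double     | label         | Label. Each value must satisfy label == label.toInt and label >= 0, that is, it must be a non-negative whole number stored as a Double.
      featuresCol | Vector     | features      | Feature vector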

    5. Parameters optimized based on native algorithms
      def setRegParam(value: Double): LogisticRegression.this.type
      def setElasticNetParam(value: Double): LogisticRegression.this.type
      def setMaxIter(value: Int): LogisticRegression.this.type
      def setTol(value: Double): LogisticRegression.this.type
      def setFitIntercept(value: Boolean): LogisticRegression.this.type
      def setFamily(value: String): LogisticRegression.this.type
      def setStandardization(value: Boolean): LogisticRegression.this.type
      override def setThreshold(value: Double): LogisticRegression.this.type
      def setWeightCol(value: String): LogisticRegression.this.type
      override def setThresholds(value: Array[Double]): LogisticRegression.this.type
      def setAggregationDepth(value: Int): LogisticRegression.this.type
      def setLowerBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type
      def setUpperBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type
      def setLowerBoundsOnIntercepts(value: Vector): LogisticRegression.this.type
      def setUpperBoundsOnIntercepts(value: Vector): LogisticRegression.this.type

      An example is provided as follows:

      import org.apache.spark.ml.classification.LogisticRegression
      import org.apache.spark.ml.param.{ParamMap, ParamPair}

      // trainingData (Dataset[_]), maxIter (Int), regParam (Double), and tol (Double)
      // are assumed to be defined by the caller.
      val logR = new LogisticRegression()

      // Build the argument for fit(dataset: Dataset[_], paramMap: ParamMap).
      val paramMap = ParamMap(logR.maxIter -> maxIter)
        .put(logR.regParam, regParam)

      // Build the argument for fit(dataset: Dataset[_], paramMaps: Array[ParamMap]).
      val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
      // "until" stops before 2, so only the valid indices 0 and 1 are assigned.
      for (i <- 0 until 2) {
        paramMaps(i) = ParamMap(logR.maxIter -> maxIter)
          .put(logR.regParam, regParam)
      }

      // Build the arguments for
      // fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*).
      val regParamPair = ParamPair(logR.regParam, regParam)
      val maxIterParamPair = ParamPair(logR.maxIter, maxIter)
      val tolParamPair = ParamPair(logR.tol, tol)

      // Call each fit overload. The Array[ParamMap] overload returns one model per ParamMap.
      val modelDefault = logR.fit(trainingData)
      val modelFromMap = logR.fit(trainingData, paramMap)
      val models = logR.fit(trainingData, paramMaps)
      val modelFromPairs = logR.fit(trainingData, regParamPair, maxIterParamPair, tolParamPair)
      
    6. Output: LogisticRegressionModel. The following table lists the field that the model writes during prediction.

      Parameter     | Value Type | Default Value | Description
      --------------|------------|---------------|------------
      predictionCol | Double     | prediction    | Predicted label
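
    The input and output fields described above can be tied together in a small end-to-end sketch. It is illustrative only: the object name PredictionColumnSketch, the helper isValidLabel, the toy rows, and the local[*] master are assumptions made for this example and are not part of the API. The sketch checks the label requirement, fits a model on the default label and features columns, and reads the default prediction column back.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object PredictionColumnSketch {
  // Hypothetical helper: the label rule from the input table
  // (label >= 0 and label == label.toInt).
  def isValidLabel(label: Double): Boolean =
    label >= 0 && label == label.toInt

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder()
      .appName("PredictionColumnSketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up toy rows using the default column names "label" and "features".
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (0.0, Vectors.dense(0.1, 1.2)),
      (1.0, Vectors.dense(1.9, 0.8))
    )).toDF("label", "features")

    // Fail early if any label breaks the requirement.
    val labels = training.select("label").collect().map(_.getDouble(0))
    require(labels.forall(isValidLabel), "labels must be non-negative whole numbers")

    val model = new LogisticRegression().setMaxIter(10).fit(training)

    // transform appends the "prediction" column (a Double) to the input rows.
    model.transform(training).select("features", "label", "prediction").show()

    spark.stop()
  }
}
```

    If the label or features column has a different name in your data, setLabelCol and setFeaturesCol on LogisticRegression point the estimator at the right columns.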

  • Example
    import org.apache.spark.ml.classification.LogisticRegression
    
    // Load training data
    val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
    
    // Fit the model
    val lrModel = lr.fit(training)
    
    // Print the coefficients and intercept for logistic regression
    println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
    
    // We can also use the multinomial family for binary classification
    val mlr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFamily("multinomial")
    
    val mlrModel = mlr.fit(training)
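
    The example above stops after fitting the multinomial model. As a hedged follow-up sketch, a model fitted with the multinomial family exposes its parameters through coefficientMatrix and interceptVector. The object name MultinomialParamsSketch, the helper fitMultinomial, and the toy rows below are assumptions made for illustration; they stand in for the libsvm file so that the block runs on its own.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MultinomialParamsSketch {
  // Hypothetical helper: fits a multinomial model on made-up two-feature rows.
  def fitMultinomial(spark: SparkSession): LogisticRegressionModel = {
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (0.0, Vectors.dense(0.1, 1.2)),
      (1.0, Vectors.dense(1.9, 0.8))
    )).toDF("label", "features")

    new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFamily("multinomial")
      .fit(training)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder()
      .appName("MultinomialParamsSketch")
      .master("local[*]")
      .getOrCreate()

    val mlrModel = fitMultinomial(spark)

    // One row of coefficients and one intercept per class.
    println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")
    println(s"Multinomial intercepts: ${mlrModel.interceptVector}")

    spark.stop()
  }
}
```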