LogisticRegression

The LogisticRegression algorithm uses ML APIs.

Model API Type	Function API
ML API	def fit(dataset: Dataset[_]): LogisticRegressionModel
	def fit(dataset: Dataset[_], paramMap: ParamMap): LogisticRegressionModel
	def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): LogisticRegressionModel
	def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[LogisticRegressionModel]

ML classification API

Function
Pass the training sample data in Dataset format to the fit API, which returns a LogisticRegressionModel.

Input/Output

Package name: org.apache.spark.ml.classification
Class name: LogisticRegression
Method name: fit
Input: training sample data (Dataset[_]). Mandatory fields are as follows:
Parameter	Type	Default Value	Description
labelCol	Double	label	Label. Requirements: 1) label == label.toInt; 2) label >= 0
featuresCol	Vector	features	Feature vector
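
The two label requirements in the table can be checked before training. The following is a minimal sketch in plain Scala; the helper name isValidLabel and the sample values are hypothetical and are not part of the Spark API.

```scala
// Hypothetical helper mirroring the two label requirements:
// 1) the label is a whole number, 2) the label is non-negative.
def isValidLabel(label: Double): Boolean =
  label == label.toInt && label >= 0

// Hypothetical sample labels for illustration.
val labels = Seq(0.0, 1.0, 2.0, 1.5, -1.0)

// Collect the labels that would violate the requirements.
val invalid = labels.filterNot(isValidLabel)
println(s"Invalid labels: $invalid") // prints "Invalid labels: List(1.5, -1.0)"
```

A check of this kind could be applied to the label column of the training data before fit is called, so that invalid rows are found early.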

Algorithm parameters

Algorithm Parameter

def setRegParam(value: Double): LogisticRegression.this.type

def setElasticNetParam(value: Double): LogisticRegression.this.type

def setMaxIter(value: Int): LogisticRegression.this.type

def setTol(value: Double): LogisticRegression.this.type

def setFitIntercept(value: Boolean): LogisticRegression.this.type

def setFamily(value: String): LogisticRegression.this.type

def setStandardization(value: Boolean): LogisticRegression.this.type

override def setThreshold(value: Double): LogisticRegression.this.type

def setWeightCol(value: String): LogisticRegression.this.type

override def setThresholds(value: Array[Double]): LogisticRegression.this.type

def setAggregationDepth(value: Int): LogisticRegression.this.type

def setLowerBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type

def setUpperBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type

def setLowerBoundsOnIntercepts(value: Vector): LogisticRegression.this.type

def setUpperBoundsOnIntercepts(value: Vector): LogisticRegression.this.type
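
Each setter listed above returns the estimator itself (LogisticRegression.this.type), so calls can be chained. The following is a minimal sketch; the parameter values are arbitrary assumptions chosen for illustration, not recommended settings.

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Illustrative values only; tune them for the actual workload.
val lr = new LogisticRegression()
  .setMaxIter(100)          // maximum number of iterations
  .setRegParam(0.01)        // regularization strength
  .setElasticNetParam(0.5)  // 0.0 = pure L2 penalty, 1.0 = pure L1 penalty
  .setTol(1e-6)             // convergence tolerance
  .setFitIntercept(true)
  .setStandardization(true)
  .setFamily("binomial")    // "auto", "binomial", or "multinomial"
  .setThreshold(0.5)        // decision threshold for binary classification
  .setAggregationDepth(2)

// Print every parameter with its documentation and current value.
println(lr.explainParams())
```

Values set this way become the estimator's defaults for later fit calls; a ParamMap passed to fit overrides them for that call only.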

An example is provided as follows:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.{ParamMap, ParamPair}

// maxIter, regParam, tol, and trainingData are assumed to be defined by the caller.
val logR = new LogisticRegression()

// Define the parameter for def fit(dataset: Dataset[_], paramMap: ParamMap).
val paramMap = ParamMap(logR.maxIter -> maxIter)
  .put(logR.regParam, regParam)

// Define the parameter for def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]).
val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
// Assign a value to each of the two slots (indices 0 and 1).
for (i <- 0 until 2) {
  paramMaps(i) = ParamMap(logR.maxIter -> maxIter)
    .put(logR.regParam, regParam)
}

// Define the parameters for def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*).
val regParamPair = ParamPair(logR.regParam, regParam)
val maxIterParamPair = ParamPair(logR.maxIter, maxIter)
val tolParamPair = ParamPair(logR.tol, tol)

// Call the fit APIs.
val model1 = logR.fit(trainingData)
val model2 = logR.fit(trainingData, paramMap)
val models = logR.fit(trainingData, paramMaps) // one model per ParamMap
val model3 = logR.fit(trainingData, regParamPair, maxIterParamPair, tolParamPair)

Output: LogisticRegressionModel. When the model is used for prediction, the output contains the following field:
Parameter	Type	Default Value	Description
predictionCol	Double	prediction	Predicted Label
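
The prediction column is produced when the fitted model's transform method is applied to a dataset. The following is a minimal sketch: the local SparkSession, the application name, and the four-row in-memory dataset are assumptions made only to keep the example self-contained.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Local session for illustration only; a real application would reuse its own session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("lr-prediction-sketch")
  .getOrCreate()

// Tiny hypothetical training set using the default column names.
val training = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(0.0, 0.1)),
  (0.0, Vectors.dense(0.2, 0.0)),
  (1.0, Vectors.dense(1.0, 0.9)),
  (1.0, Vectors.dense(0.8, 1.0))
)).toDF("label", "features")

val model = new LogisticRegression().setMaxIter(10).fit(training)

// transform appends the rawPrediction, probability, and prediction columns.
val predictions = model.transform(training)
predictions.select("features", "prediction").show()
```

The training data is reused for prediction here only for brevity; in practice the model would be applied to a separate dataset.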

Sample usage

import org.apache.spark.ml.classification.LogisticRegression

// Load training data
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification
val mlr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFamily("multinomial")

val mlrModel = mlr.fit(training)
