LogisticRegression
The LogisticRegression algorithm is provided through the following ML (DataFrame-based) APIs.
| Model API Type | Function API |
|---|---|
| ML API | def fit(dataset: Dataset[_]): LogisticRegressionModel |
| | def fit(dataset: Dataset[_], paramMap: ParamMap): LogisticRegressionModel |
| | def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): LogisticRegressionModel |
| | def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[LogisticRegressionModel] |
ML classification API
- Function
This API takes sample data in Dataset format, calls the fit method, and outputs a LogisticRegression model.
- Input and output
- Package name: org.apache.spark.ml.classification
- Class name: LogisticRegression
- Method name: fit
- Input: training sample data (Dataset[_]). The following fields are mandatory.

| Param name | Type(s) | Default | Description |
|---|---|---|---|
| labelCol | Double | "label" | Label; requirements: 1) label == label.toInt, 2) label >= 0 |
| featuresCol | Vector | "features" | Feature vector |
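The label constraints above (the label must be a non-negative integer value stored as a Double) can be checked before training. The following is a minimal pure-Scala sketch; the helper `isValidLabel` is hypothetical and not part of Spark:

```scala
// Hypothetical helper (not part of Spark): checks the label constraints
// listed above -- label == label.toInt and label >= 0.
def isValidLabel(label: Double): Boolean =
  label == label.toInt && label >= 0

// A quick pre-training scan over raw label values:
val labels = Seq(0.0, 1.0, 2.0, -1.0, 1.5)
val invalid = labels.filterNot(isValidLabel)
// invalid contains -1.0 (negative) and 1.5 (non-integer)
```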
- Algorithm parameters
| Algorithm Parameter |
|---|
| def setRegParam(value: Double): LogisticRegression.this.type |
| def setElasticNetParam(value: Double): LogisticRegression.this.type |
| def setMaxIter(value: Int): LogisticRegression.this.type |
| def setTol(value: Double): LogisticRegression.this.type |
| def setFitIntercept(value: Boolean): LogisticRegression.this.type |
| def setFamily(value: String): LogisticRegression.this.type |
| def setStandardization(value: Boolean): LogisticRegression.this.type |
| override def setThreshold(value: Double): LogisticRegression.this.type |
| def setWeightCol(value: String): LogisticRegression.this.type |
| override def setThresholds(value: Array[Double]): LogisticRegression.this.type |
| def setAggregationDepth(value: Int): LogisticRegression.this.type |
| def setLowerBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type |
| def setUpperBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type |
| def setLowerBoundsOnIntercepts(value: Vector): LogisticRegression.this.type |
| def setUpperBoundsOnIntercepts(value: Vector): LogisticRegression.this.type |
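Among these parameters, setThreshold controls the probability cutoff for binary prediction. The following pure-Scala sketch illustrates the decision rule it governs (standard logistic model; an illustration only, not Spark's internal implementation):

```scala
// Logistic (sigmoid) function mapping a linear margin to a probability.
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Predict 1.0 when P(y = 1 | x) exceeds the threshold, else 0.0.
// This is the rule that setThreshold parameterizes in the binary case.
def predict(margin: Double, threshold: Double = 0.5): Double =
  if (sigmoid(margin) > threshold) 1.0 else 0.0
```

Raising the threshold above 0.5 makes the positive class harder to predict. setThresholds is the multiclass generalization; in the binary case, a threshold of t is equivalent to setThresholds(Array(1 - t, t)).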
An example is provided as follows:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.{ParamMap, ParamPair}

val logR = new LogisticRegression()

// Parameter values used below; adjust as needed.
val maxIter = 10
val regParam = 0.3
val tol = 1e-6

// Define the def fit(dataset: Dataset[_], paramMap: ParamMap) API parameter.
val paramMap = ParamMap(logR.maxIter -> maxIter)
  .put(logR.regParam, regParam)

// Define the def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]) API parameter.
val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
for (i <- 0 until 2) { // assign a value to each element of paramMaps
  paramMaps(i) = ParamMap(logR.maxIter -> maxIter)
    .put(logR.regParam, regParam)
}

// Define the def fit(dataset: Dataset[_], firstParamPair: ParamPair[_],
// otherParamPairs: ParamPair[_]*) API parameters.
val regParamPair = ParamPair(logR.regParam, regParam)
val maxIterParamPair = ParamPair(logR.maxIter, maxIter)
val tolParamPair = ParamPair(logR.tol, tol)

// Call the fit APIs. trainingData is a Dataset[_] prepared beforehand.
val model = logR.fit(trainingData)
val model2 = logR.fit(trainingData, paramMap)
val models = logR.fit(trainingData, paramMaps)
val model3 = logR.fit(trainingData, regParamPair, maxIterParamPair, tolParamPair)
```

- Output: LogisticRegressionModel. The fields output during model prediction are as follows.
| Param name | Type(s) | Default | Description |
|---|---|---|---|
| predictionCol | Double | "prediction" | Predicted label |
- Sample usage
```scala
import org.apache.spark.ml.classification.LogisticRegression

// Load training data
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification
val mlr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFamily("multinomial")

val mlrModel = mlr.fit(training)
```
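The regParam and elasticNetParam values set above combine into a single elastic-net regularization term. Assuming the standard formulation (alpha = elasticNetParam, lambda = regParam), the penalty is lambda * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2), which the following pure-Scala sketch computes for a coefficient vector:

```scala
// Sketch of the elastic-net regularization term controlled by
// regParam (lambda) and elasticNetParam (alpha):
// alpha = 0 gives pure L2 (ridge), alpha = 1 gives pure L1 (lasso).
def elasticNetPenalty(coefficients: Array[Double],
                      regParam: Double,
                      elasticNetParam: Double): Double = {
  val l1 = coefficients.map(math.abs).sum        // ||w||_1
  val l2 = coefficients.map(w => w * w).sum      // ||w||_2^2
  regParam * (elasticNetParam * l1 + (1 - elasticNetParam) / 2 * l2)
}
```

With setRegParam(0.3) and setElasticNetParam(0.8) as in the sample above, the optimizer penalizes 80% of the weight via L1 (encouraging sparse coefficients) and 20% via L2.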