Logistic Regression
The Logistic Regression algorithm provides ML classification APIs.
| Model API Type | Function API |
|---|---|
| ML classification API | def fit(dataset: Dataset[_]): LogisticRegressionModel |
| | def fit(dataset: Dataset[_], paramMap: ParamMap): LogisticRegressionModel |
| | def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): LogisticRegressionModel |
| | def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[LogisticRegressionModel] |
ML Classification API
- Input and output
- Package name: package org.apache.spark.ml.classification
- Class name: LogisticRegression
- Method name: fit
- Input: training sample data (Dataset[_]). The following fields are mandatory.

| Parameter | Value Type | Default Value | Description |
|---|---|---|---|
| labelCol | Double | label | Label. Requirements: label == label.toInt and label >= 0 |
| featuresCol | Vector | features | Feature vector |
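As an illustration of these input requirements, a small training Dataset can be built in line. This is a minimal sketch assuming an active SparkSession named `spark`; the feature values are made up, and the labels are non-negative integers stored as Double, as required.

```scala
import org.apache.spark.ml.linalg.Vectors

// Toy training data: (label, features) pairs.
// Labels satisfy label == label.toInt and label >= 0.
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")  // default column names expected by fit
```

Any DataFrame with a Double label column and a Vector features column works; the column names can be changed via setLabelCol and setFeaturesCol.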
- Parameters optimized based on native algorithms
```scala
def setRegParam(value: Double): LogisticRegression.this.type
def setElasticNetParam(value: Double): LogisticRegression.this.type
def setMaxIter(value: Int): LogisticRegression.this.type
def setTol(value: Double): LogisticRegression.this.type
def setFitIntercept(value: Boolean): LogisticRegression.this.type
def setFamily(value: String): LogisticRegression.this.type
def setStandardization(value: Boolean): LogisticRegression.this.type
override def setThreshold(value: Double): LogisticRegression.this.type
def setWeightCol(value: String): LogisticRegression.this.type
override def setThresholds(value: Array[Double]): LogisticRegression.this.type
def setAggregationDepth(value: Int): LogisticRegression.this.type
def setLowerBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type
def setUpperBoundsOnCoefficients(value: Matrix): LogisticRegression.this.type
def setLowerBoundsOnIntercepts(value: Vector): LogisticRegression.this.type
def setUpperBoundsOnIntercepts(value: Vector): LogisticRegression.this.type
```
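The bound-related setters enable box-constrained coefficient optimization. The sketch below assumes a binomial model over three features; the bound matrix shape (numClasses rows by numFeatures columns for coefficients, one intercept bound per class) is the assumption to check against your own data.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Matrices, Vectors}

// Constrain all coefficients to [0, 1] and the intercept to [-1, 1].
// For a binomial model with 3 features, the bound matrices are 1 x 3.
val boundedLR = new LogisticRegression()
  .setLowerBoundsOnCoefficients(Matrices.dense(1, 3, Array(0.0, 0.0, 0.0)))
  .setUpperBoundsOnCoefficients(Matrices.dense(1, 3, Array(1.0, 1.0, 1.0)))
  .setLowerBoundsOnIntercepts(Vectors.dense(-1.0))
  .setUpperBoundsOnIntercepts(Vectors.dense(1.0))
```

When bounds are set, training uses a bound-constrained optimizer instead of the unconstrained one, so convergence behavior may differ from the default configuration.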
An example is provided as follows:
```scala
import org.apache.spark.ml.param.{ParamMap, ParamPair}

val logR = new LogisticRegression()

// Define the def fit(dataset: Dataset[_], paramMap: ParamMap) API parameter.
val paramMap = ParamMap(logR.maxIter -> maxIter)
  .put(logR.regParam, regParam)

// Define the def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]) API parameter.
val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
for (i <- 0 until 2) {
  paramMaps(i) = ParamMap(logR.maxIter -> maxIter)
    .put(logR.regParam, regParam)
} // Assign values to paramMaps.

// Define the def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*) API parameters.
val regParamPair = ParamPair(logR.regParam, regParam)
val maxIterParamPair = ParamPair(logR.maxIter, maxIter)
val tolParamPair = ParamPair(logR.tol, tol)

// Call the fit APIs.
val model = logR.fit(trainingData)
val modelFromMap = logR.fit(trainingData, paramMap)
val models = logR.fit(trainingData, paramMaps)
val modelFromPairs = logR.fit(trainingData, regParamPair, maxIterParamPair, tolParamPair)
```
- Output: LogisticRegressionModel. The following table lists the fields output in model prediction.

| Parameter | Value Type | Default Value | Description |
|---|---|---|---|
| predictionCol | Double | prediction | Predicted label |
- Example
```scala
import org.apache.spark.ml.classification.LogisticRegression

// Load training data
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification
val mlr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFamily("multinomial")

val mlrModel = mlr.fit(training)
```
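Predictions come from the fitted model's transform method, which appends the predictionCol described in the output table. A brief sketch continuing from a fitted model named `lrModel` and the `training` DataFrame (column names assume default settings):

```scala
// transform appends the "prediction" column, along with
// "rawPrediction" and "probability" columns, to the input DataFrame.
val predictions = lrModel.transform(training)
predictions.select("features", "label", "prediction").show(5)
```

Scoring new data uses the same call: any DataFrame with a compatible features column can be passed to transform.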
Parent topic: Classification and Regression