LinearRegression
The LinearRegression algorithm uses ML APIs.
| Model API Type | Function API |
|---|---|
| ML API | def fit(dataset: Dataset[_]): LinearRegressionModel |
| | def fit(dataset: Dataset[_], paramMap: ParamMap): LinearRegressionModel |
| | def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): LinearRegressionModel |
| | def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[LinearRegressionModel] |
ML API
- Function
This type of API takes sample data in Dataset format, trains on it through the fit API, and outputs a LinearRegressionModel.
- Input and output
- Package name: org.apache.spark.ml.regression
- Class name: LinearRegression
- Method name: fit
- Input: training sample data (Dataset[_]). The following are mandatory fields.
| Param name | Type(s) | Default | Description |
|---|---|---|---|
| labelCol | Double | "label" | Label |
| featuresCol | Vector | "features" | Feature vector |
- Algorithm parameters
Algorithm Parameter
def setRegParam(value: Double): LinearRegression.this.type
def setFitIntercept(value: Boolean): LinearRegression.this.type
def setStandardization(value: Boolean): LinearRegression.this.type
def setElasticNetParam(value: Double): LinearRegression.this.type
def setMaxIter(value: Int): LinearRegression.this.type
def setTol(value: Double): LinearRegression.this.type
def setWeightCol(value: String): LinearRegression.this.type
def setSolver(value: String): LinearRegression.this.type
def setAggregationDepth(value: Int): LinearRegression.this.type
def setLoss(value: String): LinearRegression.this.type
def setEpsilon(value: Double): LinearRegression.this.type
An example is provided as follows:

      import org.apache.spark.ml.param.{ParamMap, ParamPair}
      import org.apache.spark.ml.regression.LinearRegression

      // trainingData, maxIter, regParam, and tol are placeholders that the caller must define.
      val linR = new LinearRegression()

      // Define the def fit(dataset: Dataset[_], paramMap: ParamMap) API parameter.
      val paramMap = ParamMap(linR.maxIter -> maxIter)
        .put(linR.regParam, regParam)

      // Define the def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]) API parameter.
      val paramMaps: Array[ParamMap] = new Array[ParamMap](2)
      // Assign a value to each element of paramMaps (valid indices are 0 and 1).
      for (i <- 0 until 2) {
        paramMaps(i) = ParamMap(linR.maxIter -> maxIter)
          .put(linR.regParam, regParam)
      }

      // Define the def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*) API parameter.
      val regParamPair = ParamPair(linR.regParam, regParam)
      val maxIterParamPair = ParamPair(linR.maxIter, maxIter)
      val tolParamPair = ParamPair(linR.tol, tol)

      // Call the fit APIs, one call per overload.
      val model = linR.fit(trainingData)
      val modelFromParamMap = linR.fit(trainingData, paramMap)
      val models = linR.fit(trainingData, paramMaps)
      val modelFromParamPairs = linR.fit(trainingData, regParamPair, maxIterParamPair, tolParamPair)

- Output: LinearRegressionModel. Model prediction produces the following output column.
| Param name | Type(s) | Default | Description |
|---|---|---|---|
| predictionCol | Double | "prediction" | Predicted value |
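
To make the input and output columns concrete, the following is a minimal sketch that builds a tiny training Dataset with the default "label" and "features" columns, fits a model, and reads back the "prediction" column. The toy data (label = 2 * x + 1), the application name, and the local master setting are assumptions made for illustration only; in spark-shell a `spark` session already exists.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Assumption: a local session, used only for this illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("LinearRegressionColumnsSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy data: each row is (label: Double, features: Vector).
val trainingData = Seq(
  (3.0, Vectors.dense(1.0)),
  (5.0, Vectors.dense(2.0)),
  (7.0, Vectors.dense(3.0)),
  (9.0, Vectors.dense(4.0))
).toDF("label", "features")

// Fit with default parameters; labelCol and featuresCol default to "label" and "features".
val model = new LinearRegression().fit(trainingData)

// transform keeps the input columns and appends the prediction column.
val predictions = model.transform(trainingData)
predictions.select("features", "label", "prediction").show()
```

If the input columns use other names, setLabelCol, setFeaturesCol, and setPredictionCol on LinearRegression can be used to point the estimator at them.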
- Sample usage

      import org.apache.spark.ml.regression.LinearRegression

      // Load training data
      val training = spark.read.format("libsvm")
        .load("data/mllib/sample_linear_regression_data.txt")

      val lr = new LinearRegression()
        .setMaxIter(10)
        .setRegParam(0.3)
        .setElasticNetParam(0.8)

      // Fit the model
      val lrModel = lr.fit(training)

      // Summarize the model over the training set and print out some metrics
      val trainingSummary = lrModel.summary
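
The sample stops once the training summary is obtained. As a hedged sketch of what could follow, the helper below collects a few metrics from a fitted model. The helper name describeTrainingSummary is hypothetical and not part of the Spark API; it is written in spark-shell style, so in a compiled project it would need to live inside an object.

```scala
import org.apache.spark.ml.regression.LinearRegressionModel

// Hypothetical helper (not a Spark API): returns printable lines describing a fitted model.
def describeTrainingSummary(model: LinearRegressionModel): Seq[String] = {
  // A training summary exists only on models produced by fit;
  // hasSummary guards against models that carry none (for example, loaded ones).
  if (!model.hasSummary) {
    Seq("no training summary available")
  } else {
    val summary = model.summary
    Seq(
      s"coefficients: ${model.coefficients}",
      s"intercept: ${model.intercept}",
      s"totalIterations: ${summary.totalIterations}",
      s"RMSE: ${summary.rootMeanSquaredError}",
      s"r2: ${summary.r2}"
    )
  }
}
```

With the model from the sample, `describeTrainingSummary(lrModel).foreach(println)` would print one metric per line.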