Building Adaptation Code for the Machine Learning Algorithm Library
- The process of building the adaptation code Spark-ml-algo-lib for the machine learning algorithm library is as follows. This section uses Spark 3.3.1 as an example.
- Perform the following operations in the Linux environment. This section is for reference only.
- Download the Spark 3.3.1 source code ZIP file to the /opt/ directory and decompress it. The Spark source code directory /opt/spark-3.3.1 is generated.
Download URL: https://github.com/apache/spark/archive/v3.3.1.zip
wget https://github.com/apache/spark/archive/v3.3.1.zip
- Download the Breeze 1.0 source code ZIP file to the /opt/ directory and decompress it. The Breeze source code directory /opt/breeze-releases-v1.0 is generated.
Download URL: https://github.com/scalanlp/breeze/archive/releases/v1.0.zip
wget https://github.com/scalanlp/breeze/archive/releases/v1.0.zip
- In the /opt/ directory, create a project named Spark-ml-algo-lib with the following directory structure.

cd /opt/
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/fpm
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/optimization
mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/classification
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/recommendation
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/regression
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tuning
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/feature
mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm
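The preparation steps above (download, decompress, create the project tree) could be scripted roughly as follows. This is a minimal sketch, not part of the original procedure: the function names fetch_and_unzip and create_project_tree and their arguments are illustrative, and the sketch assumes that wget and unzip are installed and that wget keeps the last path component of the URL as the local file name.

```shell
# Sketch only: function names and arguments are illustrative assumptions.

# fetch_and_unzip URL BASE_DIR
# Download a source archive into BASE_DIR and unpack it there.
fetch_and_unzip() {
  local url="$1" base_dir="$2"
  local archive
  # e.g. https://.../archive/v3.3.1.zip -> v3.3.1.zip
  archive="$(basename "$url")"
  wget -P "$base_dir" "$url" || return 1
  unzip -q "$base_dir/$archive" -d "$base_dir"
}

# create_project_tree BASE_DIR
# Create the same directories as the mkdir commands listed above.
create_project_tree() {
  local root="$1/Spark-ml-algo-lib"
  local pkg="src/main/scala/org/apache/spark"
  local dir
  for dir in \
    ml/classification ml/feature ml/fpm ml/recommendation ml/regression \
    ml/tree/impl ml/tuning mllib/clustering mllib/feature mllib/fpm \
    mllib/linalg/distributed mllib/optimization mllib/tree
  do
    mkdir -p "$root/ml-accelerator/$pkg/$dir" || return 1
  done
  mkdir -p "$root/ml-core/src/main/scala/breeze/numerics" || return 1
  for dir in \
    ml/classification ml/recommendation ml/regression ml/tree/impl \
    ml/tuning mllib/clustering mllib/feature mllib/fpm
  do
    mkdir -p "$root/ml-core/$pkg/$dir" || return 1
  done
}
```

With the layout used in this section, the calls would be fetch_and_unzip with each download URL and /opt as arguments, followed by create_project_tree /opt.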
- Copy the original files in Spark 3.3.1 and Breeze 1.0 to the Spark-ml-algo-lib directories according to the mapping in Table 1 and Table 2. The following provides two sample commands for copying files to the destination directories.
Some files are renamed when they are copied. Where the two file name columns of a table row differ, use the file name in the Spark-ml-algo-lib project as the destination file name.
Sample commands:
cp /opt/spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala
cp /opt/breeze-releases-v1.0/math/src/main/scala/breeze/numerics/package.scala /opt/Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/DigammaX.scala
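Table 1 Spark files required in the Spark-ml-algo-lib project

| Directory in the Spark-ml-algo-lib Project | File Name in the Spark-ml-algo-lib Project | Original Directory in Spark | Original File Name in Spark |
| --- | --- | --- | --- |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | LinearSVC.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | LinearSVC.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | DecisionTreeClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | DecisionTreeClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | FMClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | FMClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | IDF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/feature/ | IDF.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | Word2Vec.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/feature/ | Word2Vec.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | DecisionTreeBucketizer.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/fpm/ | PrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/fpm/ | PrefixSpan.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | NMF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | GBTRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | GBTRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | FMRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | FMRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | RandomForestRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | RandomForestRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest4GBDTX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForestRaw.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionForest.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionTreeBucket.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionTreeMetadata.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionTreeMetadata.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeModels.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeModels.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning/ | BayesianCrossValidator.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tuning/ | CrossValidator.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | IDF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | IDF.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | PCA.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | PCA.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowth.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowth.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/optimization/ | LBFGSN.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/optimization/ | LBFGS.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/ | Node.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | Node.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | DTFeatureStatsAggregator.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | DTStatsAggregator.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTreesCore.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePointX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePoint.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePointY.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePoint.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtilsX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtils.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | OnlineLDAOptimizerXObj.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/feature/ | VocabWord.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpanBase.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowthCore.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowth.scala |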

Table 2 Breeze files required in the Spark-ml-algo-lib project

| Directory in the Spark-ml-algo-lib Project | File Name in the Spark-ml-algo-lib Project | Original Directory in Breeze | Original File Name in Breeze |
| --- | --- | --- | --- |
| Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/ | DigammaX.scala | breeze-releases-v1.0/math/src/main/scala/breeze/numerics/ | package.scala |

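Copying every file by hand is error-prone, so the mappings in Table 1 and Table 2 could be fed to a small loop instead. The following is a minimal sketch under stated assumptions: the function name copy_from_mapping and the whitespace-separated mapping format (one table row per line, columns in the same order as the tables) are illustrative and are not part of the original procedure.

```shell
# Sketch only: the function name and the mapping format are illustrative.
#
# copy_from_mapping SRC_ROOT DEST_ROOT < mapping
# Each mapping line holds four whitespace-separated fields, in table order:
#   DEST_DIR DEST_FILE SRC_DIR SRC_FILE
# Empty lines and lines starting with '#' are skipped.
# Prints the number of files copied; stops at the first missing source file.
copy_from_mapping() {
  local src_root="$1" dest_root="$2"
  local dest_dir dest_file src_dir src_file copied=0
  while read -r dest_dir dest_file src_dir src_file; do
    case "$dest_dir" in
      ''|'#'*) continue ;;
    esac
    if [ ! -f "$src_root/$src_dir/$src_file" ]; then
      echo "missing source file: $src_root/$src_dir/$src_file" >&2
      return 1
    fi
    mkdir -p "$dest_root/$dest_dir" || return 1
    # The destination name may differ from the source name (renamed copy).
    cp "$src_root/$src_dir/$src_file" "$dest_root/$dest_dir/$dest_file" || return 1
    copied=$((copied + 1))
  done
  echo "$copied"
}

# Example call for two rows of the tables, with everything under /opt:
#
# copy_from_mapping /opt /opt <<'EOF'
# Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning/ BayesianCrossValidator.scala spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tuning/ CrossValidator.scala
# Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/ DigammaX.scala breeze-releases-v1.0/math/src/main/scala/breeze/numerics/ package.scala
# EOF
```

Because one source file such as RandomForest.scala maps to several destination files, the loop copies per row rather than per source file.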
- Download the patch to the /opt/Spark-ml-algo-lib/ directory and apply it to Spark-ml-algo-lib to obtain the complete adaptation code of the machine learning algorithm library. The following commands use the Spark 3.3.1 patch as an example.
cd /opt/Spark-ml-algo-lib
wget https://github.com/kunpengcompute/Spark-ml-algo-lib/releases/download/v3.0.0-spark3.3.1/Spark-ml-algo-lib-Spark3.3.1.patch
patch -p1 < Spark-ml-algo-lib-Spark3.3.1.patch
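A patch that fails halfway can leave the project partly modified, for example when a file from the tables was not copied or was copied under the wrong name. One way to guard against this is to run patch in dry-run mode first. The following is a minimal sketch that assumes GNU patch; the function name apply_patch_checked is illustrative.

```shell
# Sketch only: assumes GNU patch; the function name is illustrative.
#
# apply_patch_checked PROJECT_DIR PATCH_FILE
# PATCH_FILE must be an absolute path (or relative to PROJECT_DIR).
apply_patch_checked() {
  local project_dir="$1" patch_file="$2"
  # First pass: --dry-run reports whether every hunk applies,
  # without modifying any file.
  if ! (cd "$project_dir" && patch -p1 --dry-run < "$patch_file" > /dev/null); then
    echo "patch does not apply cleanly; check the copied files" >&2
    return 1
  fi
  # Second pass: apply the patch for real.
  (cd "$project_dir" && patch -p1 < "$patch_file")
}
```

For this section the call would be apply_patch_checked /opt/Spark-ml-algo-lib /opt/Spark-ml-algo-lib/Spark-ml-algo-lib-Spark3.3.1.patch.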
The directory structure of the complete adaptation code Spark-ml-algo-lib of the machine learning algorithm library is the same as that in the repository.
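To check this, the list of Scala files in the patched project can be compared with the list in a local checkout of the repository. The following is a minimal sketch: the function name compare_scala_trees is illustrative, and the location of the checkout (and the branch that matches Spark 3.3.1) is an assumption the reader has to supply.

```shell
# Sketch only: the function name is illustrative.
#
# compare_scala_trees DIR_A DIR_B
# Compares the relative paths of all .scala files under both directories.
# Prints the differences and returns non-zero if the two lists differ.
compare_scala_trees() {
  local left right status=0
  left="$(mktemp)" || return 1
  right="$(mktemp)" || return 1
  (cd "$1" && find . -type f -name '*.scala' | sort) > "$left"
  (cd "$2" && find . -type f -name '*.scala' | sort) > "$right"
  diff "$left" "$right" || status=1
  rm -f "$left" "$right"
  return "$status"
}
```

The check only compares file names and locations, not file contents.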