Building Adaptation Code for the Machine Learning Algorithm Library

  • This section describes how to build Spark-ml-algo-lib, the adaptation code for the machine learning algorithm library, using Spark 3.3.1 as an example.
  • Perform the following operations in a Linux environment. This section is for reference only.
  1. Download the Spark 3.3.1 source code ZIP file to the /opt/ directory and decompress it. Decompression generates the Spark source code directory /opt/spark-3.3.1.

    Download URL: https://github.com/apache/spark/archive/v3.3.1.zip

    wget https://github.com/apache/spark/archive/v3.3.1.zip
    
  2. Download the Breeze 1.0 source code ZIP file to the /opt/ directory and decompress it. Decompression generates the Breeze source code directory /opt/breeze-releases-v1.0.

    Download URL: https://github.com/scalanlp/breeze/archive/releases/v1.0.zip

    wget https://github.com/scalanlp/breeze/archive/releases/v1.0.zip
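
Steps 1 and 2 show only the download commands. The sketch below is one way to confirm that both archives were decompressed into the directories that the copy commands in step 4 expect. The function name check_sources is hypothetical, and the commented unzip line assumes that unzip is installed and that the ZIP files kept their default names.

```shell
#!/bin/sh
# Assumed decompression step (not shown in the original commands):
#   cd /opt/ && unzip v3.3.1.zip && unzip v1.0.zip

# Hypothetical helper: check_sources [BASE_DIR]
# Reports whether the two source directories used in step 4 exist.
check_sources() {
  base="${1:-/opt}"
  status=0
  for d in spark-3.3.1 breeze-releases-v1.0; do
    if [ -d "$base/$d" ]; then
      echo "found: $base/$d"
    else
      echo "missing: $base/$d" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_sources /opt
```

A non-zero return value means at least one archive still needs to be decompressed, or was decompressed under a different name.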
    
  3. In the /opt/ directory, create a project named Spark-ml-algo-lib with the following directory structure.

    cd /opt/
    
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/fpm
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/optimization
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree 

    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/classification
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/recommendation
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/regression
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tuning
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/feature 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm 
    
  4. Copy the original files from Spark 3.3.1 and Breeze 1.0 to the Spark-ml-algo-lib directories according to the mappings in Table 1 and Table 2.

    Some files must be renamed when they are copied to the destination directories, and several destination files are created from the same source file.

    Sample commands (the first keeps the original file name; the second renames package.scala to DigammaX.scala):
    cp /opt/spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala
    cp /opt/breeze-releases-v1.0/math/src/main/scala/breeze/numerics/package.scala /opt/Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/DigammaX.scala
    
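    Table 1 Spark files required in the Spark-ml-algo-lib project

    | Directory in the Spark-ml-algo-lib Project | File Name in the Spark-ml-algo-lib Project | Original Directory in Spark | Original File Name in Spark |
    | --- | --- | --- | --- |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | LinearSVC.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | LinearSVC.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | DecisionTreeClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | DecisionTreeClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | FMClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | FMClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | IDF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/feature/ | IDF.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | Word2Vec.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/feature/ | Word2Vec.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | DecisionTreeBucketizer.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/fpm/ | PrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/fpm/ | PrefixSpan.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | NMF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | GBTRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | GBTRegressor.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | FMRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | FMRegressor.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | RandomForestRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | RandomForestRegressor.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest4GBDTX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForestRaw.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionForest.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionTreeBucket.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionTreeMetadata.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionTreeMetadata.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeModels.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeModels.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning/ | BayesianCrossValidator.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tuning/ | CrossValidator.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | IDF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | IDF.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | PCA.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | PCA.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowth.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowth.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/optimization/ | LBFGSN.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/optimization/ | LBFGS.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/ | Node.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | Node.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | DTFeatureStatsAggregator.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | DTStatsAggregator.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTreesCore.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePointX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePoint.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePointY.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePoint.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtilsX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtils.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | OnlineLDAOptimizerXObj.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/feature/ | VocabWord.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpanBase.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowthCore.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | FPGrowth.scala |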

    Table 2 Breeze files required in the Spark-ml-algo-lib project

    | Directory in the Spark-ml-algo-lib Project | File Name in the Spark-ml-algo-lib Project | Original Directory in Breeze | Original File Name in Breeze |
    | --- | --- | --- | --- |
    | Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/ | DigammaX.scala | breeze-releases-v1.0/math/src/main/scala/breeze/numerics/ | package.scala |
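
The mappings in Table 1 and Table 2 can also be applied from a list file instead of typing one cp command per row. The following is a minimal sketch of that idea, not part of the official procedure. The function name copy_mapped, the two-column mapping file format, and the file name /tmp/spark-map.txt are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy_mapped SRC_ROOT DEST_ROOT MAPPING_FILE
# Each mapping line holds two whitespace-separated paths:
#   <source path relative to SRC_ROOT> <destination path relative to DEST_ROOT>
# The destination path carries the new file name, so renaming happens during the copy.
copy_mapped() {
  src_root="$1"
  dest_root="$2"
  mapping="$3"
  while read -r src dest || [ -n "$src" ]; do
    # Skip blank lines and comment lines.
    case "$src" in ''|'#'*) continue ;; esac
    mkdir -p "$(dirname "$dest_root/$dest")"
    cp "$src_root/$src" "$dest_root/$dest" || return 1
  done < "$mapping"
}

# Usage with one renamed file from Table 1 (the mapping file name is an assumption):
#   echo "mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ml-accelerator/src/main/scala/org/apache/spark/ml/tuning/BayesianCrossValidator.scala" > /tmp/spark-map.txt
#   copy_mapped /opt/spark-3.3.1 /opt/Spark-ml-algo-lib /tmp/spark-map.txt
```

Because the same source file can appear on several lines, rows such as RandomForest.scala, which feeds several destination files, need no special handling.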

  5. Download the patch to the /opt/Spark-ml-algo-lib/ directory and apply it. This example uses the Spark 3.3.1 patch. Applying it to the files copied in step 4 yields the complete Spark-ml-algo-lib adaptation code for the machine learning algorithm library.
    cd /opt/Spark-ml-algo-lib
    wget https://github.com/kunpengcompute/Spark-ml-algo-lib/releases/download/v3.0.0-spark3.3.1/Spark-ml-algo-lib-Spark3.3.1.patch
    patch -p1 < Spark-ml-algo-lib-Spark3.3.1.patch
    

    After the patch is applied, the directory structure of Spark-ml-algo-lib is the same as that in the Spark-ml-algo-lib repository.
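
If a file from step 4 is missing or was copied under the wrong name, the patch in step 5 can fail partway through and leave the project half-modified. One possible safeguard, sketched here under the assumption that GNU patch is installed, is to run the patch with --dry-run first and apply it only if that check succeeds. The function name apply_patch_checked is hypothetical, and the patch file must be given as an absolute path because the function changes directory.

```shell
#!/bin/sh
# Hypothetical helper: apply_patch_checked PROJECT_DIR ABSOLUTE_PATCH_FILE
# Runs in a subshell so the caller's working directory is not changed.
apply_patch_checked() (
  project_dir="$1"
  patch_file="$2"
  cd "$project_dir" || exit 1
  # --dry-run (GNU patch) checks every hunk without modifying any file.
  if patch -p1 --dry-run < "$patch_file" > /dev/null 2>&1; then
    patch -p1 < "$patch_file"
  else
    echo "patch does not apply cleanly; recheck the files copied in step 4" >&2
    exit 1
  fi
)

# Usage:
#   apply_patch_checked /opt/Spark-ml-algo-lib /opt/Spark-ml-algo-lib/Spark-ml-algo-lib-Spark3.3.1.patch
```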