Executing a Task

On the client, download the dataset used by the sample code in Developing an Application, extract it to the /tmp/data/epsilon directory, and then run the task. The procedure is as follows:

  1. Go to the /tmp/data/epsilon directory.
    cd /tmp/data/epsilon
    
  2. Download the training set.
    wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2
    

  3. Download the test set.
    wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.t.bz2
    

  4. Decompress the training set and test set to the current directory.
    bzip2 -d epsilon_normalized.bz2
    bzip2 -d epsilon_normalized.t.bz2
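
Before uploading, it can help to confirm that both files decompressed cleanly. The following is a minimal sketch and not part of BoostKit: check_libsvm is a hypothetical helper that reports a file's line count and checks that its first line looks like LIBSVM text (a label followed by index:value pairs). The LIBSVM dataset page lists epsilon with 400,000 training and 100,000 test samples, so line counts matching those figures would be expected.

```shell
# Hypothetical helper (not part of BoostKit): report the line count of a file
# and whether its first line looks like LIBSVM format ("label idx:val ...").
check_libsvm() {
    file="$1"
    if [ ! -s "$file" ]; then
        echo "$file: missing or empty"
        return 1
    fi
    # Strip the padding that some wc implementations add around the count.
    lines=$(wc -l < "$file" | tr -d ' ')
    if head -n 1 "$file" | grep -Eq '^[+-]?[0-9]+( [0-9]+:[-+0-9.eE]+)+ *$'; then
        echo "$file: $lines lines, LIBSVM format OK"
    else
        echo "$file: $lines lines, unexpected format"
        return 1
    fi
}
```

For example, run check_libsvm epsilon_normalized and check_libsvm epsilon_normalized.t from the /tmp/data/epsilon directory.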
    
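  5. Upload the training set and test set to HDFS, creating the target HDFS directory first if it does not already exist.
    hadoop fs -mkdir -p /tmp/data/epsilon/
    hadoop fs -put /tmp/data/epsilon/epsilon_normalized /tmp/data/epsilon/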
    hadoop fs -put /tmp/data/epsilon/epsilon_normalized.t  /tmp/data/epsilon/
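
After the upload, the presence of both files in HDFS can be confirmed before the job is submitted. The sketch below relies on the standard hadoop fs -test -e check; check_hdfs_files and the HADOOP_CMD variable are hypothetical names introduced here for illustration (HADOOP_CMD only exists so that the check can be dry-run against a stub).

```shell
# Hypothetical helper: confirm that the given files exist in an HDFS directory.
# HADOOP_CMD is an assumption added for dry runs; it defaults to "hadoop".
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

check_hdfs_files() {
    dir="$1"
    shift
    missing=0
    for name in "$@"; do
        # "hadoop fs -test -e PATH" exits with 0 when PATH exists.
        if $HADOOP_CMD fs -test -e "$dir/$name"; then
            echo "found $dir/$name"
        else
            echo "missing $dir/$name"
            missing=1
        fi
    done
    return "$missing"
}
```

For the paths used in this procedure, the call would be check_hdfs_files /tmp/data/epsilon epsilon_normalized epsilon_normalized.t.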
    
  6. Save the kal_examples_2.11-0.1.jar and run_gbdt.sh files generated in Developing an Application to the /home/test/boostkit/ directory on the client as described in Cluster Environment.

    The content of run_gbdt.sh is as follows:

    spark-submit \
    --class com.bigdata.examples.GBDTRunner \
    --driver-class-path "./lib/*" \
    --jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    --conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
    --master yarn \
    --deploy-mode client \
    --driver-cores 40 \
    --driver-memory 50g \
    --executor-cores 19 --num-executors 12 --executor-memory 77g \
    --numPartitions 228 \
    ./kal_examples_2.11-0.1.jar
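
The value 228 passed as numPartitions equals the total number of executor cores requested in the script (19 cores on each of 12 executors). Whether the example was sized that way on purpose is an assumption, but one partition per executor core is a common starting point. A minimal sketch of that arithmetic, which may help when adapting the script to a cluster of a different size:

```shell
# Assumption: one partition per executor core, which matches the values
# used in run_gbdt.sh (19 cores x 12 executors = 228 partitions).
executor_cores=19
num_executors=12
num_partitions=$((executor_cores * num_executors))
echo "numPartitions=$num_partitions"   # prints numPartitions=228
```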
    
  7. Execute the task.
    sh run_gbdt.sh
    

    View the output.

    The run performs 100 iterations and generates 100 subtrees, one per iteration. Only subtrees 0, 1, and 99 are shown; the others are omitted.

    Test Error = 0.1687594970527827  // Classification error rate on the test set
    Learned classification GBT model:
    GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
      Tree 0 (weight 1.0):  // Weight of this subtree in the ensemble
        If (feature 818 <= 0.0028371200000000003) // Feature 818 is split at the threshold 0.0028371200000000003.
         If (feature 1866 <= 0.0064008599999999995)
          If (feature 315 <= -0.0067819)
           If (feature 789 <= -0.0100215)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.21098494850805388 // Prediction of this leaf node in subtree 0
            Else (feature 936 > 0.0018099549999999998)
             Predict: 0.15191210648637227
           Else (feature 789 > -0.0100215)
            If (feature 936 <= -4.79549E-4)
             Predict: 0.17726731948384736
            Else (feature 936 > -4.79549E-4)
             Predict: 0.49173760640961445
          Else (feature 315 > -0.0067819)
           If (feature 789 <= -0.0100215)
            If (feature 936 <= 0.00412001)
             Predict: -0.5011764705882353
            Else (feature 936 > 0.00412001)
             Predict: -0.21027097384924862
           Else (feature 789 > -0.0100215)
            If (feature 649 <= -0.008577419999999999)
             Predict: -0.20268122451800152
            Else (feature 649 > -0.008577419999999999)
             Predict: 0.12942312334057446
         Else (feature 1866 > 0.0064008599999999995)
          If (feature 1697 <= 0.02054865)
           If (feature 649 <= -0.00122175)
            If (feature 315 <= 1.620675E-4)
             Predict: 0.31944546321425954
            Else (feature 315 > 1.620675E-4)
             Predict: -0.014997100008285691
           Else (feature 649 > -0.00122175)
            If (feature 315 <= 0.01095895)
             Predict: 0.5328728914862532
            Else (feature 315 > 0.01095895)
             Predict: 0.2712697181277476
          Else (feature 1697 > 0.02054865)
           If (feature 649 <= -0.008577419999999999)
            If (feature 315 <= 1.620675E-4)
             Predict: 0.5659284497444633
            Else (feature 315 > 1.620675E-4)
             Predict: 0.29297616536595655
           Else (feature 649 > -0.008577419999999999)
            If (feature 1519 <= -0.0024157199999999997)
             Predict: 0.5493390716261912
            Else (feature 1519 > -0.0024157199999999997)
             Predict: 0.7277585664885257
        Else (feature 818 > 0.0028371200000000003)
         If (feature 1866 <= 0.008616374999999999)
          If (feature 789 <= -0.0100215)
           If (feature 1794 <= -0.015113399999999999)
            If (feature 315 <= -0.009021685)
             Predict: -0.14581734458940906
            Else (feature 315 > -0.009021685)
             Predict: -0.43055000665867627
           Else (feature 1794 > -0.015113399999999999)
            If (feature 649 <= -0.008577419999999999)
             Predict: -0.6742799137165334
            Else (feature 649 > -0.008577419999999999)
             Predict: -0.466970082323807
          Else (feature 789 > -0.0100215)
           If (feature 755 <= 0.00919656)
            If (feature 315 <= -0.002158415)
             Predict: 0.04372444164831708
            Else (feature 315 > -0.002158415)
             Predict: -0.2799099183635169
           Else (feature 755 > 0.00919656)
            If (feature 1697 <= 0.0110916)
             Predict: -0.5381727158948686
            Else (feature 1697 > 0.0110916)
             Predict: -0.2597821083320546
         Else (feature 1866 > 0.008616374999999999)
          If (feature 789 <= -0.0144677)
           If (feature 315 <= -0.00447581)
            If (feature 755 <= -0.0100993)
             Predict: 0.22025316455696203
            Else (feature 755 > -0.0100993)
             Predict: -0.1234739607479524
           Else (feature 315 > -0.00447581)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.517205957883924
            Else (feature 936 > 0.0018099549999999998)
             Predict: -0.18999735379730087
          Else (feature 789 > -0.0144677)
           If (feature 315 <= 0.002566035)
            If (feature 649 <= -0.01302385)
             Predict: 0.11884615384615385
            Else (feature 649 > -0.01302385)
             Predict: 0.43213844252163164
           Else (feature 315 > 0.002566035)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.1847658260422028
            Else (feature 936 > 0.0018099549999999998)
             Predict: 0.14976641934597418
      Tree 1 (weight 0.1):
        If (feature 1389 <= -0.0025451)
         If (feature 994 <= -0.00340289)
          If (feature 1814 <= -4.766875E-4)
    .....
      Tree 99 (weight 0.1):
        If (feature 539 <= 0.0201035)
         If (feature 994 <= -0.0271962)
          If (feature 738 <= 0.020128649999999998)
           If (feature 1678 <= 0.001967885)
            If (feature 830 <= -0.01119755)
             Predict: -0.10394440691166018
            Else (feature 830 > -0.01119755)
             Predict: 0.06684711260628788
           Else (feature 1678 > 0.001967885)
            If (feature 958 <= -0.01238975)
             Predict: -0.05886208031154687
            Else (feature 958 > -0.01238975)
             Predict: 0.19195532805094076
    ......
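
To keep the result for later comparison, the output of run_gbdt.sh can be saved to a file and the headline numbers extracted from it. The following is a minimal sketch under stated assumptions: summarize_gbdt_log is a hypothetical helper, gbdt.log is a hypothetical file name, and the log is assumed to contain a "Test Error = <value>" line and one "Tree N (weight W):" line per printed tree, as in the output above.

```shell
# Hypothetical helper: summarize a saved run_gbdt.sh log.
# Reads the last "Test Error = <value>" line and counts "Tree N (weight W):" lines.
summarize_gbdt_log() {
    log="$1"
    awk '
        $1 == "Test" && $2 == "Error" && $3 == "=" { err = $4 }
        $1 == "Tree" && $3 == "(weight"            { trees++ }
        END {
            if (err == "") { print "no Test Error line found"; exit 1 }
            printf "test_error=%.4f accuracy=%.4f trees_printed=%d\n", err, 1 - err, trees
        }
    ' "$log"
}
```

One way to use it is to run sh run_gbdt.sh 2>&1 | tee gbdt.log and then summarize_gbdt_log gbdt.log. For the test error shown above, the summary would begin with test_error=0.1688 accuracy=0.8312.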