
Executing the Task and Viewing the Result

On the client, download the dataset used by the example code in Developing an Application, extract it to the /tmp/data/epsilon directory, upload it to HDFS, and then execute the task.

  1. Create the /tmp/data/epsilon directory if it does not exist, and go to it.
    mkdir -p /tmp/data/epsilon
    cd /tmp/data/epsilon
    
  2. Download the training dataset.
    wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2
    
  3. Download the test dataset.
    wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.t.bz2
    
  4. Decompress the training dataset and test dataset to the current directory.
    bzip2 -d epsilon_normalized.bz2
    bzip2 -d epsilon_normalized.t.bz2
    
  5. Create the target directory in HDFS, and upload the training dataset and test dataset to it.
    hadoop fs -mkdir -p /tmp/data/epsilon
    hadoop fs -put /tmp/data/epsilon/epsilon_normalized /tmp/data/epsilon/
    hadoop fs -put /tmp/data/epsilon/epsilon_normalized.t /tmp/data/epsilon/
    
  6. Save the kal_examples_2.11-0.1.jar and run_gbdt.sh files generated in Developing an Application to the /home/test/boostkit/ directory of the client.

    The content of run_gbdt.sh is as follows:

    # Arguments placed after the application JAR (here --numPartitions 228) are passed to GBDTRunner; spark-submit rejects unrecognized options placed before the JAR.
    spark-submit \
    --class com.bigdata.examples.GBDTRunner \
    --driver-class-path "./lib/*" \
    --jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-2.2.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-2.2.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-2.2.0-spark2.3.2-aarch64.jar" \
    --conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-2.2.0-spark2.3.2.jar:boostkit-ml-core_2.11-2.2.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-2.2.0-spark2.3.2-aarch64.jar" \
    --master yarn \
    --deploy-mode client \
    --driver-cores 40 \
    --driver-memory 50g \
    --executor-cores 19 --num-executors 12 --executor-memory 77g \
    ./kal_examples_2.11-0.1.jar \
    --numPartitions 228
    
  7. Go to the /home/test/boostkit/ directory and execute the task.
    cd /home/test/boostkit/
    sh run_gbdt.sh
    

    View the printed result.

    The model is trained for 100 iterations, producing 100 trees. Only trees 0, 1, and 99 are shown below, and trees 1 and 99 are truncated.

    Test Error = 0.1687594970527827 // Classification error on the test dataset
    Learned classification GBT model:
    GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
      Tree 0 (weight 1.0):  // Weight of this tree in the ensemble
        If (feature 818 <= 0.0028371200000000003) // Split on feature 818 (0-based index) at threshold 0.0028371200000000003
         If (feature 1866 <= 0.0064008599999999995)
          If (feature 315 <= -0.0067819)
           If (feature 789 <= -0.0100215)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.21098494850805388 // Leaf output of tree 0
            Else (feature 936 > 0.0018099549999999998)
             Predict: 0.15191210648637227
           Else (feature 789 > -0.0100215)
            If (feature 936 <= -4.79549E-4)
             Predict: 0.17726731948384736
            Else (feature 936 > -4.79549E-4)
             Predict: 0.49173760640961445
          Else (feature 315 > -0.0067819)
           If (feature 789 <= -0.0100215)
            If (feature 936 <= 0.00412001)
             Predict: -0.5011764705882353
            Else (feature 936 > 0.00412001)
             Predict: -0.21027097384924862
           Else (feature 789 > -0.0100215)
            If (feature 649 <= -0.008577419999999999)
             Predict: -0.20268122451800152
            Else (feature 649 > -0.008577419999999999)
             Predict: 0.12942312334057446
         Else (feature 1866 > 0.0064008599999999995)
          If (feature 1697 <= 0.02054865)
           If (feature 649 <= -0.00122175)
            If (feature 315 <= 1.620675E-4)
             Predict: 0.31944546321425954
            Else (feature 315 > 1.620675E-4)
             Predict: -0.014997100008285691
           Else (feature 649 > -0.00122175)
            If (feature 315 <= 0.01095895)
             Predict: 0.5328728914862532
            Else (feature 315 > 0.01095895)
             Predict: 0.2712697181277476
          Else (feature 1697 > 0.02054865)
           If (feature 649 <= -0.008577419999999999)
            If (feature 315 <= 1.620675E-4)
             Predict: 0.5659284497444633
            Else (feature 315 > 1.620675E-4)
             Predict: 0.29297616536595655
           Else (feature 649 > -0.008577419999999999)
            If (feature 1519 <= -0.0024157199999999997)
             Predict: 0.5493390716261912
            Else (feature 1519 > -0.0024157199999999997)
             Predict: 0.7277585664885257
        Else (feature 818 > 0.0028371200000000003)
         If (feature 1866 <= 0.008616374999999999)
          If (feature 789 <= -0.0100215)
           If (feature 1794 <= -0.015113399999999999)
            If (feature 315 <= -0.009021685)
             Predict: -0.14581734458940906
            Else (feature 315 > -0.009021685)
             Predict: -0.43055000665867627
           Else (feature 1794 > -0.015113399999999999)
            If (feature 649 <= -0.008577419999999999)
             Predict: -0.6742799137165334
            Else (feature 649 > -0.008577419999999999)
             Predict: -0.466970082323807
          Else (feature 789 > -0.0100215)
           If (feature 755 <= 0.00919656)
            If (feature 315 <= -0.002158415)
             Predict: 0.04372444164831708
            Else (feature 315 > -0.002158415)
             Predict: -0.2799099183635169
           Else (feature 755 > 0.00919656)
            If (feature 1697 <= 0.0110916)
             Predict: -0.5381727158948686
            Else (feature 1697 > 0.0110916)
             Predict: -0.2597821083320546
         Else (feature 1866 > 0.008616374999999999)
          If (feature 789 <= -0.0144677)
           If (feature 315 <= -0.00447581)
            If (feature 755 <= -0.0100993)
             Predict: 0.22025316455696203
            Else (feature 755 > -0.0100993)
             Predict: -0.1234739607479524
           Else (feature 315 > -0.00447581)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.517205957883924
            Else (feature 936 > 0.0018099549999999998)
             Predict: -0.18999735379730087
          Else (feature 789 > -0.0144677)
           If (feature 315 <= 0.002566035)
            If (feature 649 <= -0.01302385)
             Predict: 0.11884615384615385
            Else (feature 649 > -0.01302385)
             Predict: 0.43213844252163164
           Else (feature 315 > 0.002566035)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.1847658260422028
            Else (feature 936 > 0.0018099549999999998)
             Predict: 0.14976641934597418
      Tree 1 (weight 0.1):
        If (feature 1389 <= -0.0025451)
         If (feature 994 <= -0.00340289)
          If (feature 1814 <= -4.766875E-4)
    .....
      Tree 99 (weight 0.1):
        If (feature 539 <= 0.0201035)
         If (feature 994 <= -0.0271962)
          If (feature 738 <= 0.020128649999999998)
           If (feature 1678 <= 0.001967885)
            If (feature 830 <= -0.01119755)
             Predict: -0.10394440691166018
            Else (feature 830 > -0.01119755)
             Predict: 0.06684711260628788
           Else (feature 1678 > 0.001967885)
            If (feature 958 <= -0.01238975)
             Predict: -0.05886208031154687
            Else (feature 958 > -0.01238975)
             Predict: 0.19195532805094076
    ......
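You can also summarize the printed result from a saved log instead of reading it on screen. Below is a minimal POSIX shell sketch. It assumes the output was captured with something like `sh run_gbdt.sh 2>&1 | tee gbdt.log`. The script name gbdt_summary.sh, the log name gbdt.log, and the helper functions are hypothetical. The patterns assume the `Test Error = ...` and `Tree N (weight W):` line formats shown in the result.

```shell
#!/bin/sh
# gbdt_summary.sh (hypothetical name): summarize a saved run_gbdt.sh log.
# Assumption: the log contains the "Test Error = ..." line and the model's
# debug string ("Tree N (weight W):", "If (...)", "Predict: ...").

# Print the number reported on the first "Test Error = ..." line.
gbdt_test_error() {
    sed -n 's/^[[:space:]]*Test Error = \([-+0-9.eE]*\).*/\1/p' "$1" | head -n 1
}

# Count the "Tree N (weight W):" header lines, one per tree in the ensemble.
gbdt_tree_count() {
    grep -c '^[[:space:]]*Tree [0-9][0-9]* (weight ' "$1"
}

# Print "index weight" for every tree header.
gbdt_tree_weights() {
    sed -n 's/^[[:space:]]*Tree \([0-9][0-9]*\) (weight \([^)]*\)).*/\1 \2/p' "$1"
}

# Count the leaves ("Predict:" lines) that belong to the tree with index $2.
gbdt_leaf_count() {
    awk -v want="$2" '
        BEGIN { current = -1; leaves = 0 }
        # awk splits on whitespace, so a header reads: Tree <index> (weight <w>):
        $1 == "Tree" && $3 == "(weight" { current = $2 + 0; next }
        $1 == "Predict:" && current == want + 0 { leaves++ }
        END { print leaves }
    ' "$1"
}

# Example usage: sh gbdt_summary.sh gbdt.log
if [ "$#" -ge 1 ]; then
    echo "Test error:       $(gbdt_test_error "$1")"
    echo "Number of trees:  $(gbdt_tree_count "$1")"
    echo "Leaves in tree 0: $(gbdt_leaf_count "$1" 0)"
fi
```

Applied to a complete log of the run shown above, the tree count should be 100. The leaf count for tree 0 should be 32, because tree 0 is a complete binary tree of depth 5.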
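The printed trees are combined into a single score per sample. The following is a sketch of that combination, assuming the logistic loss, which is the loss Spark's GBTClassifier supports. Here h_m(x) is the Predict value of the leaf that sample x reaches in tree m. w_m is the weight printed next to tree m: 1.0 for tree 0 and 0.1 (the learning rate, or step size) for the other 99 trees.

```latex
F(x) = \sum_{m=0}^{99} w_m\,h_m(x) = h_0(x) + 0.1\sum_{m=1}^{99} h_m(x)

\hat{y}(x) = \begin{cases} 1, & F(x) > 0 \\ 0, & \text{otherwise} \end{cases}
\qquad
P(y = 1 \mid x) = \frac{1}{1 + e^{-2F(x)}}
```

For example, a sample that falls into the leaf with value 0.7277585664885257 in tree 0 starts with F(x) ≈ 0.728. Each later tree then shifts that score by one tenth of the value of the leaf the sample reaches in it.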