Executing a Task

On the client, download the dataset used by the sample code in Developing an Application, extract it to the /tmp/data/epsilon directory, and execute the task. The procedure is as follows:

Go to the /tmp/data/epsilon directory.

cd /tmp/data/epsilon
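
If the directory does not exist yet, the cd command above fails. A minimal sketch that creates it first (the DATA_DIR variable name is only for illustration):

```shell
# DATA_DIR matches the directory named in this procedure
DATA_DIR=/tmp/data/epsilon

# -p creates missing parent directories and does nothing if the path exists
mkdir -p "$DATA_DIR"
cd "$DATA_DIR" || exit 1
pwd
```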

Download the training set.

       
wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2

Download the test set.

       
wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.t.bz2

Decompress the training set and test set to the current directory.

       
bzip2 -d epsilon_normalized.bz2
bzip2 -d epsilon_normalized.t.bz2
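
Note that bzip2 -d removes the .bz2 archive after decompression; add -k to keep it. Before uploading, it can be worth checking that the decompressed files look intact. The following is a sketch only, assuming the files are in LIBSVM text format (one sample per line, a label followed by index:value pairs). The check_libsvm function name is hypothetical, and the demo runs on a tiny stand-in file rather than the real dataset:

```shell
# Sketch: count samples and validate the first record of a LIBSVM-format file.
check_libsvm() {
    file="$1"
    # One sample per line, so the line count is the sample count
    lines=$(wc -l < "$file" | tr -d ' ')
    # First line should look like "<label> <index>:<value> <index>:<value> ..."
    if head -n 1 "$file" | grep -Eq '^[-+]?[0-9.]+( [0-9]+:[-+0-9.eE]+)+ *$'; then
        echo "$file: $lines samples, format OK"
    else
        echo "$file: $lines samples, unexpected format"
        return 1
    fi
}

# Demo on a two-line stand-in file (values are made up for illustration)
demo=$(mktemp)
printf '%s\n' '-1 1:0.0108 2:-0.0196' '1 1:0.0052 2:0.0311' > "$demo"
result=$(check_libsvm "$demo")
echo "$result"
rm -f "$demo"
```

On the real data, the calls would be check_libsvm epsilon_normalized and check_libsvm epsilon_normalized.t. The LIBSVM dataset page lists 400,000 training samples and 100,000 test samples for epsilon, which gives a reference for the reported counts.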

Upload the training set and test set to HDFS.

       
hadoop fs -put /tmp/data/epsilon/epsilon_normalized /tmp/data/epsilon/
hadoop fs -put /tmp/data/epsilon/epsilon_normalized.t /tmp/data/epsilon/
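
The -put commands above assume that /tmp/data/epsilon already exists in HDFS; if it does not, the upload may fail. A sketch of a repeatable variant that creates the directory, skips files that are already present, and lists the result (the variable names are illustrative, and the block does nothing useful on a machine without the hadoop command):

```shell
# Sketch: create the HDFS target directory if needed, upload, then verify.
HDFS_DIR=/tmp/data/epsilon
LOCAL_DIR=/tmp/data/epsilon

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$HDFS_DIR"
    for f in epsilon_normalized epsilon_normalized.t; do
        # "hadoop fs -test -e" returns 0 if the path already exists in HDFS
        if hadoop fs -test -e "$HDFS_DIR/$f"; then
            echo "already in HDFS: $HDFS_DIR/$f"
        else
            hadoop fs -put "$LOCAL_DIR/$f" "$HDFS_DIR/"
        fi
    done
    hadoop fs -ls "$HDFS_DIR"
    upload_status=done
else
    echo "hadoop command not found; run this on the cluster client" >&2
    upload_status=skipped
fi
```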

Copy the kal_examples_2.11-0.1.jar and run_gbdt.sh files generated in Developing an Application to the /home/test/boostkit/ directory on the client, as described in Cluster Environment.

The content of run_gbdt.sh is as follows:

       
spark-submit \
--class com.bigdata.examples.GBDTRunner \
--driver-class-path "./lib/*" \
--jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar,lib/boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
--conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-core_2.11-1.3.0-spark2.3.2.jar:boostkit-ml-kernel_2.11-1.3.0-spark2.3.2-aarch64.jar" \
--master yarn \
--deploy-mode client \
--driver-cores 40 \
--driver-memory 50g \
--executor-cores 19 --num-executors 12 --executor-memory 77g \
--numPartitions 228 \
./kal_examples_2.11-0.1.jar
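
The resource settings in the script are related: 12 executors with 19 cores each give 228 executor cores, which matches the value 228 passed as numPartitions (one partition per executor core). This relation is an observation about the numbers in the script, not something the source states. A small sketch for recomputing the totals when adapting the script to a different cluster:

```shell
# Values taken from run_gbdt.sh
num_executors=12
executor_cores=19
executor_memory_gb=77

# Total parallel task slots and total executor heap requested from YARN
total_cores=$((num_executors * executor_cores))
total_memory_gb=$((num_executors * executor_memory_gb))

echo "total executor cores:  $total_cores"
echo "total executor memory: ${total_memory_gb}g"
```

With the values above, this prints 228 cores and 924g of executor memory in total.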

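Execute the task from the /home/test/boostkit/ directory, because run_gbdt.sh refers to the JAR files by relative path.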

       
sh run_gbdt.sh
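
Because the result is printed to the console, it can help to save it to a file, for example with sh run_gbdt.sh 2>&1 | tee gbdt.log (gbdt.log is a hypothetical file name). The sketch below shows one way to pull the key figures out of such a log. It runs on a short stand-in log that only imitates the output format, so it can be tried without a cluster:

```shell
# Stand-in for the saved driver output (real logs contain the full trees)
log=$(mktemp)
cat > "$log" <<'EOF'
Test Error = 0.1687594970527827
Learned classification GBT model:
GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
  Tree 0 (weight 1.0):
  Tree 1 (weight 0.1):
  Tree 99 (weight 0.1):
EOF

# Extract the error rate and derive accuracy = 1 - error
test_error=$(awk -F' = ' '/^Test Error/ {print $2; exit}' "$log")
accuracy=$(awk -v e="$test_error" 'BEGIN {printf "%.4f", 1 - e}')

# Count the tree headers that appear in the log
trees_shown=$(grep -cE '^ *Tree [0-9]+ \(weight' "$log")

echo "test error: $test_error"
echo "accuracy:   $accuracy"
echo "tree headers found: $trees_shown"
rm -f "$log"
```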

View the print result.

The run performs 100 iterations and generates 100 subtrees. Only subtrees 0, 1, and 99 are shown here.

Test Error = 0.1687594970527827  // Classification error rate on the test set
Learned classification GBT model:
GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
  Tree 0 (weight 1.0):  // Weight of this subtree
    If (feature 818 <= 0.0028371200000000003)  // Feature 818 is split at 0.0028371200000000003.
     If (feature 1866 <= 0.0064008599999999995)
      If (feature 315 <= -0.0067819)
       If (feature 789 <= -0.0100215)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.21098494850805388  // Leaf prediction of subtree 0
        Else (feature 936 > 0.0018099549999999998)
         Predict: 0.15191210648637227
       Else (feature 789 > -0.0100215)
        If (feature 936 <= -4.79549E-4)
         Predict: 0.17726731948384736
        Else (feature 936 > -4.79549E-4)
         Predict: 0.49173760640961445
      Else (feature 315 > -0.0067819)
       If (feature 789 <= -0.0100215)
        If (feature 936 <= 0.00412001)
         Predict: -0.5011764705882353
        Else (feature 936 > 0.00412001)
         Predict: -0.21027097384924862
       Else (feature 789 > -0.0100215)
        If (feature 649 <= -0.008577419999999999)
         Predict: -0.20268122451800152
        Else (feature 649 > -0.008577419999999999)
         Predict: 0.12942312334057446
     Else (feature 1866 > 0.0064008599999999995)
      If (feature 1697 <= 0.02054865)
       If (feature 649 <= -0.00122175)
        If (feature 315 <= 1.620675E-4)
         Predict: 0.31944546321425954
        Else (feature 315 > 1.620675E-4)
         Predict: -0.014997100008285691
       Else (feature 649 > -0.00122175)
        If (feature 315 <= 0.01095895)
         Predict: 0.5328728914862532
        Else (feature 315 > 0.01095895)
         Predict: 0.2712697181277476
      Else (feature 1697 > 0.02054865)
       If (feature 649 <= -0.008577419999999999)
        If (feature 315 <= 1.620675E-4)
         Predict: 0.5659284497444633
        Else (feature 315 > 1.620675E-4)
         Predict: 0.29297616536595655
       Else (feature 649 > -0.008577419999999999)
        If (feature 1519 <= -0.0024157199999999997)
         Predict: 0.5493390716261912
        Else (feature 1519 > -0.0024157199999999997)
         Predict: 0.7277585664885257
    Else (feature 818 > 0.0028371200000000003)
     If (feature 1866 <= 0.008616374999999999)
      If (feature 789 <= -0.0100215)
       If (feature 1794 <= -0.015113399999999999)
        If (feature 315 <= -0.009021685)
         Predict: -0.14581734458940906
        Else (feature 315 > -0.009021685)
         Predict: -0.43055000665867627
       Else (feature 1794 > -0.015113399999999999)
        If (feature 649 <= -0.008577419999999999)
         Predict: -0.6742799137165334
        Else (feature 649 > -0.008577419999999999)
         Predict: -0.466970082323807
      Else (feature 789 > -0.0100215)
       If (feature 755 <= 0.00919656)
        If (feature 315 <= -0.002158415)
         Predict: 0.04372444164831708
        Else (feature 315 > -0.002158415)
         Predict: -0.2799099183635169
       Else (feature 755 > 0.00919656)
        If (feature 1697 <= 0.0110916)
         Predict: -0.5381727158948686
        Else (feature 1697 > 0.0110916)
         Predict: -0.2597821083320546
     Else (feature 1866 > 0.008616374999999999)
      If (feature 789 <= -0.0144677)
       If (feature 315 <= -0.00447581)
        If (feature 755 <= -0.0100993)
         Predict: 0.22025316455696203
        Else (feature 755 > -0.0100993)
         Predict: -0.1234739607479524
       Else (feature 315 > -0.00447581)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.517205957883924
        Else (feature 936 > 0.0018099549999999998)
         Predict: -0.18999735379730087
      Else (feature 789 > -0.0144677)
       If (feature 315 <= 0.002566035)
        If (feature 649 <= -0.01302385)
         Predict: 0.11884615384615385
        Else (feature 649 > -0.01302385)
         Predict: 0.43213844252163164
       Else (feature 315 > 0.002566035)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.1847658260422028
        Else (feature 936 > 0.0018099549999999998)
         Predict: 0.14976641934597418
  Tree 1 (weight 0.1):
    If (feature 1389 <= -0.0025451)
     If (feature 994 <= -0.00340289)
      If (feature 1814 <= -4.766875E-4)
       .....
  Tree 99 (weight 0.1):
    If (feature 539 <= 0.0201035)
     If (feature 994 <= -0.0271962)
      If (feature 738 <= 0.020128649999999998)
       If (feature 1678 <= 0.001967885)
        If (feature 830 <= -0.01119755)
         Predict: -0.10394440691166018
        Else (feature 830 > -0.01119755)
         Predict: 0.06684711260628788
       Else (feature 1678 > 0.001967885)
        If (feature 958 <= -0.01238975)
         Predict: -0.05886208031154687
        Else (feature 958 > -0.01238975)
         Predict: 0.19195532805094076
       ......
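
The tree weights in the output indicate how the subtrees are combined into one prediction. As a sketch, assuming the model follows the convention of the stock Spark ML GBTClassificationModel (the source does not confirm this for the BoostKit implementation), the ensemble sums the weighted leaf predictions and thresholds the sum at zero:

```latex
F(x) = \sum_{t=0}^{T-1} w_t \, h_t(x),
\qquad
\hat{y}(x) =
\begin{cases}
1 & \text{if } F(x) > 0 \\
0 & \text{otherwise}
\end{cases}
```

Here T = 100 is the number of trees, h_t(x) is the leaf value that tree t predicts for sample x, and w_t is the tree weight: w_0 = 1.0 for tree 0 and 0.1 for trees 1 and 99 as printed above. The weights of the trees not shown are not given in the output.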
