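Running the Job and Viewing the Results
On the client, download the dataset used by the sample code in "Sample Project - Developing the Program", extract it to the “/tmp/data/epsilon” directory, and run the job. The procedure is as follows:
- Go to the “/tmp/data/epsilon” directory.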
cd /tmp/data/epsilon
- Download the training set.
wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2
- Download the test set.
wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.t.bz2
- Decompress the training set and the test set in the current directory.
bzip2 -d epsilon_normalized.bz2
bzip2 -d epsilon_normalized.t.bz2
- Upload the training set and the test set to HDFS.
hadoop fs -put /tmp/data/epsilon/epsilon_normalized /tmp/data/epsilon/
hadoop fs -put /tmp/data/epsilon/epsilon_normalized.t /tmp/data/epsilon/
- Place kal_examples_2.12-0.1.jar and run_gbdt.sh, both generated in "Sample Project - Developing the Program", in the “/home/test/boostkit/” directory on the client.
The content of run_gbdt.sh is as follows:
spark-submit \
--class com.bigdata.examples.GBDTRunner \
--driver-class-path "./lib/*" \
--jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.12-3.0.0-spark3.3.1.jar,lib/boostkit-ml-core_2.12-3.0.0-spark3.3.1.jar,lib/boostkit-ml-kernel_2.12-3.0.0-spark3.3.1-aarch64.jar" \
--conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.12-3.0.0-spark3.3.1.jar:boostkit-ml-core_2.12-3.0.0-spark3.3.1.jar:boostkit-ml-kernel_2.12-3.0.0-spark3.3.1-aarch64.jar" \
--master yarn \
--deploy-mode client \
--driver-cores 40 \
--driver-memory 50g \
--executor-cores 19 --num-executors 12 --executor-memory 77g \
--numPartitions 228 \
./kal_examples_2.12-0.1.jar
- Run the job.
sh run_gbdt.sh
View the result printed on the screen.
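Before the job is submitted, the preparation steps above can be checked with a small pre-flight function. This is only a sketch: the function name `preflight` is hypothetical, and it assumes that the `lib/` directory referenced by `--jars` in run_gbdt.sh sits next to the script, which the text does not state explicitly. The file names are copied from the steps and the script above.

```shell
#!/bin/sh
# Pre-flight check for the preparation steps (hypothetical helper, not part
# of the sample project). Assumption: lib/ lives under the job directory.

# preflight DATA_DIR JOB_DIR
# Prints one "missing: <path>" line per absent or empty file.
# Returns 1 if anything is missing, 0 otherwise.
preflight() {
    data_dir=$1
    job_dir=$2
    missing=0
    for f in \
        "$data_dir/epsilon_normalized" \
        "$data_dir/epsilon_normalized.t" \
        "$job_dir/kal_examples_2.12-0.1.jar" \
        "$job_dir/run_gbdt.sh" \
        "$job_dir/lib/fastutil-8.3.1.jar" \
        "$job_dir/lib/boostkit-ml-acc_2.12-3.0.0-spark3.3.1.jar" \
        "$job_dir/lib/boostkit-ml-core_2.12-3.0.0-spark3.3.1.jar" \
        "$job_dir/lib/boostkit-ml-kernel_2.12-3.0.0-spark3.3.1-aarch64.jar"
    do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is `preflight /tmp/data/epsilon /home/test/boostkit && sh run_gbdt.sh`, run from “/home/test/boostkit/”. The function checks only the local files; it does not verify that the HDFS target directory exists or that the upload succeeded.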
Result description: training runs 100 iterations and produces 100 trees in total. Three of them (Tree 0, Tree 1, and Tree 99) are shown below:
Test Error = 0.1687594970527827 // classification error of the predictions
Learned classification GBT model:
GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
  Tree 0 (weight 1.0): // weight of each tree
    If (feature 818 <= 0.0028371200000000003) // feature 818 is split at 0.0028371200000000003
     If (feature 1866 <= 0.0064008599999999995)
      If (feature 315 <= -0.0067819)
       If (feature 789 <= -0.0100215)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.21098494850805388 // value predicted at this leaf of tree 0
        Else (feature 936 > 0.0018099549999999998)
         Predict: 0.15191210648637227
       Else (feature 789 > -0.0100215)
        If (feature 936 <= -4.79549E-4)
         Predict: 0.17726731948384736
        Else (feature 936 > -4.79549E-4)
         Predict: 0.49173760640961445
      Else (feature 315 > -0.0067819)
       If (feature 789 <= -0.0100215)
        If (feature 936 <= 0.00412001)
         Predict: -0.5011764705882353
        Else (feature 936 > 0.00412001)
         Predict: -0.21027097384924862
       Else (feature 789 > -0.0100215)
        If (feature 649 <= -0.008577419999999999)
         Predict: -0.20268122451800152
        Else (feature 649 > -0.008577419999999999)
         Predict: 0.12942312334057446
     Else (feature 1866 > 0.0064008599999999995)
      If (feature 1697 <= 0.02054865)
       If (feature 649 <= -0.00122175)
        If (feature 315 <= 1.620675E-4)
         Predict: 0.31944546321425954
        Else (feature 315 > 1.620675E-4)
         Predict: -0.014997100008285691
       Else (feature 649 > -0.00122175)
        If (feature 315 <= 0.01095895)
         Predict: 0.5328728914862532
        Else (feature 315 > 0.01095895)
         Predict: 0.2712697181277476
      Else (feature 1697 > 0.02054865)
       If (feature 649 <= -0.008577419999999999)
        If (feature 315 <= 1.620675E-4)
         Predict: 0.5659284497444633
        Else (feature 315 > 1.620675E-4)
         Predict: 0.29297616536595655
       Else (feature 649 > -0.008577419999999999)
        If (feature 1519 <= -0.0024157199999999997)
         Predict: 0.5493390716261912
        Else (feature 1519 > -0.0024157199999999997)
         Predict: 0.7277585664885257
    Else (feature 818 > 0.0028371200000000003)
     If (feature 1866 <= 0.008616374999999999)
      If (feature 789 <= -0.0100215)
       If (feature 1794 <= -0.015113399999999999)
        If (feature 315 <= -0.009021685)
         Predict: -0.14581734458940906
        Else (feature 315 > -0.009021685)
         Predict: -0.43055000665867627
       Else (feature 1794 > -0.015113399999999999)
        If (feature 649 <= -0.008577419999999999)
         Predict: -0.6742799137165334
        Else (feature 649 > -0.008577419999999999)
         Predict: -0.466970082323807
      Else (feature 789 > -0.0100215)
       If (feature 755 <= 0.00919656)
        If (feature 315 <= -0.002158415)
         Predict: 0.04372444164831708
        Else (feature 315 > -0.002158415)
         Predict: -0.2799099183635169
       Else (feature 755 > 0.00919656)
        If (feature 1697 <= 0.0110916)
         Predict: -0.5381727158948686
        Else (feature 1697 > 0.0110916)
         Predict: -0.2597821083320546
     Else (feature 1866 > 0.008616374999999999)
      If (feature 789 <= -0.0144677)
       If (feature 315 <= -0.00447581)
        If (feature 755 <= -0.0100993)
         Predict: 0.22025316455696203
        Else (feature 755 > -0.0100993)
         Predict: -0.1234739607479524
       Else (feature 315 > -0.00447581)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.517205957883924
        Else (feature 936 > 0.0018099549999999998)
         Predict: -0.18999735379730087
      Else (feature 789 > -0.0144677)
       If (feature 315 <= 0.002566035)
        If (feature 649 <= -0.01302385)
         Predict: 0.11884615384615385
        Else (feature 649 > -0.01302385)
         Predict: 0.43213844252163164
       Else (feature 315 > 0.002566035)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.1847658260422028
        Else (feature 936 > 0.0018099549999999998)
         Predict: 0.14976641934597418
  Tree 1 (weight 0.1):
    If (feature 1389 <= -0.0025451)
     If (feature 994 <= -0.00340289)
      If (feature 1814 <= -4.766875E-4)
.....
  Tree 99 (weight 0.1):
    If (feature 539 <= 0.0201035)
     If (feature 994 <= -0.0271962)
      If (feature 738 <= 0.020128649999999998)
       If (feature 1678 <= 0.001967885)
        If (feature 830 <= -0.01119755)
         Predict: -0.10394440691166018
        Else (feature 830 > -0.01119755)
         Predict: 0.06684711260628788
       Else (feature 1678 > 0.001967885)
        If (feature 958 <= -0.01238975)
         Predict: -0.05886208031154687
        Else (feature 958 > -0.01238975)
         Predict: 0.19195532805094076
......
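The per-tree weights in the output suggest how the individual trees are combined. The source does not spell this out, so the following is a sketch that assumes the model keeps the semantics of the open-source Spark ML GBTClassificationModel: each sample is routed to one leaf per tree, and the leaf values are added up with the printed weights. The symbols $F$, $w_t$, $h_t$, and $T$ are introduced here for illustration only.

```latex
F(x) = \sum_{t=0}^{T-1} w_t \, h_t(x), \qquad T = 100,
\qquad
\hat{y}(x) =
\begin{cases}
1 & \text{if } F(x) > 0 \\
0 & \text{otherwise}
\end{cases}
```

Here $h_t(x)$ is the "Predict" value of the leaf that $x$ reaches in Tree $t$ and $w_t$ is that tree's weight. Under this assumption, a sample that ends in the first leaf of Tree 0 contributes $1.0 \times (-0.2110)$ to $F(x)$, while a sample that ends in the last leaf shown for Tree 99 contributes only $0.1 \times 0.1920 \approx 0.0192$.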
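If the printed result is saved to a file, the headline numbers can be pulled out with a few shell helpers. This is a minimal sketch: the function names and the log file name `run_gbdt.log` in the usage note are hypothetical, and it assumes the saved log contains the same "Test Error = ..." and "Tree N (weight W):" lines as the output shown above.

```shell
#!/bin/sh
# Helpers for a saved copy of the job output (hypothetical, for illustration).

# test_error: read job output on stdin and print the number that follows
# "Test Error = " on the first matching line.
test_error() {
    awk '/^Test Error = /{ print $4; exit }'
}

# accuracy ERROR: print 1 - ERROR with four decimal places.
accuracy() {
    awk -v e="$1" 'BEGIN { printf "%.4f\n", 1 - e }'
}

# tree_count: count the "Tree N (weight W):" header lines on stdin.
tree_count() {
    grep -c -E '^ *Tree [0-9]+ \(weight [0-9.]+\):'
}
```

One way to use the helpers is to capture the output with `sh run_gbdt.sh 2>&1 | tee run_gbdt.log` and then run `accuracy "$(test_error < run_gbdt.log)"`. For the test error shown above (0.1687594970527827), the accuracy comes out as 0.8312. On a complete, untruncated log, `tree_count < run_gbdt.log` would be expected to report 100.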