
Running the Task and Viewing the Results

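On the client, download the dataset used by the sample code in "Sample Project - Developing the Program", decompress it in the "/tmp/data/epsilon" directory, and run the task. The steps are as follows: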

  1. Create the "/tmp/data/epsilon" directory if it does not exist, and go to it.
    mkdir -p /tmp/data/epsilon
    cd /tmp/data/epsilon
    
  2. Download the training set.
    wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2
    
  3. Download the test set.
    wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.t.bz2
    
  4. Decompress the training set and the test set into the current directory.
    bzip2 -d epsilon_normalized.bz2
    bzip2 -d epsilon_normalized.t.bz2
    
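  5. Upload the training set and the test set to HDFS, creating the target HDFS directory first if it does not exist.
    hadoop fs -mkdir -p /tmp/data/epsilon
    hadoop fs -put /tmp/data/epsilon/epsilon_normalized /tmp/data/epsilon/
    hadoop fs -put /tmp/data/epsilon/epsilon_normalized.t /tmp/data/epsilon/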
    
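  6. Place kal_examples_2.12-0.1.jar and run_gbdt.sh, both generated in "Sample Project - Developing the Program", in the "/home/test/boostkit/" directory on the client. run_gbdt.sh loads fastutil and the BoostKit ML JARs through relative lib/ paths, so the lib/ directory containing those JARs must also be available as "/home/test/boostkit/lib/".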

    The content of run_gbdt.sh is as follows:

    # Options before the application JAR configure spark-submit;
    # arguments after the JAR are passed to the main method of GBDTRunner.
    spark-submit \
    --class com.bigdata.examples.GBDTRunner \
    --driver-class-path "./lib/*" \
    --jars "lib/fastutil-8.3.1.jar,lib/boostkit-ml-acc_2.12-3.0.0-spark3.3.1.jar,lib/boostkit-ml-core_2.12-3.0.0-spark3.3.1.jar,lib/boostkit-ml-kernel_2.12-3.0.0-spark3.3.1-aarch64.jar" \
    --conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:boostkit-ml-acc_2.12-3.0.0-spark3.3.1.jar:boostkit-ml-core_2.12-3.0.0-spark3.3.1.jar:boostkit-ml-kernel_2.12-3.0.0-spark3.3.1-aarch64.jar" \
    --master yarn \
    --deploy-mode client \
    --driver-cores 40 \
    --driver-memory 50g \
    --executor-cores 19 --num-executors 12 --executor-memory 77g \
    ./kal_examples_2.12-0.1.jar \
    --numPartitions 228
    
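  7. Run the task from the "/home/test/boostkit/" directory, so that the relative lib/ paths in run_gbdt.sh resolve.
    cd /home/test/boostkit
    sh run_gbdt.sh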
    

    View the results printed on the screen.

    Result description: training runs for 100 iterations and produces 100 trees. Three of them (Tree 0, Tree 1, and Tree 99) are shown below:

    Test Error = 0.1687594970527827  // classification error on the test set
    Learned classification GBT model:
    GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
      Tree 0 (weight 1.0):  // weight of this tree in the ensemble
        If (feature 818 <= 0.0028371200000000003)  // split threshold for feature index 818 is 0.0028371200000000003
         If (feature 1866 <= 0.0064008599999999995)
          If (feature 315 <= -0.0067819)
           If (feature 789 <= -0.0100215)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.21098494850805388  // output of tree 0 for samples that reach this leaf
            Else (feature 936 > 0.0018099549999999998)
             Predict: 0.15191210648637227
           Else (feature 789 > -0.0100215)
            If (feature 936 <= -4.79549E-4)
             Predict: 0.17726731948384736
            Else (feature 936 > -4.79549E-4)
             Predict: 0.49173760640961445
          Else (feature 315 > -0.0067819)
           If (feature 789 <= -0.0100215)
            If (feature 936 <= 0.00412001)
             Predict: -0.5011764705882353
            Else (feature 936 > 0.00412001)
             Predict: -0.21027097384924862
           Else (feature 789 > -0.0100215)
            If (feature 649 <= -0.008577419999999999)
             Predict: -0.20268122451800152
            Else (feature 649 > -0.008577419999999999)
             Predict: 0.12942312334057446
         Else (feature 1866 > 0.0064008599999999995)
          If (feature 1697 <= 0.02054865)
           If (feature 649 <= -0.00122175)
            If (feature 315 <= 1.620675E-4)
             Predict: 0.31944546321425954
            Else (feature 315 > 1.620675E-4)
             Predict: -0.014997100008285691
           Else (feature 649 > -0.00122175)
            If (feature 315 <= 0.01095895)
             Predict: 0.5328728914862532
            Else (feature 315 > 0.01095895)
             Predict: 0.2712697181277476
          Else (feature 1697 > 0.02054865)
           If (feature 649 <= -0.008577419999999999)
            If (feature 315 <= 1.620675E-4)
             Predict: 0.5659284497444633
            Else (feature 315 > 1.620675E-4)
             Predict: 0.29297616536595655
           Else (feature 649 > -0.008577419999999999)
            If (feature 1519 <= -0.0024157199999999997)
             Predict: 0.5493390716261912
            Else (feature 1519 > -0.0024157199999999997)
             Predict: 0.7277585664885257
        Else (feature 818 > 0.0028371200000000003)
         If (feature 1866 <= 0.008616374999999999)
          If (feature 789 <= -0.0100215)
           If (feature 1794 <= -0.015113399999999999)
            If (feature 315 <= -0.009021685)
             Predict: -0.14581734458940906
            Else (feature 315 > -0.009021685)
             Predict: -0.43055000665867627
           Else (feature 1794 > -0.015113399999999999)
            If (feature 649 <= -0.008577419999999999)
             Predict: -0.6742799137165334
            Else (feature 649 > -0.008577419999999999)
             Predict: -0.466970082323807
          Else (feature 789 > -0.0100215)
           If (feature 755 <= 0.00919656)
            If (feature 315 <= -0.002158415)
             Predict: 0.04372444164831708
            Else (feature 315 > -0.002158415)
             Predict: -0.2799099183635169
           Else (feature 755 > 0.00919656)
            If (feature 1697 <= 0.0110916)
             Predict: -0.5381727158948686
            Else (feature 1697 > 0.0110916)
             Predict: -0.2597821083320546
         Else (feature 1866 > 0.008616374999999999)
          If (feature 789 <= -0.0144677)
           If (feature 315 <= -0.00447581)
            If (feature 755 <= -0.0100993)
             Predict: 0.22025316455696203
            Else (feature 755 > -0.0100993)
             Predict: -0.1234739607479524
           Else (feature 315 > -0.00447581)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.517205957883924
            Else (feature 936 > 0.0018099549999999998)
             Predict: -0.18999735379730087
          Else (feature 789 > -0.0144677)
           If (feature 315 <= 0.002566035)
            If (feature 649 <= -0.01302385)
             Predict: 0.11884615384615385
            Else (feature 649 > -0.01302385)
             Predict: 0.43213844252163164
           Else (feature 315 > 0.002566035)
            If (feature 936 <= 0.0018099549999999998)
             Predict: -0.1847658260422028
            Else (feature 936 > 0.0018099549999999998)
             Predict: 0.14976641934597418
      Tree 1 (weight 0.1):
        If (feature 1389 <= -0.0025451)
         If (feature 994 <= -0.00340289)
          If (feature 1814 <= -4.766875E-4)
    .....
      Tree 99 (weight 0.1):
        If (feature 539 <= 0.0201035)
         If (feature 994 <= -0.0271962)
          If (feature 738 <= 0.020128649999999998)
           If (feature 1678 <= 0.001967885)
            If (feature 830 <= -0.01119755)
             Predict: -0.10394440691166018
            Else (feature 830 > -0.01119755)
             Predict: 0.06684711260628788
           Else (feature 1678 > 0.001967885)
            If (feature 958 <= -0.01238975)
             Predict: -0.05886208031154687
            Else (feature 958 > -0.01238975)
             Predict: 0.19195532805094076
    ......
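Between decompression (step 4) and upload (step 5), a quick scan can confirm that each file is complete LIBSVM-format text, in which every line is a label followed by index:value pairs. The shell function below is a minimal sketch; its name, libsvm_summary, is hypothetical and not part of BoostKit, Spark, or Hadoop. The LIBSVM datasets page lists epsilon with 400,000 training rows, 100,000 test rows, and 2,000 features, so those are the figures to compare the output against. Scanning the full training file takes a while because the decompressed data is large.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): summarize a LIBSVM-format text file.
# Each non-empty line is "label index:value index:value ...".
# Prints the row count, the largest feature index seen, and the row count per label.
libsvm_summary() {
  local file=$1
  awk '
    NF == 0 { next }                      # ignore blank lines
    {
      rows++
      labels[$1]++                        # first field is the class label
      for (i = 2; i <= NF; i++) {         # remaining fields are index:value pairs
        split($i, kv, ":")
        if (kv[1] + 0 > maxidx) maxidx = kv[1] + 0
      }
    }
    END {
      printf "rows=%d max_index=%d", rows, maxidx
      for (l in labels) printf " label[%s]=%d", l, labels[l]
      printf "\n"
    }
  ' "$file"
}

# Usage on the client, after step 4:
#   libsvm_summary /tmp/data/epsilon/epsilon_normalized
#   libsvm_summary /tmp/data/epsilon/epsilon_normalized.t
```

LIBSVM feature indices start at 1, while the tree dump above uses Spark's 0-based feature indices, so a maximum index of 2000 in the file corresponds to feature 1999 in the model.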
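Because run_gbdt.sh refers to its JARs through paths relative to the working directory (./lib/* and lib/...), a missing or misnamed JAR usually shows up only as a class-loading error after the job has been submitted. A small pre-flight check before steps 6 and 7 can catch this earlier. The sketch below is an illustration only: the function name check_gbdt_layout is hypothetical, and the file list is copied from run_gbdt.sh, so it has to be kept in step with the JAR versions actually in use.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (sketch): verify that every file referenced by
# run_gbdt.sh exists under the given directory (default: /home/test/boostkit).
# Returns 0 if all files are present, 1 otherwise; missing files go to stderr.
check_gbdt_layout() {
  local dir=${1:-/home/test/boostkit}
  local missing=0
  local f
  for f in kal_examples_2.12-0.1.jar \
           run_gbdt.sh \
           lib/fastutil-8.3.1.jar \
           lib/boostkit-ml-acc_2.12-3.0.0-spark3.3.1.jar \
           lib/boostkit-ml-core_2.12-3.0.0-spark3.3.1.jar \
           lib/boostkit-ml-kernel_2.12-3.0.0-spark3.3.1-aarch64.jar; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_gbdt_layout && (cd /home/test/boostkit && sh run_gbdt.sh)
```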
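The printed model is long (Tree 0 alone has 32 leaves), so it can help to save the screen output, for example with sh run_gbdt.sh | tee gbdt.log, and summarize it afterwards. The awk-based sketch below assumes the output format shown above: a "Test Error = ..." line, "Tree N (weight W):" headers, and "Predict:" leaf lines. The function name summarize_gbdt_log and the log file name gbdt.log are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): summarize saved output of run_gbdt.sh.
# Prints the test error, then the number of "Predict:" leaf lines that follow
# each "Tree N (weight W):" header.
summarize_gbdt_log() {
  awk '
    /Test Error = / {
      sub(/.*Test Error = /, "")          # drop everything up to the value
      print "test_error=" $1              # $1 is the value after re-splitting
    }
    /^[ \t]*Tree [0-9]+ \(weight / {
      tree = $2                           # tree index, e.g. "0" or "99"
      order[++n] = tree
      leaves[tree] = 0
    }
    /^[ \t]*Predict:/ { leaves[tree]++ }  # count leaves of the current tree
    END {
      for (i = 1; i <= n; i++) print "tree " order[i] " leaves=" leaves[order[i]]
    }
  ' "$1"
}

# Usage:
#   sh run_gbdt.sh | tee gbdt.log
#   summarize_gbdt_log gbdt.log
```

With the default maximum depth of 5 used here, a fully grown tree has 32 leaves; a smaller count means the tree stopped splitting early on some branches.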
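To read the dump: each tree routes a sample to exactly one leaf, and the ensemble adds up those leaf values scaled by the tree weights. In Spark MLlib's GBTClassificationModel, the first tree has weight 1.0 and every later tree has weight equal to the step size, which matches the weight 0.1 printed for Tree 1 and Tree 99 (0.1 is Spark's default stepSize). The predicted label is 1 when this weighted sum (the margin) is positive, and under the default log loss the probability of label 1 is 1 / (1 + exp(-2 * margin)). The sketch below assumes the BoostKit-accelerated model keeps these Spark semantics; the function name gbt_combine is hypothetical. The usage line feeds it only two leaf values from the dump (one from Tree 0 and one from Tree 99) for a hypothetical sample, so it illustrates the arithmetic and is not a real model prediction, which would sum all 100 trees.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): combine per-tree outputs the way Spark MLlib's
# GBT classifier does with its default log loss.
# stdin: one line per tree, "<tree weight> <leaf value reached by the sample>".
gbt_combine() {
  awk '
    { margin += $1 * $2 }                 # weighted sum of leaf values
    END {
      prob = 1 / (1 + exp(-2 * margin))   # P(label = 1) under log loss
      label = (margin > 0) ? 1 : 0        # threshold the margin at 0
      printf "margin=%.6f prob=%.6f label=%d\n", margin, prob, label
    }
  '
}

# Usage (two trees only, for illustration):
#   printf '1.0 0.5328728914862532\n0.1 0.19195532805094076\n' | gbt_combine
```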