Bayesian (CPU-intensive)

Purpose

Bayesian is a CPU-intensive workload. For optimal performance, tune the I/O (kernel) parameters, the Spark partition and executor parameters, and the executor JVM parameters as described below.

Procedure

  • Set the following partition parameters:
    spark.sql.shuffle.partitions 1000
    spark.default.parallelism 2500
    
  • Open the conf/spark.conf file of HiBench and add the following executor parameters:
    hibench.yarn.executor.num 9
    hibench.yarn.executor.cores 25
    spark.executor.memory 73G
    spark.driver.memory 36G
    
  • Set the following kernel parameters. Run the commands as root, and replace $i with the letter of each disk to tune (for example, b for /dev/sdb):
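    # I/O scheduler for disk sd$i
    echo mq-deadline > /sys/block/sd$i/queue/scheduler
    # Disable the multi-queue block layer (scsi-mq) for SCSI devices
    echo 0 > /sys/module/scsi_mod/parameters/use_blk_mq
    # Start background writeback when dirty pages reach 50% of available memory
    echo 50 > /proc/sys/vm/dirty_background_ratio
    # Make writing processes write back dirty data themselves at 80%
    echo 80 > /proc/sys/vm/dirty_ratio
    # Dirty data older than 5 s (500 centiseconds) becomes eligible for writeback
    echo 500 > /proc/sys/vm/dirty_expire_centisecs
    # Wake the writeback threads every 1 s (100 centiseconds)
    echo 100 > /proc/sys/vm/dirty_writeback_centisecs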
    
  • Tune the executor JVM parameters by adding the following line to the spark.conf file:
    spark.executor.extraJavaOptions -XX:+UseNUMA -Xms60g -Xmn25g -XX:+UseParallelOldGC -XX:ParallelGCThreads=24 -XX:+AlwaysPreTouch -XX:-UseAdaptiveSizePolicy
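You can cross-check the partition and executor settings above with simple arithmetic. The sketch below is an illustration, not part of the original procedure. It assumes one task per executor core and Spark's default executor memory overhead on YARN of max(10% of spark.executor.memory, 384 MiB). The variable names are hypothetical. YARN may also round each container up to a multiple of yarn.scheduler.minimum-allocation-mb, so treat the totals as a lower bound.

```shell
#!/usr/bin/env bash
# Sketch: derive cluster-wide totals from the Bayesian executor settings.
# Assumes one task per executor core and Spark's default YARN memory
# overhead of max(10% of executor memory, 384 MiB).
set -euo pipefail

executors=9                       # executor count from the settings above
cores=25                          # cores per executor
executor_mem_mib=$((73 * 1024))   # spark.executor.memory 73G, in MiB
parallelism=2500                  # spark.default.parallelism

# Concurrent task slots across all executors
slots=$((executors * cores))
# Ceiling division: task "waves" needed to run all partitions of one stage
waves=$(( (parallelism + slots - 1) / slots ))

# Default overhead: 10% of executor memory, at least 384 MiB
overhead_mib=$(( executor_mem_mib / 10 ))
if (( overhead_mib < 384 )); then overhead_mib=384; fi
container_mib=$(( executor_mem_mib + overhead_mib ))
total_mib=$(( executors * container_mib ))

echo "task_slots=${slots}"
echo "waves=${waves}"
echo "container_mib=${container_mib}"
echo "total_executor_mib=${total_mib}"
```

With these values, 2500 tasks run in about 12 waves over 225 task slots. Each executor container requests about 82227 MiB, and all nine together request roughly 723 GiB (740043 MiB) from YARN.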
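You can wrap the kernel commands above in a small script that applies them to a list of disks and prints what each file reports afterwards. This is a sketch under stated assumptions. The functions apply_bayes_kernel_params and set_param are hypothetical, and so is the SYSFS_ROOT prefix, which is empty by default and exists only so that the script can be exercised against a scratch directory. Disk names such as sdb are passed as arguments. Like the echo commands, the script must run as root, and its settings do not persist across reboots. Writing use_blk_mq may fail on kernels that no longer provide that parameter.

```shell
#!/usr/bin/env bash
# Sketch: apply the Bayesian kernel settings listed above.
# Must run as root on a real system. SYSFS_ROOT is a hypothetical prefix
# (empty by default) that lets the script be exercised on a scratch directory.
set -euo pipefail

SYSFS_ROOT="${SYSFS_ROOT:-}"

# set_param FILE VALUE: write VALUE to FILE and print what the file now reports.
set_param() {
  echo "$2" > "$SYSFS_ROOT$1"
  printf '%s -> %s\n' "$1" "$(cat "$SYSFS_ROOT$1")"
}

# apply_bayes_kernel_params DISK...  (for example: apply_bayes_kernel_params sdb sdc)
apply_bayes_kernel_params() {
  local disk
  # I/O scheduler for each disk passed as an argument
  for disk in "$@"; do
    set_param "/sys/block/$disk/queue/scheduler" mq-deadline
  done
  set_param /sys/module/scsi_mod/parameters/use_blk_mq 0
  # Dirty page cache writeback thresholds and timers
  set_param /proc/sys/vm/dirty_background_ratio 50
  set_param /proc/sys/vm/dirty_ratio 80
  set_param /proc/sys/vm/dirty_expire_centisecs 500
  set_param /proc/sys/vm/dirty_writeback_centisecs 100
}
```

For example, after sourcing the script in a root shell, `apply_bayes_kernel_params sdb sdc` tunes /dev/sdb and /dev/sdc. On a real disk the scheduler file reports all available schedulers, with the active one in brackets.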
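Spark sets each executor's maximum heap (-Xmx) from spark.executor.memory and does not allow -Xmx in spark.executor.extraJavaOptions. The -Xms and -Xmn values above therefore have to fit within 73G, and the JVM refuses to start if the initial heap exceeds the maximum heap. The minimal consistency check below assumes that every size uses a g or G suffix, as in this section. The helper names to_gib and flag_gib are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the heap flags in spark.executor.extraJavaOptions
# against spark.executor.memory (which Spark turns into -Xmx).
# Assumes every size uses a g/G suffix, as in the settings above.
set -euo pipefail

executor_memory="73G"
java_opts="-XX:+UseNUMA -Xms60g -Xmn25g -XX:+UseParallelOldGC -XX:ParallelGCThreads=24 -XX:+AlwaysPreTouch -XX:-UseAdaptiveSizePolicy"

# to_gib SIZE: strip a trailing g/G and print the integer number of GiB.
to_gib() {
  echo "${1%[gG]}"
}

# flag_gib PREFIX: print the GiB value of the first option starting with PREFIX.
flag_gib() {
  local opt
  for opt in $java_opts; do   # intentional word splitting on spaces
    if [[ "$opt" == "$1"* ]]; then
      to_gib "${opt#"$1"}"
      return 0
    fi
  done
  return 1
}

xmx=$(to_gib "$executor_memory")
xms=$(flag_gib -Xms)
xmn=$(flag_gib -Xmn)

# The JVM rejects an initial heap larger than the maximum heap
if (( xms > xmx )); then echo "error: -Xms${xms}g exceeds -Xmx${xmx}g" >&2; fi
# -Xmn fixes the young generation; it must leave room for the old generation
if (( xmn >= xms )); then echo "error: -Xmn${xmn}g leaves no old generation" >&2; fi
echo "heap: Xms=${xms}g Xmx=${xmx}g young=${xmn}g initial_old=$((xms - xmn))g"
```

With the values in this section, the check passes and reports a 25g young generation and a 35g initial old generation.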