Bayesian (CPU-intensive)
Purpose
The Bayesian workload is CPU-intensive. Tune the I/O parameters and the Spark executor parameters for optimal performance.
Procedure
- You can use the following partition settings:
  spark.sql.shuffle.partitions 1000
  spark.default.parallelism 2500
- Open the conf/spark.conf file of HiBench and add the following executor parameters:
  yarn.executor.num 9
  yarn.executor.cores 25
  spark.executor.memory 73G
  spark.driver.memory 36G
- In this scenario, use the following kernel parameters:
  echo mq-deadline > /sys/block/sd$i/queue/scheduler
  echo 0 > /sys/module/scsi_mod/parameters/use_blk_mq
  echo 50 > /proc/sys/vm/dirty_background_ratio
  echo 80 > /proc/sys/vm/dirty_ratio
  echo 500 > /proc/sys/vm/dirty_expire_centisecs
  echo 100 > /proc/sys/vm/dirty_writeback_centisecs
- Adjust the JDK parameters and add the following configurations to the spark.conf file:
  spark.executor.extraJavaOptions -XX:+UseNUMA -Xms60g -Xmn25g -XX:+UseParallelOldGC -XX:ParallelGCThreads=24 -XX:+AlwaysPreTouch -XX:-UseAdaptiveSizePolicy
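The kernel parameter step above writes the scheduler setting to /sys/block/sd$i, where $i stands for a disk suffix that the procedure does not enumerate. The following is a minimal sketch of one way to apply those settings across several disks. The function name apply_io_tuning, the SYSFS_ROOT and PROCFS_ROOT variables, and the disk suffixes in the usage line are assumptions made for illustration and are not part of the original procedure.

```shell
# Sketch only: wraps the kernel settings from the procedure in a function.
# SYSFS_ROOT and PROCFS_ROOT are hypothetical overrides that allow a dry run
# against a scratch directory; they default to the real /sys and /proc.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
PROCFS_ROOT="${PROCFS_ROOT:-/proc}"

# The ( ... ) body runs in a subshell, so the loop variable does not leak
# into the caller's environment.
apply_io_tuning() (
    # Per-disk setting: each argument is a disk suffix (a -> sda, b -> sdb).
    for i in "$@"; do
        echo mq-deadline > "$SYSFS_ROOT/block/sd$i/queue/scheduler"
    done

    # Module-wide and system-wide settings are written once.
    echo 0   > "$SYSFS_ROOT/module/scsi_mod/parameters/use_blk_mq"
    echo 50  > "$PROCFS_ROOT/sys/vm/dirty_background_ratio"
    echo 80  > "$PROCFS_ROOT/sys/vm/dirty_ratio"
    echo 500 > "$PROCFS_ROOT/sys/vm/dirty_expire_centisecs"
    echo 100 > "$PROCFS_ROOT/sys/vm/dirty_writeback_centisecs"
)
```

A call such as `apply_io_tuning a b c` (run as root, with the suffixes replaced by those of the actual data disks) would tune sda, sdb, and sdc. Values written to /sys and /proc this way are runtime settings and are lost on reboot, so they need to be reapplied after a restart.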