
WordCount (I/O- + CPU-intensive)

Purpose

WordCount is an I/O- and CPU-intensive scenario in which the mq-deadline scheduler, combined with I/O parameter tuning, delivers higher performance than the single-queue deadline scheduler.

Procedure

  • Modify the following configurations, where sd$i represents each drive under test:
    echo mq-deadline > /sys/block/sd$i/queue/scheduler    # multi-queue deadline I/O scheduler
    echo 512 > /sys/block/sd$i/queue/nr_requests          # request queue depth
    echo 8192 > /sys/block/sd$i/queue/read_ahead_kb       # read-ahead size, in KB
    echo 500 > /proc/sys/vm/dirty_expire_centisecs        # age at which dirty pages are written back
    echo 100 > /proc/sys/vm/dirty_writeback_centisecs     # writeback daemon wake-up interval
    echo 5 > /proc/sys/vm/dirty_background_ratio          # background writeback threshold (% of memory)
    
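The per-drive settings above must be repeated for every tested drive. A minimal sketch that prints the commands for review before they are piped to sh as root; the drive names in DRIVES are placeholders, not values from this guide:

```shell
#!/bin/sh
# Emit the block-layer tuning commands for one drive.
print_drive_tuning() {
  d=$1
  echo "echo mq-deadline > /sys/block/$d/queue/scheduler"
  echo "echo 512 > /sys/block/$d/queue/nr_requests"
  echo "echo 8192 > /sys/block/$d/queue/read_ahead_kb"
}

DRIVES="sda sdb sdc"   # assumption: replace with the drives actually under test
for d in $DRIVES; do
  print_drive_tuning "$d"
done

# The vm.* knobs are system-wide, so they are set once, not per drive:
echo "echo 500 > /proc/sys/vm/dirty_expire_centisecs"
echo "echo 100 > /proc/sys/vm/dirty_writeback_centisecs"
echo "echo 5 > /proc/sys/vm/dirty_background_ratio"
```

Reviewing the printed commands first avoids writing to the wrong device node on a production machine.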
  • In this scenario, you can set the number of partitions and parallelism to three to five times the total number of CPU cores. This helps reduce the size of files processed by each task and improve performance. You can use the following partition settings:
    spark.sql.shuffle.partitions 300
    spark.default.parallelism 600
    
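The three-to-five-times rule above can be turned into a quick calculation. A minimal sketch, where TOTAL_CORES=100 is a placeholder for the cluster's real total core count:

```shell
#!/bin/sh
# Suggest a partition/parallelism range of 3x to 5x the total core count,
# per the sizing rule above. TOTAL_CORES is an assumption; override it.
TOTAL_CORES=${TOTAL_CORES:-100}
LOW=$((TOTAL_CORES * 3))
HIGH=$((TOTAL_CORES * 5))
echo "spark.sql.shuffle.partitions: between $LOW and $HIGH"
echo "spark.default.parallelism:    between $LOW and $HIGH"
```

Pick a value inside the printed range and verify it against the input data size, so each task processes a reasonably small file.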
  • Adjust the number of cores and the memory size in the HiBench configuration file based on the actual environment to achieve optimal performance. For example, on the Kunpeng 920 5220 processor, the following executor settings are recommended for the WordCount scenario:
    yarn.executor.num 51
    yarn.executor.cores 6
    spark.executor.memory 13G
    spark.driver.memory 36G
    
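As a sketch of where such figures can come from: the executor count is typically nodes x (allocatable cores per node / cores per executor), and executor memory is the node's allocatable memory divided by the executors per node. The NODES, CORES_PER_NODE, and MEM_GB_PER_NODE values below are assumptions chosen for illustration, not measurements from this guide:

```shell
#!/bin/sh
# Derive executor count and memory from per-node resources.
# All three inputs are assumptions; substitute your cluster's values.
NODES=${NODES:-3}
CORES_PER_NODE=${CORES_PER_NODE:-104}    # cores YARN may allocate per node
MEM_GB_PER_NODE=${MEM_GB_PER_NODE:-230}  # memory YARN may allocate per node, GB
CORES_PER_EXECUTOR=6                     # per the example above

EXECUTORS_PER_NODE=$((CORES_PER_NODE / CORES_PER_EXECUTOR))
TOTAL_EXECUTORS=$((NODES * EXECUTORS_PER_NODE))
MEM_PER_EXECUTOR=$((MEM_GB_PER_NODE / EXECUTORS_PER_NODE))

echo "yarn.executor.num $TOTAL_EXECUTORS"
echo "yarn.executor.cores $CORES_PER_EXECUTOR"
echo "spark.executor.memory ${MEM_PER_EXECUTOR}G"
```

Leave some headroom for the OS and other daemons when choosing the allocatable core and memory figures.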
  • Adjust the JDK parameters to improve performance (by 3%). Add the following configuration to the spark.conf file:
    spark.executor.extraJavaOptions -XX:+UseNUMA -XX:BoxTypeCachedMax=100000 -XX:ParScavengePerStrideChunk=8192
    spark.yarn.am.extraJavaOptions -XX:+UseNUMA -XX:BoxTypeCachedMax=100000 -XX:ParScavengePerStrideChunk=8192