Other Tuning Items

Purpose

Adjust the number of partitions based on the number of cores so that each core processes as close to the same volume of data as possible. This minimizes data skew and prevents any single core from spending excessive time on processing.

Procedure

  • Set the number of partitions and the parallelism to three to five times the total number of CPU cores. This reduces the amount of data processed by each task and improves performance. For example, you can use the following partition settings:
    spark.sql.shuffle.partitions 1000
    spark.default.parallelism 2000
    
  • Adjust the number of executor cores and the memory size in the HiBench configuration file to match the actual environment and achieve optimal performance. For example, for the Kunpeng 920 5220 processor, the following executor parameters are recommended for TeraSort:
    yarn.executor.num 27
    yarn.executor.cores 7
    spark.executor.memory 25G
    spark.driver.memory 36G
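
The three-to-five-times guideline can be worked through as simple arithmetic. The following Python sketch is illustrative only: the function name partition_range is hypothetical, and it assumes that the total number of CPU cores available to tasks equals the executor count multiplied by the cores per executor, using the TeraSort example values above. Treat the result as a starting point for tuning rather than a fixed rule.

```python
from typing import Tuple


def partition_range(executors: int, cores_per_executor: int,
                    low_factor: int = 3, high_factor: int = 5) -> Tuple[int, int]:
    """Return the (low, high) partition counts for the 3x-5x guideline.

    Assumption: every executor core runs tasks, so the total core
    count is executors * cores_per_executor.
    """
    total_cores = executors * cores_per_executor
    return total_cores * low_factor, total_cores * high_factor


# Executor values taken from the TeraSort example (27 executors, 7 cores each).
low, high = partition_range(executors=27, cores_per_executor=7)
print(f"total cores: {27 * 7}")
print(f"suggested partition range: {low} to {high}")
```

With these inputs, the total is 189 cores, so the guideline suggests between 567 and 945 partitions. Rerun the calculation whenever the executor count or cores per executor change.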