K-means is a CPU-intensive workload, so both the I/O parameters and the Spark execution parameters can be tuned.
spark.sql.shuffle.partitions 1000
spark.default.parallelism 2500
echo 4096 > /sys/block/sd$i/queue/read_ahead_kb
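The `sd$i` in the command above implies iterating over every SATA disk. A minimal sketch of that loop, assuming disks are named `sda`, `sdb`, … (run as root; adjust the glob to your device names):

```shell
# Raise the block-device readahead to 4096 KB for every sd* disk.
for dev in /sys/block/sd*; do
    [ -e "$dev" ] || continue          # skip if no sd* devices exist
    echo 4096 > "$dev/queue/read_ahead_kb"
done
```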
yarn.executor.num 42
yarn.executor.cores 6
spark.executor.memory 15G
spark.driver.memory 36G
spark.locality.wait 10s
spark.executor.extraJavaOptions -XX:+UseNUMA -XX:BoxTypeCachedMax=100000 -XX:ParScavengePerStrideChunk=8192
spark.yarn.am.extraJavaOptions -XX:+UseNUMA -XX:BoxTypeCachedMax=100000 -XX:ParScavengePerStrideChunk=8192
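The same settings can also be passed on the `spark-submit` command line instead of being edited into `spark.conf`. A sketch, where `<main-class>` and `<your-app.jar>` are placeholders for your own job (not part of this guide):

```shell
# Submitting a K-means job with the tuned parameters inline.
spark-submit \
  --master yarn \
  --driver-memory 36G \
  --executor-memory 15G \
  --executor-cores 6 \
  --num-executors 42 \
  --conf spark.sql.shuffle.partitions=1000 \
  --conf spark.default.parallelism=2500 \
  --conf spark.locality.wait=10s \
  --conf "spark.executor.extraJavaOptions=-XX:+UseNUMA -XX:BoxTypeCachedMax=100000 -XX:ParScavengePerStrideChunk=8192" \
  --class <main-class> <your-app.jar>
```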
The BiSheng JDK includes optimizations specific to K-means, so it can be used to accelerate the workload.
tar -zxvf bisheng-jdk-8u262-linux-aarch64.tar.gz
mv bisheng-jdk1.8.0_262 /usr/local/
mv jdk8u222-b10/ jdk8u222-b10-openjdk/
mv bisheng-jdk1.8.0_262/ jdk8u222-b10/
chmod -R 755 jdk8u222-b10/
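After swapping the directories, it is worth confirming that the JVM actually picked up is the BiSheng build. A quick check, assuming the install root used above:

```shell
# Point JAVA_HOME at the swapped-in directory and confirm the JVM build.
export JAVA_HOME=/usr/local/jdk8u222-b10
export PATH="$JAVA_HOME/bin:$PATH"
java -version    # should report the BiSheng JDK build, not OpenJDK
```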
vi /opt/HiBench-HiBench-7.0/conf/spark.conf
spark.executor.extraJavaOptions -XX:+UnlockExperimentalVMOptions -XX:+EnableIntrinsicExternal -XX:+UseF2jBLASIntrinsics -Xms43g -XX:ParallelGCThreads=8
The value n in -Xms(n)g depends on the spark.executor.memory setting in "/opt/HiBench-HiBench-7.0/conf/spark.conf"; it is recommended to set n to the spark.executor.memory value minus 1.
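The "minus 1" rule above can be computed mechanically. A small sketch, assuming spark.executor.memory is 15G as in the YARN example earlier (the variable names are illustrative, not from this guide):

```shell
# Derive the recommended -Xms flag: executor memory in GB, minus 1.
executor_memory="15G"                 # value of spark.executor.memory
mem_gb="${executor_memory%G}"         # strip the trailing G -> 15
xms_gb=$((mem_gb - 1))                # recommended heap start: n - 1
xms_flag="-Xms${xms_gb}g"
echo "$xms_flag"                      # prints -Xms14g
```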