Modifying Configurations
Modifying hadoop.conf
- Open the configuration file.
cp conf/hadoop.conf.template conf/hadoop.conf
vi conf/hadoop.conf
- Press i to enter insert mode and modify the file based on your requirements:
# Hadoop home
hibench.hadoop.home    /usr/hdp/current/hadoop-client
# The root HDFS path to store HiBench data
hibench.hdfs.master    hdfs://hadoop102:8020
# Hadoop release provider. Supported values: apache, cdh5, hdp
hibench.hadoop.release    hdp
- Press Esc, type :wq!, and press Enter to save the file and exit.
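If you prefer scripted edits over vi, the same changes can be applied non-interactively with sed. The sketch below writes a stand-in template so it is self-contained; on a real HiBench installation, skip that step and use the shipped conf/hadoop.conf.template. The property values are the example values from above and must match your own cluster.

```shell
# Stand-in for conf/hadoop.conf.template so this sketch is self-contained;
# on a real HiBench install, use the shipped template instead.
mkdir -p conf
cat > conf/hadoop.conf.template <<'EOF'
hibench.hadoop.home /opt/hadoop
hibench.hdfs.master hdfs://localhost:8020
hibench.hadoop.release apache
EOF

# Create hadoop.conf from the template, then rewrite each property's value.
cp conf/hadoop.conf.template conf/hadoop.conf
sed -i 's|^hibench.hadoop.home.*|hibench.hadoop.home    /usr/hdp/current/hadoop-client|' conf/hadoop.conf
sed -i 's|^hibench.hdfs.master.*|hibench.hdfs.master    hdfs://hadoop102:8020|' conf/hadoop.conf
sed -i 's|^hibench.hadoop.release.*|hibench.hadoop.release    hdp|' conf/hadoop.conf

# Confirm the three properties now carry the intended values.
grep -E '^hibench\.(hadoop\.home|hdfs\.master|hadoop\.release)' conf/hadoop.conf
```

Scripted edits like this are easier to repeat across nodes or test runs than interactive vi sessions.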
Modifying spark.conf
- Open the configuration file.
cp spark.conf.template spark.conf
vi spark.conf
- Press i to enter insert mode and modify the file based on your requirements:
# Spark home
hibench.spark.home    /usr/hdp/current/spark2-client
# Executor number and cores when running on YARN
hibench.yarn.executor.num    20
hibench.yarn.executor.cores    19
# Executor and driver memory in standalone & YARN mode
spark.executor.memory    44g
spark.driver.memory    36g
- Press Esc, type :wq!, and press Enter to save the file and exit.
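A quick sanity check before running: the executor settings above request resources in aggregate, and the total must fit within what YARN can allocate on your cluster. A minimal shell calculation using the example values (20 executors, 19 cores and 44g each):

```shell
# Example values from spark.conf above; substitute your own.
EXECUTORS=20          # hibench.yarn.executor.num
CORES_PER_EXEC=19     # hibench.yarn.executor.cores
EXEC_MEM_G=44         # spark.executor.memory, in GB

# Aggregate request that YARN must be able to satisfy.
TOTAL_CORES=$((EXECUTORS * CORES_PER_EXEC))
TOTAL_MEM_G=$((EXECUTORS * EXEC_MEM_G))

echo "Total executor cores requested: ${TOTAL_CORES}"    # 380
echo "Total executor memory requested: ${TOTAL_MEM_G}g"  # 880g
```

If the totals exceed the cluster's YARN capacity, YARN will start fewer executors than requested, which skews benchmark results.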
Modifying hibench.conf
- Open the configuration file.
vi hibench.conf
- Press i to enter insert mode and modify the file based on your requirements:
# The definition of these profiles can be found in the workload's conf file, i.e. conf/workloads/micro/wordcount.conf
hibench.scale.profile    small
# "small" corresponds to the value in HiBench-HiBench-7.0/conf/workloads/micro/wordcount.conf.
# Mapper number in Hadoop, partition number in Spark
hibench.default.map.parallelism    8
# Reducer number in Hadoop, shuffle partition number in Spark
hibench.default.shuffle.parallelism    8
# Access HiBench-HiBench-7.0/conf/workloads/micro/wordcount.conf and modify the data volume of the corresponding levels.
- Press Esc, type :wq!, and press Enter to save the file and exit.
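To review the effective settings without wading through the comments, you can strip comment and blank lines from the file. The sketch writes a stand-in hibench.conf so it is self-contained; on a real installation, point the grep at HiBench's own conf/hibench.conf.

```shell
# Stand-in hibench.conf so this sketch is self-contained;
# on a real install, grep the shipped conf/hibench.conf instead.
cat > hibench.conf <<'EOF'
# The definition of these profiles can be found in the workload's conf file
hibench.scale.profile small
hibench.default.map.parallelism 8
hibench.default.shuffle.parallelism 8
EOF

# Keep only lines that set a property (drop comments and blank lines).
grep -vE '^[[:space:]]*(#|$)' hibench.conf
```

This prints just the three active properties, which makes it easy to diff configurations between test runs.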
Checking wordcount.conf
cat workloads/micro/wordcount.conf
#datagen
hibench.wordcount.tiny.datasize    32000
hibench.wordcount.small.datasize    320000000
hibench.wordcount.large.datasize    3200000000
hibench.wordcount.huge.datasize    32000000000
hibench.wordcount.gigantic.datasize    32000000000
hibench.wordcount.bigdata.datasize    1600000000000
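Assuming the datasize values are byte counts (which the wordcount generator sizes above suggest), a quick conversion makes it easier to see how much data each scale profile generates. The profile-to-size pairs below are the example values from wordcount.conf:

```shell
# Profile sizes from wordcount.conf, interpreted as bytes (an assumption).
for entry in tiny:32000 small:320000000 large:3200000000 \
             huge:32000000000 gigantic:320000000000 bigdata:1600000000000; do
  profile=${entry%%:*}
  bytes=${entry##*:}
  # numfmt (GNU coreutils) renders a byte count as a human-readable size.
  printf '%-9s %s\n' "$profile" "$(numfmt --to=si "$bytes")"
done
```

Seeing the sizes at a glance (roughly kilobytes for tiny up to terabytes for bigdata) helps pick a hibench.scale.profile that matches your cluster's storage and run-time budget.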
Parent topic: Testing Spark