Modifying the Hadoop Configuration File
This section describes how to modify the Hadoop configuration files so that the cluster runs stably as the root user and performs efficiently.
All Hadoop configuration files are stored in $HADOOP_HOME/etc/hadoop.
- Go to the Hadoop configuration directory.
  ```shell
  cd $HADOOP_HOME/etc/hadoop
  ```
- Modify the hadoop-env.sh file.

  On the server1 node, set the JAVA_HOME environment variable to an absolute path and configure the HDFS daemons to run as the root user.
  ```shell
  echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
  echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
  echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
  echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
  ```
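A missed export here typically surfaces only later as a daemon start failure, so it can help to confirm the lines actually landed. A minimal sketch of the same append-and-verify pattern, run against a scratch copy so the real hadoop-env.sh is left untouched:

```shell
# Hedged sketch: demonstrate append-and-verify on a scratch copy of
# hadoop-env.sh instead of the real file.
tmpdir=$(mktemp -d)
envfile="$tmpdir/hadoop-env.sh"
: > "$envfile"
for kv in \
    "JAVA_HOME=/usr/local/jdk8u252-b09" \
    "HDFS_NAMENODE_USER=root" \
    "HDFS_SECONDARYNAMENODE_USER=root" \
    "HDFS_DATANODE_USER=root"; do
  echo "export $kv" >> "$envfile"
done
# Confirm all four exports are present before moving to the next file.
grep -c '^export ' "$envfile"
```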
- Modify the yarn-env.sh file.
On the server1 node, change the user to the root user.
  ```shell
  echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
  echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
  echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
  ```
- Modify the core-site.xml file.
- Create a directory on server1.
  ```shell
  mkdir /home/hadoop_tmp_dir
  ```

- Create the core-site.xml file.

  ```shell
  vi core-site.xml
  ```

- Press i to enter the insert mode and add or modify the parameters within the <configuration> tags.
  ```xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://server1:9000</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop_tmp_dir</value>
    </property>
    <property>
      <name>ipc.client.connect.max.retries</name>
      <value>100</value>
    </property>
    <property>
      <name>ipc.client.connect.retry.interval</name>
      <value>10000</value>
    </property>
    <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
    </property>
  </configuration>
  ```
- Press Esc, type :wq!, and press Enter to save the file and exit.
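The two ipc.client settings above control how long a client keeps retrying the NameNode connection. As a worked example of what those values buy:

```shell
# Worked arithmetic for the ipc.client settings above:
# 100 retries at a 10 000 ms interval gives a ~1000 s retry window,
# enough to ride out a slow NameNode restart.
retries=100
interval_ms=10000
window_s=$(( retries * interval_ms / 1000 ))
echo "client retry window: about ${window_s} s"
```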
- Modify the hdfs-site.xml file.
- Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3.
The following is an example:
  ```shell
  mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
  ```
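The brace expansion above creates 12 parallel per-disk directories in one command. A sketch of the same layout under a scratch root (so it can be tried without touching /data), using a seq loop that is equivalent to the brace form:

```shell
# Hedged sketch: build the 12-directory layout under a throwaway root
# instead of /data, then count the results.
root=$(mktemp -d)
for i in $(seq 1 12); do
  mkdir -p "$root/data$i/hadoop"
done
ls -d "$root"/data*/hadoop | wc -l
```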
- Create a directory corresponding to dfs.namenode.name.dir on server1.
The following is an example:
  ```shell
  mkdir -p /data/data1/hadoop/nn
  ```
- On the server1 node, modify the hdfs-site.xml file.
  ```shell
  vi hdfs-site.xml
  ```

- Press i to enter the insert mode and add or modify the parameters within the <configuration> tags.
  ```xml
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/data/data1/hadoop/nn</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
    </property>
    <property>
      <name>dfs.http.address</name>
      <value>server1:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-bind-host</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>600</value>
    </property>
    <property>
      <name>dfs.namenode.handler.count</name>
      <value>600</value>
    </property>
    <property>
      <name>dfs.namenode.service.handler.count</name>
      <value>600</value>
    </property>
    <property>
      <name>ipc.server.handler.queue.size</name>
      <value>300</value>
    </property>
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
  </configuration>
  ```
- Press Esc, type :wq!, and press Enter to save the file and exit.
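With dfs.replication set to 3 above, every block is stored three times, so usable HDFS capacity is roughly one third of raw disk capacity. A hedged back-of-the-envelope sketch (the 4 TB disk size is an illustrative assumption, not a value from this guide):

```shell
# Assumed hardware for illustration only: 3 agents x 12 data disks x 4 TB.
disks_per_node=12
nodes=3
disk_tb=4
raw_tb=$(( disks_per_node * nodes * disk_tb ))
usable_tb=$(( raw_tb / 3 ))   # divide by dfs.replication=3
echo "${usable_tb} TB usable of ${raw_tb} TB raw"
```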
- Modify the mapred-site.xml file.
- On the server1 node, edit the mapred-site.xml file.
  ```shell
  vi mapred-site.xml
  ```

- Press i to enter the insert mode and add or modify the parameters within the <configuration> tags.
  ```xml
  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <final>true</final>
      <description>The runtime framework for executing MapReduce jobs</description>
    </property>
    <property>
      <name>mapreduce.job.reduce.slowstart.completedmaps</name>
      <value>0.88</value>
    </property>
    <property>
      <name>mapreduce.application.classpath</name>
      <value>
        /usr/local/hadoop/etc/hadoop,
        /usr/local/hadoop/share/hadoop/common/*,
        /usr/local/hadoop/share/hadoop/common/lib/*,
        /usr/local/hadoop/share/hadoop/hdfs/*,
        /usr/local/hadoop/share/hadoop/hdfs/lib/*,
        /usr/local/hadoop/share/hadoop/mapreduce/*,
        /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
        /usr/local/hadoop/share/hadoop/yarn/*,
        /usr/local/hadoop/share/hadoop/yarn/lib/*
      </value>
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>6144</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>6144</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx5530m</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx2765m</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m -Xms2048m</value>
    </property>
    <property>
      <name>mapred.reduce.parallel.copies</name>
      <value>20</value>
    </property>
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
      <name>mapreduce.job.counters.max</name>
      <value>1000</value>
    </property>
  </configuration>
  ```
- Press Esc, type :wq!, and press Enter to save the file and exit.
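The map heap above (-Xmx5530m) follows a common rule of thumb of setting the JVM heap to roughly 90% of the container size (6144 MB), leaving headroom for off-heap memory; the reduce heap here is set lower still. A worked sketch of that arithmetic (the 90% factor is a convention, not something this guide mandates):

```shell
# Heap sizing rule of thumb: heap ~= 0.9 x container memory.
container_mb=6144                       # mapreduce.map.memory.mb
heap_mb=$(( container_mb * 90 / 100 ))
echo "-Xmx${heap_mb}m"                  # 5529m, close to the -Xmx5530m above
```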
- Modify the yarn-site.xml file.
- Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.
The following is an example:
  ```shell
  mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
  ```
- On the server1 node, edit the yarn-site.xml file.
  ```shell
  vi yarn-site.xml
  ```

- Press i to enter the insert mode and add or modify the parameters within the <configuration> tags.
  ```xml
  <configuration>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
      <final>true</final>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>server1</value>
    </property>
    <property>
      <name>yarn.resourcemanager.bind-host</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>371200</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>371200</value>
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>64</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-vcores</name>
      <value>64</value>
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-vcores</name>
      <value>1</value>
    </property>
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.client.nodemanager-connect.max-wait-ms</name>
      <value>300000</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>yarn.application.classpath</name>
      <value>
        /usr/local/hadoop/etc/hadoop,
        /usr/local/hadoop/share/hadoop/common/*,
        /usr/local/hadoop/share/hadoop/common/lib/*,
        /usr/local/hadoop/share/hadoop/hdfs/*,
        /usr/local/hadoop/share/hadoop/hdfs/lib/*,
        /usr/local/hadoop/share/hadoop/mapreduce/*,
        /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
        /usr/local/hadoop/share/hadoop/yarn/*,
        /usr/local/hadoop/share/hadoop/yarn/lib/*
      </value>
    </property>
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
    </property>
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
    </property>
    <property>
      <name>yarn.timeline-service.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.timeline-service.hostname</name>
      <value>server1</value>
    </property>
    <property>
      <name>yarn.timeline-service.http-cross-origin.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
      <value>true</value>
    </property>
  </configuration>
  ```
- Press Esc, type :wq!, and press Enter to save the file and exit.
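The NodeManager resource settings above, combined with the container sizes from mapred-site.xml, bound how many containers each agent can run at once. A worked sketch of that arithmetic:

```shell
# How many map containers one NodeManager can hold with the values above,
# bounded separately by memory and by vcores.
nm_mem_mb=371200          # yarn.nodemanager.resource.memory-mb
nm_vcores=64              # yarn.nodemanager.resource.cpu-vcores
map_container_mb=6144     # mapreduce.map.memory.mb from mapred-site.xml
by_mem=$(( nm_mem_mb / map_container_mb ))
echo "memory allows ${by_mem} containers; vcores allow ${nm_vcores}"
```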
- Modify the slaves or workers file.
- Check the Hadoop version. If the version is earlier than 3.x, modify the slaves file; if it is 3.x or later, modify the workers file. This document uses Hadoop 3.1.1 as an example, so the workers file is modified.
- Open the file on the server1 node.
  ```shell
  vi workers
  ```

- Press i to enter the insert mode. Delete all existing content and keep only the IP addresses or host names of all agent nodes:

  ```
  agent1
  agent2
  agent3
  ```
- Press Esc, type :wq!, and press Enter to save the file and exit.
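These configuration files normally have to be identical on every node. One common approach, an assumption of this sketch rather than a step mandated by this guide, is to copy the configuration directory to each host listed in the workers file; shown here as a dry run that only prints the scp commands:

```shell
# Dry run over a scratch workers file; on a real cluster, replace echo
# with the actual scp (or your preferred sync tool).
tmpdir=$(mktemp -d)
printf 'agent1\nagent2\nagent3\n' > "$tmpdir/workers"
while read -r host; do
  echo "scp -r \$HADOOP_HOME/etc/hadoop root@${host}:\$HADOOP_HOME/etc/"
done < "$tmpdir/workers"
```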
Parent topic: Deploying Hadoop