
Modifying the Hadoop Configuration File

This section describes how to configure Hadoop so that the cluster runs stably as the root user, and how to tune the configuration for efficient operation.

All Hadoop configuration files are stored in $HADOOP_HOME/etc/hadoop.

  1. Go to the Hadoop configuration directory.
    cd $HADOOP_HOME/etc/hadoop
    
  2. Modify the hadoop-env.sh file.
    On the server1 node, set the JAVA_HOME environment variable to the absolute path of the JDK and set the HDFS daemon users to root.
    echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
    echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
    echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
    echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
    
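    After appending the exports, it is worth confirming they actually landed in the file. The sketch below reproduces the append-and-check pattern against a temporary file rather than the real hadoop-env.sh, so it can be run anywhere; on a node, the equivalent check is simply `grep '^export' $HADOOP_HOME/etc/hadoop/hadoop-env.sh`.

    ```shell
    # Append-and-verify sketch: a temporary file stands in for hadoop-env.sh,
    # so this runs without touching a real Hadoop installation.
    tmp=$(mktemp)
    echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> "$tmp"
    echo "export HDFS_NAMENODE_USER=root" >> "$tmp"
    # grep -q exits non-zero if a line is missing, which aborts a `set -e` script.
    grep -q '^export JAVA_HOME=' "$tmp" && grep -q '^export HDFS_NAMENODE_USER=root' "$tmp" \
      && echo "exports present"
    rm -f "$tmp"
    ```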
  3. Modify the yarn-env.sh file.

    On the server1 node, change the user to the root user.

    echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
    echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
    echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
    
  4. Modify the core-site.xml file.
    1. Create a directory on server1.
      mkdir /home/hadoop_tmp_dir
      
    2. Create the core-site.xml file.
      vi core-site.xml
      
    3. Press i to enter insert mode, then add or modify the following parameters within the <configuration> tags.
      <configuration>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://server1:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop_tmp_dir</value>
      </property>
      <property>
         <name>ipc.client.connect.max.retries</name>
         <value>100</value>
      </property>
      <property>
         <name>ipc.client.connect.retry.interval</name>
         <value>10000</value>
      </property>
      <property>
         <name>hadoop.proxyuser.root.hosts</name>
         <value>*</value>
      </property>
      <property>
         <name>hadoop.proxyuser.root.groups</name>
         <value>*</value>
      </property>
      </configuration>
      
    4. Press Esc, type :wq!, and press Enter to save the file and exit.
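    A malformed property block (for example, a missing closing tag) prevents Hadoop from reading the file at startup. Assuming python3 is available on the node, a quick well-formedness check catches this early; the sketch below writes a minimal core-site-style fragment to /tmp so it runs standalone, but the same one-liner can be pointed at the real $HADOOP_HOME/etc/hadoop/core-site.xml.

    ```shell
    # Well-formedness sketch: a minimal core-site-style fragment stands in
    # for the real file edited in the step above.
    cat > /tmp/core-site-check.xml <<'EOF'
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:9000</value>
      </property>
    </configuration>
    EOF
    # ElementTree raises ParseError (non-zero exit) on any unclosed tag.
    python3 -c "import xml.etree.ElementTree as ET; ET.parse('/tmp/core-site-check.xml'); print('XML OK')"
    ```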
  5. Modify the hdfs-site.xml file.
    1. Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3.

      The following is an example:

      mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
      
    2. Create a directory corresponding to dfs.namenode.name.dir on server1.

      The following is an example:

      mkdir -p /data/data1/hadoop/nn
      
    3. On the server1 node, modify the hdfs-site.xml file.
      vi hdfs-site.xml
      
    4. Press i to enter insert mode, then add or modify the following parameters within the <configuration> tags.
      <configuration>
      <property>
          <name>dfs.replication</name>
          <value>3</value>
      </property>
      <property>
          <name>dfs.namenode.name.dir</name>
          <value>/data/data1/hadoop/nn</value>
      </property>
      <property>
          <name>dfs.datanode.data.dir</name>
          <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
      </property>
      <property>
          <name>dfs.http.address</name>
          <value>server1:50070</value>
      </property>
      <property>
          <name>dfs.namenode.http-bind-host</name>
          <value>0.0.0.0</value>
      </property>
      <property>
          <name>dfs.datanode.handler.count</name>
          <value>600</value>
      </property>
      <property>
          <name>dfs.namenode.handler.count</name>
          <value>600</value>
      </property>
      <property>
          <name>dfs.namenode.service.handler.count</name>
          <value>600</value>
      </property>
      <property>
          <name>ipc.server.handler.queue.size</name>
          <value>300</value>
      </property>
      <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
      </property>
      </configuration>
      
    5. Press Esc, type :wq!, and press Enter to save the file and exit.
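    To confirm a value was picked up as intended, a single property can be read back out of the file. A sample file is used in this sketch so it runs anywhere; on a live cluster the same answer comes from `hdfs getconf -confKey dfs.replication`.

    ```shell
    # Sketch: extract one property value from an hdfs-site.xml-style file.
    cat > /tmp/hdfs-site-sample.xml <<'EOF'
    <configuration>
      <property><name>dfs.replication</name><value>3</value></property>
    </configuration>
    EOF
    python3 - <<'PY'
    import xml.etree.ElementTree as ET
    root = ET.parse('/tmp/hdfs-site-sample.xml').getroot()
    # Walk the <property> blocks and print the one we are interested in.
    for prop in root.findall('property'):
        if prop.findtext('name') == 'dfs.replication':
            print('dfs.replication =', prop.findtext('value'))
    PY
    ```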
  6. Modify the mapred-site.xml file.
    1. On the server1 node, edit the mapred-site.xml file.
      vi mapred-site.xml
      
    2. Press i to enter insert mode, then add or modify the following parameters within the <configuration> tags.
      <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
          <final>true</final>
          <description>The runtime framework for executing MapReduce jobs</description>
      </property>
      <property>
          <name>mapreduce.job.reduce.slowstart.completedmaps</name>
          <value>0.88</value>
      </property>
      <property>
          <name>mapreduce.application.classpath</name>
          <value>
              /usr/local/hadoop/etc/hadoop,
              /usr/local/hadoop/share/hadoop/common/*,
              /usr/local/hadoop/share/hadoop/common/lib/*,
              /usr/local/hadoop/share/hadoop/hdfs/*,
              /usr/local/hadoop/share/hadoop/hdfs/lib/*,
              /usr/local/hadoop/share/hadoop/mapreduce/*,
              /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
              /usr/local/hadoop/share/hadoop/yarn/*,
              /usr/local/hadoop/share/hadoop/yarn/lib/*
          </value>
      </property>
      <property>
          <name>mapreduce.map.memory.mb</name>
          <value>6144</value>
      </property>
      <property>
          <name>mapreduce.reduce.memory.mb</name>
          <value>6144</value>
       </property>
       <property>
          <name>mapreduce.map.java.opts</name>
          <value>-Xmx5530m</value>
      </property>
      <property>
          <name>mapreduce.reduce.java.opts</name>
          <value>-Xmx2765m</value>
      </property>
      <property>
          <name>mapred.child.java.opts</name>
          <value>-Xmx2048m -Xms2048m</value>
      </property>
      <property>
          <name>mapred.reduce.parallel.copies</name>
          <value>20</value>
      </property>
      <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
      </property>
      <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
      </property>
      <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
      </property>
      <property>
          <name>mapreduce.job.counters.max</name>
          <value>1000</value>
      </property>
      </configuration>
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
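    The heap sizes above follow a common rule of thumb (an assumption here, not stated in the file itself): -Xmx is set to roughly 90% of the container size so that JVM non-heap overhead still fits inside mapreduce.map.memory.mb. The arithmetic for the map container:

    ```shell
    # Rule-of-thumb sketch: heap is ~90% of container memory. 6144 MB maps
    # to 5529 MB, which the configuration above rounds to -Xmx5530m.
    container_mb=6144
    heap_mb=$((container_mb * 90 / 100))
    echo "container=${container_mb}m heap=-Xmx${heap_mb}m"
    ```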
  7. Modify the yarn-site.xml file.
    1. Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.

      The following is an example:

      mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
      
    2. On the server1 node, edit the yarn-site.xml file.
      vi yarn-site.xml
      
    3. Press i to enter insert mode, then add or modify the following parameters within the <configuration> tags.
      <configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
          <final>true</final>
      </property>
      <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>server1</value>
      </property>
      <property>
          <name>yarn.resourcemanager.bind-host</name>
          <value>0.0.0.0</value>
      </property>
      <property>
          <name>yarn.nodemanager.resource.memory-mb</name>
          <value>371200</value>
      </property>
      <property>
          <name>yarn.scheduler.maximum-allocation-mb</name>
          <value>371200</value>
      </property>
      <property>
          <name>yarn.scheduler.minimum-allocation-mb</name>
          <value>1024</value>
      </property>
      <property>
          <name>yarn.nodemanager.resource.cpu-vcores</name>
          <value>64</value>
      </property>
      <property>
          <name>yarn.scheduler.maximum-allocation-vcores</name>
          <value>64</value>
      </property>
      <property>
          <name>yarn.scheduler.minimum-allocation-vcores</name> 
          <value>1</value>
      </property>
      <property>
          <name>yarn.log-aggregation-enable</name>
          <value>true</value>
      </property>
      <property>
          <name>yarn.client.nodemanager-connect.max-wait-ms</name>
          <value>300000</value>
      </property>
      <property>
          <name>yarn.nodemanager.vmem-pmem-ratio</name>
          <value>2.1</value>
      </property>
      <property>
          <name>yarn.nodemanager.vmem-check-enabled</name>
          <value>false</value>
      </property>
      <property>
          <name>yarn.nodemanager.pmem-check-enabled</name>
          <value>false</value>
      </property>
      <property>
          <name>yarn.application.classpath</name>
          <value>
              /usr/local/hadoop/etc/hadoop,
              /usr/local/hadoop/share/hadoop/common/*,
              /usr/local/hadoop/share/hadoop/common/lib/*,
              /usr/local/hadoop/share/hadoop/hdfs/*,
              /usr/local/hadoop/share/hadoop/hdfs/lib/*,
              /usr/local/hadoop/share/hadoop/mapreduce/*,
              /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
              /usr/local/hadoop/share/hadoop/yarn/*,
              /usr/local/hadoop/share/hadoop/yarn/lib/*
          </value>
      </property>
      <property>
          <name>yarn.nodemanager.local-dirs</name>
          <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
      </property>
      <property>
          <name>yarn.nodemanager.log-dirs</name>
          <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
      </property>
      <property>
          <name>yarn.timeline-service.enabled</name>
          <value>true</value>
      </property>
      <property>
          <name>yarn.timeline-service.hostname</name>
          <value>server1</value>
      </property>
      <property>
          <name>yarn.timeline-service.http-cross-origin.enabled</name>
          <value>true</value>
      </property>
      <property>
          <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
          <value>true</value>
      </property>
      </configuration>
      
    4. Press Esc, type :wq!, and press Enter to save the file and exit.
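    The memory and vcore values above bound how many containers one NodeManager can run. A quick sketch of that math, using the values from the file:

    ```shell
    # With 371200 MB and 64 vcores per NodeManager and a 1024 MB minimum
    # allocation, memory alone would allow far more containers than there
    # are vcores, so vcores (64) is the effective per-node limit.
    mem_mb=371200; vcores=64; min_mb=1024
    echo "max containers by memory: $((mem_mb / min_mb))"
    echo "max containers by vcores: ${vcores}"
    ```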
  8. Modify the slaves or workers file.
    1. Check the Hadoop version. For versions earlier than 3.x, modify the slaves file; for 3.x and later, modify the workers file. This document uses Hadoop 3.1.1 as an example, so the workers file is modified.
    2. Open the file on the server1 node.
      vi workers
      
    3. Press i to enter insert mode, then edit the workers file so that it contains only the IP addresses or host names of the agent nodes:
      agent1
      agent2
      agent3
    4. Press Esc, type :wq!, and press Enter to save the file and exit.
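    All of the edits above were made on server1 only. A common follow-up (not part of the original steps, so treat the paths and host names as assumptions) is to copy the finished configuration directory to every agent node. The sketch prints the scp commands instead of executing them, so it runs without SSH access; drop the `echo` on a cluster with passwordless SSH configured.

    ```shell
    # Dry-run sketch: echo prints each scp command instead of running it.
    for host in agent1 agent2 agent3; do
      echo scp -r '$HADOOP_HOME/etc/hadoop' "root@${host}:\$HADOOP_HOME/etc/"
    done
    ```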