Modifying the Hadoop Configuration Files

All Hadoop configuration files live in the $HADOOP_HOME/etc/hadoop directory. Before modifying the files below, switch to the $HADOOP_HOME/etc/hadoop directory.

cd $HADOOP_HOME/etc/hadoop

Modifying hadoop-env.sh

Set the JAVA_HOME environment variable to an absolute path, and set the HDFS service users to root.

echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh

Modifying yarn-env.sh

Set the YARN service users to root.

echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
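Note that the echo commands above append unconditionally, so re-running the setup script would duplicate the export lines. A minimal sketch of an idempotent alternative, demonstrated on a temporary file rather than the real hadoop-env.sh or yarn-env.sh:

```shell
# Append a line only if it is not already present verbatim.
# Shown against a temp file; point it at hadoop-env.sh / yarn-env.sh in practice.
f=$(mktemp)
append_once() {
    grep -qxF "$1" "$2" || echo "$1" >> "$2"
}
append_once "export HDFS_NAMENODE_USER=root" "$f"
append_once "export HDFS_NAMENODE_USER=root" "$f"   # second call is a no-op
cat "$f"   # the export appears exactly once
```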

Modifying core-site.xml

  1. Create the directory on node server1.

    mkdir /home/hadoop_tmp_dir
    

  2. Edit the core-site.xml file.

    vi core-site.xml
    

  3. Press "i" to enter insert mode, then add or modify the parameters within the <configuration> tags as follows.

    <configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:9000</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop_tmp_dir</value>
    </property>
    <property>
       <name>ipc.client.connect.max.retries</name>
       <value>100</value>
    </property>
    <property>
       <name>ipc.client.connect.retry.interval</name>
       <value>10000</value>
    </property>
    <property>
       <name>hadoop.proxyuser.root.hosts</name>
       <value>*</value>
    </property>
    <property>
       <name>hadoop.proxyuser.root.groups</name>
       <value>*</value>
    </property>
    </configuration>
    

  4. Press "Esc", type :wq!, and press "Enter" to save the file and exit.
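A stray character while editing in vi will silently break the XML and prevent Hadoop from starting. An optional sanity check (assuming python3 is available on the node) parses the file before you continue; shown here against an inline snippet for demonstration, but in practice point it at core-site.xml:

```shell
# Parse the file with Python's stdlib XML parser; a non-zero exit means
# the file is not well-formed. The /tmp path below is for demonstration only.
cat > /tmp/core-site-check.xml <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:9000</value>
    </property>
</configuration>
EOF
python3 -c "import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])" \
    /tmp/core-site-check.xml && echo "well-formed"
```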

Modifying hdfs-site.xml

  1. On nodes agent1, agent2, and agent3, create the directories specified by dfs.datanode.data.dir.

    For example:

    mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
    

  2. Edit the hdfs-site.xml file.

    vi hdfs-site.xml
    

  3. Press "i" to enter insert mode, then add or modify the parameters within the <configuration> tags as follows.

    <configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/data1/hadoop/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
    <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>server1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-bind-host</name>
        <value>0.0.0.0</value>
    </property>
    <property>
        <name>dfs.datanode.handler.count</name>
        <value>600</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>600</value>
    </property>
    <property>
        <name>dfs.namenode.service.handler.count</name>
        <value>600</value>
    </property>
    <property>
        <name>ipc.server.handler.queue.size</name>
        <value>300</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    </configuration>
    

  4. Press "Esc", type :wq!, and press "Enter" to save the file and exit.
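The mkdir command in step 1 relies on shell brace expansion; in bash the list {1,2,...,12} can also be written as the range {1..12}. A quick sketch that verifies all twelve directories were created, using a temporary root instead of /data:

```shell
# Create the twelve data directories under a temp root and count them.
root=$(mktemp -d)
mkdir -p "$root"/data{1..12}/hadoop
ls -d "$root"/data*/hadoop | wc -l   # should report 12
```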

Modifying mapred-site.xml

  1. Edit the mapred-site.xml file.

    vi mapred-site.xml
    

  2. Press "i" to enter insert mode, then add or modify the parameters within the <configuration> tags as follows.

    <configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
        <description>The runtime framework for executing MapReduce jobs</description>
    </property>
    <property>
        <name>mapreduce.job.reduce.slowstart.completedmaps</name>
        <value>0.88</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop/etc/hadoop,
            /usr/local/hadoop/share/hadoop/common/*,
            /usr/local/hadoop/share/hadoop/common/lib/*,
            /usr/local/hadoop/share/hadoop/hdfs/*,
            /usr/local/hadoop/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop/share/hadoop/mapreduce/*,
            /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop/share/hadoop/yarn/*,
            /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>6144</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>6144</value>
     </property>
     <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx5530m</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2765m</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx2048m -Xms2048m</value>
    </property>
    <property>
        <name>mapred.reduce.parallel.copies</name>
        <value>20</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.job.counters.max</name>
        <value>1000</value>
    </property>
    </configuration>
    

  3. Press "Esc", type :wq!, and press "Enter" to save the file and exit.
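The -Xmx values above follow a common rule of thumb: set the JVM heap to roughly 90% of the YARN container size, leaving headroom for off-heap memory. Checking the map-task numbers:

```shell
# 90% of the 6144 MB map container, which matches the -Xmx5530m setting
# above to within a megabyte.
container_mb=6144
echo $(( container_mb * 90 / 100 ))   # prints 5529
```

Note that the reduce setting in this guide uses a smaller fraction (-Xmx2765m, about 45% of its container); the 90% figure is only a convention, not a requirement.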

Modifying yarn-site.xml

  1. On nodes agent1, agent2, and agent3, create the directories specified by yarn.nodemanager.local-dirs.

    For example:

    mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
    

  2. Edit the yarn-site.xml file.

    vi yarn-site.xml
    

  3. Press "i" to enter insert mode, then add or modify the parameters within the <configuration> tags as follows.

    <configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <final>true</final>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>server1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.bind-host</name>
        <value>0.0.0.0</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>371200</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>371200</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>64</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>64</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name> 
        <value>1</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.client.nodemanager-connect.max-wait-ms</name>
        <value>300000</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>
            /usr/local/hadoop/etc/hadoop,
            /usr/local/hadoop/share/hadoop/common/*,
            /usr/local/hadoop/share/hadoop/common/lib/*,
            /usr/local/hadoop/share/hadoop/hdfs/*,
            /usr/local/hadoop/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop/share/hadoop/mapreduce/*,
            /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop/share/hadoop/yarn/*,
            /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
    <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
    </property>
    <property>
        <name>yarn.timeline-service.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.timeline-service.hostname</name>
        <value>server1</value>
    </property>
    <property>
        <name>yarn.timeline-service.http-cross-origin.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>true</value>
    </property>
    </configuration>
    

  4. Press "Esc", type :wq!, and press "Enter" to save the file and exit.
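The memory and vcore settings above determine how many containers a NodeManager can run concurrently: the scheduler is limited by whichever of memory or vcores runs out first. A rough estimate for 6144 MB map containers under these settings:

```shell
# Containers per node = min(total memory / container size, total vcores).
mem_mb=371200; container_mb=6144; vcores=64
by_mem=$(( mem_mb / container_mb ))
by_cpu=$vcores
echo $(( by_mem < by_cpu ? by_mem : by_cpu ))   # prints 60 (memory-bound)
```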

Modifying slaves or workers

  1. Check the Hadoop version: versions below 3.x use the slaves file, while versions 3.x and above use the workers file.
  2. Edit the file (this guide uses version 3.1.1).

    vi workers
    

  3. Press "i" to enter insert mode and edit the workers file so that it contains only the IP addresses of all agent nodes (hostnames may be used instead); delete all other content.

    agent1
    agent2
    agent3

  4. Press "Esc", type :wq!, and press "Enter" to save the file and exit.
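The workers file must contain nothing but one hostname or IP per line; stray blank lines or comments can cause the startup scripts to attempt SSH connections to empty targets. A small sketch that validates the format, run here against a temporary copy containing this guide's hostnames:

```shell
# Count lines that are NOT a bare hostname/IP; a healthy file reports 0.
wf=$(mktemp)
printf 'agent1\nagent2\nagent3\n' > "$wf"
bad=$(grep -cvE '^[A-Za-z0-9._-]+$' "$wf" || true)
echo "malformed lines: $bad"   # prints: malformed lines: 0
```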