Configuring Hadoop
Hadoop can run on a single node in pseudo-distributed mode. In this mode, Hadoop runs as a separate Java process, and the node functions as both the NameNode and the DataNode, reading files from HDFS.
The configuration files are stored in hadoop-3.1.2/etc/hadoop. For a pseudo-distributed cluster, you need to modify the core-site.xml and hdfs-site.xml configuration files. Hadoop configuration files are in XML format; each setting is declared as a property with a name and a value.
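Each setting follows the same three-element pattern. For example, the fs.defaultFS setting used later in this procedure is declared as:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://armnode2:9000</value>
</property>
```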
Procedure
- Use PuTTY to log in to the server as the root user.
- Run the following command to switch to the directory in which the Hadoop installation package is stored:
cd path/to/HADOOP
- Run the following command to decompress the Hadoop installation package:
tar -xvf hadoop-3.1.2.tar.gz
- Run the following command to switch to the directory generated after the package is decompressed:
cd hadoop-3.1.2
- Run the following commands to create four folders in the hadoop-3.1.2 directory to serve as the HDFS storage paths:
mkdir hdfs
mkdir hdfs/tmp
mkdir hdfs/name
mkdir hdfs/data
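The four commands above can also be collapsed into one; this is an equivalent sketch using the -p option, which creates the parent hdfs directory automatically:

```shell
# Same result as the four mkdir commands above: -p creates the parent
# hdfs directory as needed, so one command suffices.
mkdir -p hdfs/tmp hdfs/name hdfs/data
```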
- Run the following command to go to the directory in which the configuration file is stored:
cd etc/hadoop/
- Run the following command to modify the core-site.xml file:
- Open the file.
vi core-site.xml
- Press i to enter the insert mode and edit the core-site.xml file.
Before the modification:
<configuration>
</configuration>
After the modification:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/path/to/HADOOP/hadoop-3.1.2/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://armnode2:9000</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
armnode2 indicates the hostname of the installation environment. You can set this parameter based on the actual situation. You can run the hostname command to query the hostname of the installation environment.
- Press Esc, type :wq!, and press Enter to save the file and exit.
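Because fs.defaultFS embeds the hostname, it is worth confirming that the name actually resolves before starting HDFS. A minimal check (assuming getent is available, as on most Linux systems):

```shell
# Print the hostname, then confirm it resolves to an address; a missing
# /etc/hosts entry here would cause NameNode startup failures later.
hostname
getent hosts "$(hostname)"
```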
- Run the following command to modify the hdfs-site.xml configuration file:
- Open the file.
vi hdfs-site.xml
- Press i to enter the insert mode and edit the hdfs-site.xml file.
Before the modification:
<configuration>
</configuration>
After the modification:
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>armnode2:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/path/to/HADOOP/hadoop-3.1.2/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/path/to/HADOOP/hadoop-3.1.2/hdfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
armnode2 indicates the hostname of the installation environment. You can set this parameter based on the actual situation. You can run the hostname command to query the hostname of the installation environment.
- Press Esc, type :wq!, and press Enter to save the file and exit.
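After editing both files, a quick well-formedness check catches an accidentally unclosed tag before it causes a confusing startup error. This sketch assumes python3 is on the PATH and is run from the etc/hadoop directory:

```shell
# Parse both edited files; a traceback here means the XML is malformed.
python3 - <<'EOF'
import xml.dom.minidom
for f in ("core-site.xml", "hdfs-site.xml"):
    xml.dom.minidom.parse(f)
    print(f, "is well-formed")
EOF
```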
- Run the following command to modify the workers configuration file:
- Open the file.
vi workers
- Press i to enter the insert mode and edit the workers file to add the hostname of the installation environment.
armnode2
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the following commands to create the master and slaves configuration files:
cd /path/to/HADOOP/hadoop-3.1.2/etc/hadoop
cp workers master
cp workers slaves
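You can confirm that the three files now have identical contents; diff prints nothing and exits with status 0 when the files match:

```shell
# Verify the copies: no output and a zero exit status mean master and
# slaves match workers exactly.
diff workers master && diff workers slaves
```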