Installation steps for Hadoop on Ubuntu
Hi everyone,
In this post I am going to walk through how to install and configure Hadoop on Ubuntu and set up HDFS.
Step 1:
First we need to install Java on Ubuntu. To install it, run these commands:
sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install default-jdk
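To confirm the JDK installed correctly, a quick check (assuming the default-jdk package from the commands above) is:

```shell
# Print the compiler version if the JDK is present, or a hint if not.
if command -v javac >/dev/null 2>&1; then
    javac -version
else
    echo "javac not found - re-run: sudo apt-get install default-jdk"
fi
```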
Step 2:
After installing Java we need to set the Java path in the .bashrc file.
To open the .bashrc file, the command is: gedit ~/.bashrc
To check whether a Java path is already set, the command is: echo $JAVA_HOME
To set the Java path, add this line to the .bashrc file:
export JAVA_HOME=java_path (e.g. /usr/lib/jvm/java-7-openjdk-amd64)
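The exact JDK directory varies by Ubuntu release, so rather than guessing, one way to discover it (assuming default-jdk is installed) is to follow the javac symlink:

```shell
# Resolve the real javac location and strip the trailing /bin/javac
# to get the directory JAVA_HOME should point at.
if command -v javac >/dev/null 2>&1; then
    JAVA_PATH=$(readlink -f "$(command -v javac)" | sed 's:/bin/javac$::')
    echo "Use: export JAVA_HOME=$JAVA_PATH"
else
    echo "javac not found - install default-jdk first"
fi
```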
Step 3:
Hadoop needs SSH (secure shell); to install it, the command is:
sudo apt-get install ssh
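Hadoop's start scripts connect to localhost over SSH, so passwordless SSH for the current user is a standard prerequisite. A sketch of the setup (it skips key generation if a key already exists):

```shell
# Create an SSH key with an empty passphrase and authorize it for the
# current user, so Hadoop's scripts can log in without a password.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
if command -v ssh-keygen >/dev/null 2>&1; then
    [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    echo "passwordless SSH key installed"
else
    echo "ssh-keygen not found - run: sudo apt-get install ssh"
fi
```

Afterwards, `ssh localhost` should log in without prompting for a password.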
Step 4:
We need to download the Hadoop release from the Apache Hadoop website, extract it to the /usr/local directory, and rename the extracted folder to hadoop.
Here is the link for downloading Apache Hadoop from the official website. We can download any version based on our requirement.
https://hadoop.apache.org/releases.html
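A sketch of the extract-and-rename step, once the tarball from the releases page is in the current directory (hadoop-3.3.1.tar.gz is just an example file name; use whichever release you downloaded):

```shell
# Unpack the downloaded Hadoop release into /usr/local and rename the
# versioned folder to plain "hadoop"; does nothing if the tarball is
# not present yet.
TARBALL=hadoop-3.3.1.tar.gz
if [ -f "$TARBALL" ]; then
    sudo tar -xzf "$TARBALL" -C /usr/local
    sudo mv /usr/local/hadoop-3.3.1 /usr/local/hadoop
else
    echo "$TARBALL not found - download it from the releases page first"
fi
```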
We need to add a few lines to the .bashrc file so the shell can find Hadoop. Those lines are:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
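The lines above can be appended to ~/.bashrc in one step (this assumes Hadoop lives in /usr/local/hadoop as set up in Step 4):

```shell
# Append the Hadoop environment variables to ~/.bashrc; the quoted
# 'EOF' keeps $PATH and $HADOOP_INSTALL literal in the file.
cat >> ~/.bashrc <<'EOF'
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
EOF
```

Then run `source ~/.bashrc` (or open a new terminal) so the variables take effect.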
Step 5:
We need to create namenode and datanode directories, and also a tmp directory, for storing and processing the data. The commands are:
sudo mkdir -p /usr/local/hadoopdata/hdfs/namenode
sudo mkdir -p /usr/local/hadoopdata/hdfs/datanode
sudo mkdir -p /app/hadoop/tmp
We need to change the permissions of the /app directory so Hadoop can write to it; the command is: sudo chmod -R 777 /app (this makes it world-writable, which is acceptable on a single-user test machine).
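A quick way to confirm the three directories from the commands above exist and are writable by the current user:

```shell
# Report each required directory's status; "ok" means it exists and
# the current user can write to it.
for d in /usr/local/hadoopdata/hdfs/namenode \
         /usr/local/hadoopdata/hdfs/datanode \
         /app/hadoop/tmp; do
    if [ -d "$d" ] && [ -w "$d" ]; then
        echo "ok: $d"
    else
        echo "missing or not writable: $d"
    fi
done
```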
Step 6:
We need to modify 4 files in the Hadoop directory to finish the installation and configuration. Those files are located in /usr/local/hadoop/etc/hadoop.
The files that we need to modify are:
1. core-site.xml
2. hadoop-env.sh
3. mapred-site.xml.template
4. hdfs-site.xml
core-site.xml: copy the following properties in between the <configuration> tags. (fs.default.name is the older name of this setting; recent Hadoop releases call it fs.defaultFS, but the old name still works.)
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
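For reference, the complete core-site.xml after the edit looks like this; the other XML files follow the same pattern, with all properties nested inside a single <configuration> element:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
```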
hadoop-env.sh: set JAVA_HOME to your Java path:
export JAVA_HOME=java_path (e.g. /usr/lib/jvm/java-7-openjdk-amd64)
We need to rename mapred-site.xml.template to mapred-site.xml; Hadoop only reads mapred-site.xml, so the settings below will not take effect while the file still has the .template suffix.
mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
Step 7:
After modifying all the files, we need to format the namenode before starting Hadoop. The command is: hdfs namenode -format (on older versions: hadoop namenode -format)
To start Hadoop, the command is: start-all.sh (newer versions prefer running start-dfs.sh and then start-yarn.sh)
After starting Hadoop, run the jps command and check that all 5 daemons have started. Those are:
1. NameNode
2. DataNode
3. Secondary NameNode
4. ResourceManager
5. NodeManager
(jps itself also appears in its own output.) If all 5 daemons are displayed, then your Hadoop is ready.
If the DataNode is not present, first stop all the daemons, delete the files in the tmp directory (/app/hadoop/tmp), and start Hadoop again. To stop Hadoop, the command is: stop-all.sh
If we want to view the NameNode web UI in a browser, the URL is localhost:9870 in Hadoop 3.x (for example 3.3.1); in earlier 2.x versions the URL was localhost:50070.
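Once Hadoop is running, the web UI can also be probed from the terminal (this assumes curl is installed; change the port to 50070 on Hadoop 2.x):

```shell
# Print the HTTP status of the NameNode web UI, or a message if it
# is not reachable on this machine.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/ \
  || echo "NameNode UI not reachable - is Hadoop started?"
```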
Anyone can connect with me on LinkedIn | Twitter | YouTube | Instagram
Hope you guys like the content. If anyone has any questions, please feel free to ask in the comment section or contact me through the links above.