Installing Hadoop on Ubuntu

Hadoop is developed in Java and is designed to run primarily on Linux.
In this tutorial I will demonstrate how to install and run Hadoop on Ubuntu 12.04. The setup will run Hadoop as a single-node cluster.

As a prerequisite, the Java JDK 6 needs to be installed:
$ sudo apt-get install openjdk-6-jdk

I – Activate SSH without password

The Hadoop master node remotely controls its sub-nodes using SSH.
In a single-node cluster, the master and the sub-nodes run on the same machine, but Hadoop is not aware of that.
It still communicates between them the exact same way, using SSH.

Install SSH server:
$ sudo apt-get install openssh-server

Create an SSH key without a passphrase:
$ cd ~
$ ssh-keygen -t rsa -P ""

Register the key as an authorized key for remote login:
$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys

Connect to localhost once and accept the host key (this step is mandatory):
$ ssh localhost
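The key setup above can be combined into one idempotent snippet. The chmod lines are extra hardening: with its default StrictModes setting, sshd ignores an authorized_keys file with loose permissions.

```shell
# Generate an SSH key (skipped if one already exists) and authorize it
# for passwordless login on this machine.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys   # sshd refuses group/world-writable key files
```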



[Screenshot: ssh key generation and login]

II – Install Hadoop

Download Hadoop:
$ wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz

Extract the archive and rename the directory so it matches the paths used below:
$ tar xvf hadoop-1.2.1-bin.tar.gz
$ mv hadoop-1.2.1 ~/hadoop

Update the $PATH to use Hadoop from the command line:
$ vim.tiny ~/.bashrc

Add at the end of the file:

export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Close and re-open your console so the $PATH gets updated.
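If you prefer not to open an editor, the same two lines can be appended from the shell. Note that Hadoop's executables (hadoop, start-all.sh, ...) live in the bin subdirectory, which is why that is the directory to put on the $PATH:

```shell
# Append the Hadoop environment setup to ~/.bashrc without opening an editor.
# The quoted 'EOF' keeps $PATH and $HADOOP_HOME from being expanded here.
cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
EOF
```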


III – Configure Hadoop for single-node

Locate the path where Java JDK 6 is installed.
On Ubuntu 12.04 it is in /usr/lib/jvm/java-6-openjdk-amd64/

$ vim.tiny ~/hadoop/conf/hadoop-env.sh

Find the line:
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
and replace it with (removing the # to uncomment the line):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64/
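This edit can also be scripted with sed. A sketch, assuming the stock hadoop-env.sh still contains the commented line shown above; the scratch-copy fallback is only there to make the sketch self-contained when Hadoop is not unpacked yet.

```shell
# Patch JAVA_HOME in hadoop-env.sh in place.
HADOOP_ENV=~/hadoop/conf/hadoop-env.sh
if [ ! -f "$HADOOP_ENV" ]; then
    # Fallback for illustration: recreate the stock line in a scratch copy.
    HADOOP_ENV=/tmp/hadoop-env.sh
    echo '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun' > "$HADOOP_ENV"
fi
sed -i 's|^#[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64/|' "$HADOOP_ENV"
grep '^export JAVA_HOME' "$HADOOP_ENV"
```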

$ vim.tiny ~/hadoop/conf/mapred-site.xml

[...]
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

$ vim.tiny ~/hadoop/conf/hdfs-site.xml

[...]
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


$ vim.tiny ~/hadoop/conf/core-site.xml

[...]
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>~/hadoop-hdfs</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>


Hadoop will store its data under the hadoop-hdfs directory. Note that Hadoop does not expand ~ in configuration values, so replace ~/hadoop-hdfs above with the absolute path (e.g. /home/<your user>/hadoop-hdfs).
Create the directory:
$ mkdir ~/hadoop-hdfs
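Since Hadoop does not expand ~ in configuration values, it is safer to let the shell produce the absolute path. A sketch that generates core-site.xml with a heredoc, using the same ports as above ($HOME is expanded by the shell before the file is written):

```shell
# Write core-site.xml with hadoop.tmp.dir as an absolute path.
mkdir -p ~/hadoop-hdfs ~/hadoop/conf
cat > ~/hadoop/conf/core-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HOME/hadoop-hdfs</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF
```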

Finally, format HDFS:
$ hadoop namenode -format

IV – Start the cluster

Run the command:
$ start-all.sh

This will start the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker daemons on your machine. You can check that they are running with the jps command.

The JobTracker web UI runs at http://localhost:50030

[Screenshot: Hadoop JobTracker web UI]


The TaskTracker web UI runs at http://localhost:50060

[Screenshot: Hadoop TaskTracker web UI]