Hadoop
May 10, 2017

Prerequisites

  1. Homebrew

Installing Hadoop

$ brew install hadoop

Hadoop will be installed in /usr/local/Cellar/hadoop.
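
To confirm the installation, check the version (Homebrew links the Hadoop binaries onto your PATH by default):

$ hadoop version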

Configuring Hadoop

Edit hadoop-env.sh

The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hadoop-env.sh, where 2.8.0 is the installed Hadoop version.

Find the line with

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

and change it to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
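
While you are in hadoop-env.sh, it is also worth making sure JAVA_HOME is set. The line below is a common choice on macOS and is an assumption about your Java setup; adjust it if you manage Java differently:

export JAVA_HOME="$(/usr/libexec/java_home)"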

Edit core-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
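
The hadoop.tmp.dir configured above does not exist yet. Hadoop will normally create it on first use, but creating it up front avoids permission surprises:

$ mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp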

Edit mapred-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/mapred-site.xml. Depending on the version, it may not exist yet and only a mapred-site.xml.template may be present.
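
If only the template is present, copy it first (skip this if mapred-site.xml already exists):

$ cd /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml

Then add the following: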

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>

Edit hdfs-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hdfs-site.xml.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
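
A replication factor of 1 is appropriate here because a single-node setup has only one DataNode to hold block replicas. You can sanity-check that the value is picked up using the standard HDFS getconf tool:

$ hdfs getconf -confKey dfs.replication
1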

To simplify life, edit your ~/.profile using vim or your favorite editor and add the following two aliases. Note that ~/.profile might not exist by default. If the aliases don't work after restarting the terminal, try editing ~/.bash_profile instead.

alias hstart="/usr/local/Cellar/hadoop/2.8.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.8.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.8.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.8.0/sbin/stop-dfs.sh"

and execute

$ source ~/.profile

in the terminal to update.

Before we can run Hadoop, we first need to format HDFS using

$ hdfs namenode -format
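
Formatting only needs to happen once; reformatting later will erase everything stored in HDFS. If it succeeded, the NameNode will have created its metadata directory under the hadoop.tmp.dir configured earlier:

$ ls /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name
current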

SSH Localhost

Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If they don't exist, the keys can be generated using

$ ssh-keygen -t rsa

Enable Remote Login

Open “System Preferences” -> “Sharing” and check “Remote Login”.
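
If you prefer the command line, Remote Login can also be enabled with macOS's systemsetup utility:

$ sudo systemsetup -setremotelogin on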

Authorize SSH Keys

To allow your system to accept logins, we have to make it aware of the keys that will be used:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
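
sshd is strict about key file permissions; if ssh still prompts for a password, tighten them:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys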

Let’s try to login.

$ ssh localhost
Last login: Fri Mar  6 20:30:53 2015
$ exit

Running Hadoop

Now we can run Hadoop just by typing

$ hstart

and stopping using

$ hstop
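
To verify that everything came up, jps (shipped with the JDK) should list the Hadoop daemons alongside their process IDs:

# PIDs will differ; expect NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager in the list
$ jps

The NameNode web interface should now be reachable at http://localhost:50070 and the ResourceManager at http://localhost:8088, the default ports for Hadoop 2.x.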

Using Basic Hadoop Commands

# Create your HDFS home directory first; the relative paths below resolve to /user/<username>
$ hadoop fs -mkdir -p /user/$(whoami)
# Copying files from the local file system to HDFS
$ hadoop fs -copyFromLocal test.txt
# Listing files and folders
$ hadoop fs -ls
# Creating a directory named Directory1
$ hadoop fs -mkdir Directory1
# Moving test.txt to Directory1
$ hadoop fs -mv test.txt Directory1
# Running the wordcount example on the files under Directory1 and writing results to the output directory
$ hadoop jar /usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount Directory1 output
# Viewing the output file
$ hadoop fs -cat output/part-r-00000
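
Two general MapReduce behaviors worth knowing when experimenting further: a job will refuse to start if its output directory already exists, and results can be copied back to the local file system with -get (the local filename below is just an example):

# Remove the output directory before re-running the job
$ hadoop fs -rm -r output
# Copy the results back to the local file system
$ hadoop fs -get output/part-r-00000 wordcount-results.txt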