Prerequisites
- Homebrew
Installing Hadoop
Install Hadoop with Homebrew by running brew install hadoop. It will be installed under /usr/local/Cellar/hadoop.
Configuring Hadoop
Edit hadoop-env.sh
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hadoop-env.sh, where 2.8.0 is the Hadoop version.
Find the line with

```
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
```

and change it to

```
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
```
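If you would rather script this change than open an editor, the edit can be applied with sed. This is just a sketch (the path assumes the 2.8.0 version used above); it writes a modified copy rather than changing the file in place:

```shell
# Sketch: append the Kerberos workaround to HADOOP_OPTS without editing by hand.
# Writes to a .new file so the original hadoop-env.sh stays untouched until reviewed.
HADOOP_ENV=/usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hadoop-env.sh
sed 's|-Djava.net.preferIPv4Stack=true|-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=|' \
  "$HADOOP_ENV" > "$HADOOP_ENV.new"
```

After checking the result, move the .new file over the original.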
Edit core-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/core-site.xml.

```
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
Edit mapred-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/mapred-site.xml and by default will be blank.

```
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
```
Edit hdfs-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hdfs-site.xml.

```
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
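A stray or unclosed tag in any of these three files will keep Hadoop from starting, so it can be worth checking that each file is still well-formed XML. A quick sketch using xmllint (which ships with macOS); the path again assumes version 2.8.0:

```shell
# Sketch: verify each edited Hadoop config file is well-formed XML
HADOOP_CONF=/usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop
for f in core-site.xml mapred-site.xml hdfs-site.xml; do
  xmllint --noout "$HADOOP_CONF/$f" && echo "$f OK"
done
```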
To simplify life, edit your ~/.profile using vim or your favorite editor and add the following two aliases. By default ~/.profile might not exist. If the aliases don’t work after restarting the terminal, try editing ~/.bash_profile instead.
```
alias hstart="/usr/local/Cellar/hadoop/2.8.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.8.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.8.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.8.0/sbin/stop-dfs.sh"
```

and execute

```
$ source ~/.profile
```

in the terminal to update.
Before we can run Hadoop we first need to format the HDFS using

```
$ hdfs namenode -format
```
SSH Localhost
Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If they do not exist, generate a pair with ssh-keygen.
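If the key files are missing, a pair can be generated as follows; this is a sketch in which the empty passphrase (-N "") is what makes passwordless login to localhost possible:

```shell
# Generate an RSA key pair only if one does not already exist
if [ ! -f ~/.ssh/id_rsa ]; then
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
fi
```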
Enable Remote Login
Open “System Preferences” -> “Sharing” and check “Remote Login”.
Authorize SSH Keys
To allow your system to accept login, we have to make it aware of the keys that will be used:

```
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
Let’s try to log in.

```
$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit
```
Running Hadoop
Now we can run Hadoop just by typing hstart, and stop it with hstop (the aliases defined above).
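To confirm the daemons actually came up after hstart, jps (bundled with the JDK) lists the running Java processes. A sketch of the check; on a healthy single-node setup the expected process names are those listed in the comment:

```shell
# Start the daemons with the alias defined earlier, then list Java processes.
# A healthy single-node setup shows NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager among the jps output.
hstart
jps
```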
Using Basic Hadoop Commands

```
# Copying files from the local file system to the Hadoop file system
$ hadoop fs -copyFromLocal test.txt
# Listing files and folders
$ hadoop fs -ls
# Creating a directory named Directory1
$ hadoop fs -mkdir Directory1
# Moving test.txt to Directory1
$ hadoop fs -mv test.txt Directory1
# Running the wordcount example on the files under Directory1, writing results under the output directory
$ hadoop jar hadoop-mapreduce-examples-2.8.0.jar wordcount Directory1 output
# Viewing the output file
$ hadoop fs -cat output/part-r-00000
```

Note that the examples jar should match the installed Hadoop version; in the Homebrew install it lives under /usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/mapreduce.