- Installing Hadoop
- Configuring Hadoop
- SSH Localhost
- Running Hadoop
- Using Basic Hadoop Commands
$ brew install hadoop
Hadoop will be installed under /usr/local/Cellar/hadoop.
The file is located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hadoop-env.sh, where 2.8.0 is the Hadoop version.
Find the line with
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
and change it to
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
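This edit can also be scripted instead of done by hand. A minimal sed sketch, demonstrated here on a throwaway copy of the line so it is safe to run as-is; point `target` at your real hadoop-env.sh (and keep the `.bak` backup sed creates) to apply it for real:

```shell
# Sketch: apply the krb5 workaround with sed instead of a manual edit.
# "target" is a temporary copy here; substitute your hadoop-env.sh path.
target=$(mktemp)
echo 'export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"' > "$target"
sed -i.bak 's|preferIPv4Stack=true"|preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="|' "$target"
cat "$target"
```

The `-i.bak` form works with both the BSD sed shipped with macOS and GNU sed.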
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/mapred-site.xml and by default will be blank.
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
The file can be located at /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop/hdfs-site.xml.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
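If formatting HDFS later fails with permission errors, a common cause is that the hadoop.tmp.dir directory from core-site.xml is missing or not writable by your user. A small sketch that makes sure it exists (the Cellar path is the one assumed throughout this guide; the mktemp fallback is only so the snippet runs on machines without that layout):

```shell
# Ensure the hadoop.tmp.dir directory from core-site.xml exists and is writable.
HDFS_TMP=/usr/local/Cellar/hadoop/hdfs/tmp
if ! mkdir -p "$HDFS_TMP" 2>/dev/null; then
  # Fallback scratch path, for illustration only.
  HDFS_TMP="$(mktemp -d)/hdfs/tmp"
  mkdir -p "$HDFS_TMP"
fi
echo "hadoop.tmp.dir -> $HDFS_TMP"
```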
To simplify life, edit your ~/.profile using vim or your favorite editor and add the following two aliases. By default, ~/.profile might not exist. If the aliases don't work after restarting the terminal, try editing ~/.bash_profile instead.
alias hstart="/usr/local/Cellar/hadoop/2.8.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.8.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.8.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.8.0/sbin/stop-dfs.sh"
and execute
$ source ~/.profile
in the terminal to update.
Before we can run Hadoop, we first need to format HDFS using
$ hdfs namenode -format
Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If they don't exist, the keys can be generated using
$ ssh-keygen -t rsa
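ssh-keygen will prompt for a file location and a passphrase; an empty passphrase is what makes the passwordless localhost login below work. A non-interactive sketch, written to a scratch directory here so nothing in your real ~/.ssh is touched (drop the -f argument to use the default location):

```shell
# Generate an RSA keypair with an empty passphrase, non-interactively.
keydir=$(mktemp -d)
ssh-keygen -t rsa -b 2048 -N "" -f "$keydir/id_rsa" -q
ls "$keydir"
```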
Enable Remote Login
Open “System Preferences” -> “Sharing” and check “Remote Login”.
Authorize SSH Keys
To allow your system to accept logins, we have to make it aware of the keys that will be used:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Let’s try to log in.
$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit
Now we can run Hadoop just by typing
$ hstart
and stop it using
$ hstop
Using Basic Hadoop Commands
# Copying files from local to Hadoop File System
$ hadoop fs -copyFromLocal test.txt
# Listing files and folders
$ hadoop fs -ls
# Creating a directory named Directory1
$ hadoop fs -mkdir Directory1
# Moving test.txt to Directory1
$ hadoop fs -mv test.txt Directory1
# Running the wordcount example on the files under Directory1 and writing results to the output directory
$ hadoop jar hadoop-mapreduce-examples-2.8.0.jar wordcount Directory1 output
# Viewing the output file
$ hadoop fs -cat output/part-r-00000
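To see what the wordcount job is computing, here is the same word-to-count output shape produced locally with plain shell tools. This is an illustration of the result only, not how Hadoop computes it; /tmp/wc-demo.txt is just a made-up sample file:

```shell
# A local stand-in for wordcount: tokenize, sort, count, print "word<TAB>count",
# mirroring the tab-separated format of part-r-00000.
printf 'hello world\nhello hadoop\n' > /tmp/wc-demo.txt
tr -s ' ' '\n' < /tmp/wc-demo.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
# hadoop	1
# hello	2
# world	1
```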