I. Passwordless SSH Login for Hadoop

1.  On the Master node:

$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

2.  Append the Master node's public key to the current node's authorized_keys:

$ cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys

3.  Copy the public and private keys to the Slave nodes:

$ scp ~/.ssh/* slave1:~/.ssh/
$ scp ~/.ssh/* slave2:~/.ssh/
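
A quick optional sanity check: logging in from the Master to a slave should now succeed without a password prompt, e.g.:

$ ssh slave1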

II. Installing JDK 1.7

1.  Download the JDK; the archive used here is located at:

~/deploy_env/jdk-7u55-linux-x64.tar.gz

2.  Extract the JDK:

$ tar zxvf jdk-7u55-linux-x64.tar.gz

3.  Install the JDK:
    1) Create a java directory under /usr/lib

$ cd /usr/lib
$ sudo mkdir java

    2) Move the extracted JDK into place

$ sudo mv jdk1.7.0_55 /usr/lib/java
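
To confirm the JDK is in place, the version can be checked directly from the install path used above:

$ /usr/lib/java/jdk1.7.0_55/bin/java -version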

III. Configuring Environment Variables

1.  Edit ~/.bashrc:

$ vi ~/.bashrc

2.  Add the JDK and Hadoop environment variables:

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
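
After saving, reload the shell so the new variables take effect; `java -version` and `hadoop version` (assuming the Hadoop 1.2.1 tree is already unpacked at $HADOOP_HOME) should then work from any directory:

$ source ~/.bashrc
$ java -version
$ hadoop version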

3.  Edit /etc/hosts:

$ sudo vi /etc/hosts

4.  Add the following IP/hostname mappings:

127.0.0.1 localhost
192.168.1.11 ubuntu-1
192.168.1.12 ubuntu-2
192.168.1.13 ubuntu-3
192.168.1.14 ubuntu-4
192.168.1.15 ubuntu-5
192.168.1.16 ubuntu-6
192.168.1.17 ubuntu-7
192.168.1.18 ubuntu-8
192.168.1.19 ubuntu-9
192.168.1.20 ubuntu-10
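
Optionally, verify that the hostnames resolve on each node (ubuntu-2 shown as an example):

$ ping -c 1 ubuntu-2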

IV. Configuring Hadoop

1.  Change into the Hadoop configuration directory:

$ cd ~/hadoop-1.2.1/conf/

    1) hadoop-env.sh

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

    2) masters (lists the SecondaryNameNode host)

ubuntu-1

    3) slaves

ubuntu-1
ubuntu-2
ubuntu-3
ubuntu-4
ubuntu-5
ubuntu-6
ubuntu-7
ubuntu-8
ubuntu-9
ubuntu-10

    4) core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu-1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hdfs/data</value>
</property>
</configuration>

    5) hdfs-site.xml

<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ubuntu-1:9001</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hdfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.http.address</name>
<value>ubuntu-1:50070</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>

    6) mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ubuntu-1:10101</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
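
The local directories referenced in these files are not created automatically; they likely need to exist on every node before the first start (a sketch based on the paths configured above):

$ mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/namenode /home/hadoop/hdfs/datanode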

V. Copying Hadoop and the Configuration Files to All Nodes

1.  Create a bash script:

vi cp2all.sh

2.  Paste the following script into the file:

#!/bin/bash
# Push /etc/hosts, ~/.bashrc and the Hadoop tree from the master to every other node.
ip_array=( "192.168.1.12" "192.168.1.13" "192.168.1.14" "192.168.1.15" "192.168.1.16" "192.168.1.17" "192.168.1.18" "192.168.1.19" "192.168.1.20" )
user="hadoop"
copy_hosts="sudo mv ~/hosts /etc/"
for ip in "${ip_array[@]}"
do
    # Copy hosts (scp to the home directory first, then move it into /etc with sudo)
    scp /etc/hosts $user@$ip:~/
    ssh -t $user@$ip "$copy_hosts"
    # Copy .bashrc
    scp ~/.bashrc $user@$ip:~/
    # Copy hadoop
    scp -r ~/hadoop-1.2.1 $user@$ip:/home/hadoop/
done

3.  Run the script:

bash cp2all.sh

VI. Starting the Hadoop Cluster
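
Note: on a brand-new cluster, HDFS normally has to be formatted once on the master before the first start (this wipes any existing HDFS metadata, so run it only once):

hadoop namenode -format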

1.  Start the cluster

start-all.sh

2.  Check that the cluster is running properly

jps

Under normal conditions, the processes listed by jps on the master node should include: DataNode, TaskTracker, NameNode, SecondaryNameNode, JobTracker.
On the other nodes, jps should list: DataNode, TaskTracker.

3.  Test the WordCount example
    1) Create a text file

echo "Hello World" > test_hadoop

    2) Upload it to HDFS

hadoop fs -put ./test_hadoop /

    3) Run wordcount

cd ~/hadoop-1.2.1/
hadoop jar hadoop-examples-1.2.1.jar wordcount /test_hadoop/ /test_output
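
To inspect the word counts (the exact part-file names may vary):

hadoop fs -cat /test_output/part*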


[1] Official Hadoop documentation: http://hadoop.apache.org/docs/r1.2.1/