I. Passwordless SSH Login for Hadoop
1. On the Master node:
    $ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
    $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Append the Master node's key to the current node:
    $ cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
3. Copy the public and private keys to the Slave nodes:
    $ scp ~/.ssh/* slave1:~/.ssh/
    $ scp ~/.ssh/* slave2:~/.ssh/
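With the keys distributed, logging in from the Master to a slave should no longer ask for a password. A quick check (using slave1 from the scp example above; this verification step is not in the original):

    $ ssh slave1 hostname    # should print the slave's hostname without prompting for a password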
II. Install JDK 1.7
1. Download the JDK. Path:
    ~/deploy_env/jdk-7u55-linux-x64.tar.gz
2. Extract the JDK:
    $ tar zxvf jdk-7u55-linux-x64.tar.gz
3. Install the JDK:
1) Create a java directory under /usr/lib
    $ cd /usr/lib
    $ sudo mkdir java
2) Move the unpacked JDK into it
    $ sudo mv ~/deploy_env/jdk1.7.0_55 /usr/lib/java
III. Configure Environment Variables
1. Edit ~/.bashrc:
2. Add the JDK and Hadoop environment variables:
    export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
    export JRE_HOME=$JAVA_HOME/jre
    export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
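To apply the new variables to the current shell and confirm the JDK is found on the PATH (a sanity check added here, not part of the original steps):

    $ source ~/.bashrc
    $ java -version         # should report version 1.7.0_55
    $ echo $HADOOP_HOME     # should print /home/hadoop/hadoop-1.2.1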
3. Edit /etc/hosts:
4. Add the following IP/hostname entries:
    127.0.0.1    localhost
    192.168.1.11 ubuntu-1
    192.168.1.12 ubuntu-2
    192.168.1.13 ubuntu-3
    192.168.1.14 ubuntu-4
    192.168.1.15 ubuntu-5
    192.168.1.16 ubuntu-6
    192.168.1.17 ubuntu-7
    192.168.1.18 ubuntu-8
    192.168.1.19 ubuntu-9
    192.168.1.20 ubuntu-10
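Hostname resolution can then be verified from any node, for example:

    $ ping -c 1 ubuntu-2    # should resolve to 192.168.1.12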
IV. Configure Hadoop
1. First, enter the Hadoop configuration directory:
    $ cd ~/hadoop-1.2.1/conf/
1) hadoop-env.sh
    export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
2) masters
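The contents of this file are not shown in the original. In a Hadoop 1.x layout, conf/masters lists the host that runs the SecondaryNameNode, which in this cluster is the master node:

    ubuntu-1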
3) slaves
    ubuntu-1
    ubuntu-2
    ubuntu-3
    ubuntu-4
    ubuntu-5
    ubuntu-6
    ubuntu-7
    ubuntu-8
    ubuntu-9
    ubuntu-10
4) core-site.xml
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://ubuntu-1:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop/hdfs/data</value>
      </property>
    </configuration>
5) hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>ubuntu-1:9001</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop/hdfs/datanode</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.http.address</name>
        <value>ubuntu-1:50070</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    </configuration>
6) mapred-site.xml
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>ubuntu-1:10101</value>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/home/hadoop/tmp</value>
      </property>
    </configuration>
V. Copy Hadoop and the Related Configuration Files to Every Node
1. Create a bash script:
2. Paste the following script into the editor:
    #!/bin/bash
    # Slave node IPs (the master itself, 192.168.1.11, is not included)
    ip_array=("192.168.1.12" "192.168.1.13" "192.168.1.14" "192.168.1.15"
              "192.168.1.16" "192.168.1.17" "192.168.1.18" "192.168.1.19"
              "192.168.1.20")
    user="hadoop"
    copy_hosts="sudo mv ~/hosts /etc/"

    for ip in ${ip_array[*]}
    do
        # distribute /etc/hosts (copied to the home directory first, then moved with sudo)
        scp /etc/hosts $user@$ip:~/
        ssh -t $user@$ip "$copy_hosts"
        # distribute the environment settings and the Hadoop installation
        scp ~/.bashrc $user@$ip:~/
        scp -r ~/hadoop-1.2.1 $user@$ip:/home/hadoop/
    done
3. Run the script:
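The script's file name is not given in the original; assuming it was saved as copy_to_nodes.sh (a name chosen here for illustration), run it from the master node:

    $ chmod +x copy_to_nodes.sh
    $ ./copy_to_nodes.sh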
VI. Start the Hadoop Cluster
1. Start the cluster
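The commands for this step are not listed in the original. For a Hadoop 1.2.1 installation, the usual sequence is to format the NameNode once on the master and then run the start script from $HADOOP_HOME/bin:

    $ hadoop namenode -format    # first start only: initializes dfs.name.dir
    $ start-all.sh               # starts NameNode, SecondaryNameNode, JobTracker, DataNodes and TaskTrackers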
2. Check whether the cluster is working properly
Normally, on the master node the processes listed by jps should include: DataNode, TaskTracker, NameNode, SecondaryNameNode, JobTracker.
On the other nodes, jps should include: DataNode, TaskTracker.
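For example, running jps on the master should list each of those daemons by name:

    $ jps    # expect NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker (plus Jps itself)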
3. Test the WordCount program
1) Create a text file
    $ echo "Hello World" > test_hadoop
2) Upload it to HDFS
    $ hadoop fs -put ./test_hadoop /
3) Run wordcount
    $ cd ~/hadoop-1.2.1/
    $ hadoop jar hadoop-examples-1.2.1.jar wordcount /test_hadoop/ /test_output
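If the job finishes successfully, the result can be read back from HDFS (a wildcard is used here for the part-* file name):

    $ hadoop fs -ls /test_output
    $ hadoop fs -cat /test_output/part*    # should show: Hello 1, World 1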