
Deploying Hadoop 2.2.0 + ZooKeeper 3.4.5 + HBase 0.96.2 + Hive 0.13.1 in pseudo-distributed mode on Ubuntu 12.04

2016-08-04


1. Where to download the software

hadoop2.2.0:http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz

zookeeper3.4.5:http://apache.dataguru.cn/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz

hbase0.96.2:http://mirrors.hust.edu.cn/apache/hbase/hbase-0.96.2/hbase-0.96.2-hadoop2-bin.tar.gz

hive0.13.1:http://mirrors.cnnic.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz

JDK 1.7.0_65: installed via apt-get

Hadoop 2.2.0 is downloaded as a source package here because I run 64-bit Ubuntu, while the official Hadoop binaries only ship 32-bit native libraries. Running them on a 64-bit system produces the error "util.NativeCodeLoader - Unable to load native-hadoop library for your platform...", so Hadoop has to be recompiled for 64-bit. I will cover building 64-bit Hadoop in a separate article.
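A quick way to confirm the architecture mismatch before rebuilding is to check the machine's word size (the library path in the comment below assumes the unpack location used throughout this article):

```shell
# Print the machine architecture; x86_64 means the stock 32-bit
# native libraries bundled with Hadoop 2.2.0 will not load.
uname -m
# To inspect the bundled library itself you could additionally run
# (path assumed from this article's layout):
#   file /home/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0
```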

2. Installation

2.1 Install the JDK (the current hostname is m1)

1) Run the following command

root@m1:/home/hadoop# sudo apt-get install oracle-java7-installer

2) Configure the Java environment variables

root@m1:/home/hadoop# sudo vi /etc/environment

Append the Java bin path to the end of the PATH entry on the first line:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-7-oracle/bin"

Then add the following three lines after PATH:

CLASSPATH="/usr/lib/jvm/java-7-oracle/lib"

JAVA_HOME="/usr/lib/jvm/java-7-oracle"

JRE_HOME="/usr/lib/jvm/java-7-oracle/jre"

Tell the system to use the Oracle (Sun) JDK rather than OpenJDK:

root@m1:/home/hadoop# sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-7-oracle/bin/java 300

root@m1:/home/hadoop# sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java-7-oracle/bin/javac 300

root@m1:/home/hadoop# sudo update-alternatives --config java

This presents several alternatives; pick the number for java-7-oracle (2 in my case), then run java -version to confirm the new JDK is active.

2.2 Clone three machines with Parallels

1) In the Parallels hardware/network settings, configure the adapter as shown in the original screenshot (not reproduced here); once set, ping www.163.com succeeds from the guest.

2) Click File > Clone in the Parallels menu and create three clones named m2, s1, and s2 (stop the virtual machine before cloning).

Run sudo vi /etc/hostname on each machine to set its hostname; a reboot is required for the change to take effect.

Run ifconfig on m1, m2, s1, and s2 to find the IP address assigned to each machine, add the mappings with sudo vi /etc/hosts, then run "sudo /etc/init.d/networking restart" to apply the change.
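The resulting /etc/hosts entries look like the following. m1's address matches the logs shown later in this article; the other three addresses are examples only, so substitute whatever ifconfig reported on each clone:

```text
192.168.1.50 m1
192.168.1.51 m2
192.168.1.52 s1
192.168.1.53 s2
```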

3) Configure passwordless SSH login (I use the root account)

Install the SSH tools (skip this if ssh already works):

root@m1:/home/hadoop# sudo apt-get install ssh openssh-server

Run ssh-keygen on every machine and press Enter through the prompts; this creates id_rsa and id_rsa.pub in the user's .ssh directory.

Then, on m1, run:

root@m1:/home/hadoop# scp -r root@m2:/root/.ssh/id_rsa.pub ~/.ssh/m2.pub

root@m1:/home/hadoop# scp -r root@s1:/root/.ssh/id_rsa.pub ~/.ssh/s1.pub

root@m1:/home/hadoop# scp -r root@s2:/root/.ssh/id_rsa.pub ~/.ssh/s2.pub

root@m1:/home/hadoop# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

root@m1:/home/hadoop# cat ~/.ssh/m2.pub >> ~/.ssh/authorized_keys

root@m1:/home/hadoop# cat ~/.ssh/s1.pub >> ~/.ssh/authorized_keys

root@m1:/home/hadoop# cat ~/.ssh/s2.pub >> ~/.ssh/authorized_keys

root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@m2:~/.ssh/

root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@s1:~/.ssh/

root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@s2:~/.ssh/
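The four cat commands above simply concatenate one public key per host into authorized_keys, and the scp commands push the merged file everywhere. The sketch below simulates that aggregation with placeholder key lines in a temporary directory (key contents and paths are made up for illustration):

```shell
# Simulate merging one public key per host into authorized_keys.
workdir=$(mktemp -d)
for host in m1 m2 s1 s2; do
    printf 'ssh-rsa AAAA...placeholder... root@%s\n' "$host" > "$workdir/$host.pub"
done
# Concatenate all collected keys, as the cat commands above do.
cat "$workdir"/*.pub >> "$workdir/authorized_keys"
wc -l < "$workdir/authorized_keys"   # prints 4, one key per host
```

On the real machines every host ends up with the same four-line authorized_keys, which is what lets each node ssh to every other node without a password.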

2.3 Install ZooKeeper 3.4.5

1) Configure zoo.cfg (there is no zoo.cfg by default; copy zoo_sample.cfg to zoo.cfg first)

root@m1:/home/hadoop/zookeeper-3.4.5/conf# vi zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

dataDir=/home/hadoop/zookeeper-3.4.5/data

dataLogDir=/home/hadoop/zookeeper-3.4.5/logs

server.1=m1:2888:3888

server.2=m2:2888:3888

server.3=s1:2888:3888

server.4=s2:2888:3888

# the port at which the clients will connect

clientPort=2181

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1
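One caveat about this four-server ensemble: ZooKeeper needs a strict majority of servers up, and the majority of 4 is 3, so the cluster tolerates only one failure, no more than a three-server ensemble would. The arithmetic:

```shell
# Quorum size for an N-server ZooKeeper ensemble is floor(N/2) + 1.
servers=4
echo $(( servers / 2 + 1 ))   # prints 3
```

This is why ZooKeeper ensembles are usually sized with an odd number of servers; the fourth server here adds no extra fault tolerance.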

2) Copy the zookeeper directory from m1 to m2, s1, and s2:

root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@m2:/home/hadoop

root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s1:/home/hadoop

root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s2:/home/hadoop

3) On m1, m2, s1, and s2, create a myid file in the configured dataDir (/home/hadoop/zookeeper-3.4.5/data). Its content is the number after "server." in zoo.cfg for that machine, and it must contain only that number:

1 on m1

2 on m2

3 on s1

4 on s2
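The myid files can be written with a small loop. This sketch uses a temporary directory in place of /home/hadoop/zookeeper-3.4.5/data so it can run anywhere; on the real machines each host writes only its own file:

```shell
# Write server IDs 1..4 for hosts m1, m2, s1, s2 (one subdirectory
# per host stands in for that host's dataDir).
datadir=$(mktemp -d)
i=1
for host in m1 m2 s1 s2; do
    mkdir -p "$datadir/$host"
    echo "$i" > "$datadir/$host/myid"   # file must contain only the number
    i=$((i + 1))
done
cat "$datadir/s2/myid"   # prints 4
```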

This completes the ZooKeeper configuration.

2.4 Install Hadoop 2.2.0

Modify the following seven configuration files:

1) /home/hadoop/hadoop-2.2.0/etc/hadoop/hadoop-env.sh (mainly the Java path)

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

#export JAVA_HOME=${JAVA_HOME}

2) /home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-env.sh (mainly the Java path)

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-env.sh


# User for YARN daemons

export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

3)/home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>m1,m2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.m1</name>
    <value>m1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.m2</name>
    <value>m2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.m1</name>
    <value>m1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.m2</name>
    <value>m2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://m1:8485;m2:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled.mycluster</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoop-2.2.0/tmp/journal</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>

4)/home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
</configuration>

5)/home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>m1:2181,m2:2181,s1:2181,s2:2181</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.2.0/tmp</value>
  </property>
</configuration>

6)/home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>m1</value>
  </property>
</configuration>

7)/home/hadoop/hadoop-2.2.0/etc/hadoop/slaves

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi slaves

s1

s2

This completes the Hadoop configuration.

3. Usage

3.1 Start ZooKeeper

1) Run the following on all of m1, m2, s1, and s2; the transcript below is from m1:

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start

JMX enabled by default

Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status

JMX enabled by default

Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg

Mode: follower

root@m1:/home/hadoop#

2) Run the status command on each machine to check its role; here s1 is the leader and the other machines are followers:

root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start

JMX enabled by default

Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status

JMX enabled by default

Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg

Mode: leader

root@s1:/home/hadoop#

3) Test ZooKeeper by connecting with zkCli.sh; the "WatchedEvent state:SyncConnected" line near the end of the transcript below indicates success.

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh

Connecting to localhost:2181

2014-07-27 00:27:16,621 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT

2014-07-27 00:27:16,628 [myid:] - INFO [main:Environment@100] - Client environment:host.name=m1

2014-07-27 00:27:16,628 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_65

2014-07-27 00:27:16,629 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation

2014-07-27 00:27:16,629 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre

2014-07-27 00:27:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/hadoop/zookeeper-3.4.5/bin/../build/classes:/home/hadoop/zookeeper-3.4.5/bin/../build/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/home/hadoop/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/home/hadoop/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../conf:/usr/lib/jvm/java-7-oracle/lib

2014-07-27 00:27:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=:/usr/local/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

2014-07-27 00:27:16,631 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp

2014-07-27 00:27:16,631 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=

2014-07-27 00:27:16,632 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux

2014-07-27 00:27:16,632 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64

2014-07-27 00:27:16,632 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.11.0-15-generic

2014-07-27 00:27:16,633 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root

2014-07-27 00:27:16,633 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root

2014-07-27 00:27:16,634 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/hadoop

2014-07-27 00:27:16,636 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@19b1ebe5

Welcome to ZooKeeper!

2014-07-27 00:27:16,672 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

2014-07-27 00:27:16,685 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session

JLine support is enabled

2014-07-27 00:27:16,719 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x147737cd5d30000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] ls /

[zookeeper]

[zk: localhost:2181(CONNECTED) 1]

4) Format ZooKeeper for HA on m1; the "Successfully created /hadoop-ha/mycluster in ZK." log line indicates success.

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs zkfc -formatZK

14/07/27 00:31:59 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at m1/192.168.1.50:9000

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:host.name=m1

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_65

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/hadoop-annotations-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-math-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-lang-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/hadoop-auth-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home
/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/junit-4.8.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jersey-json-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/stax-api-1.0.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-el-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/zookeeper-3.4.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-nfs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/h
adoop/common/hadoop-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0-tests.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-lang-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-nfs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0-tests.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/hamcre
st-core-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/hadoop-annotations-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/junit-4.10.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/xz-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-client-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-tests-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-se
rver-resourcemanager-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-api-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-site-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/hamcrest-core-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/hadoop-annotations-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/junit-4.10.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/hadoop/hadoop-2.2.0/share/ha
doop/mapreduce/hadoop-mapreduce-client-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/hadoop/hadoop-2.2.0/contrib/capacity-scheduler/*.jar

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-2.2.0/lib/native

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.compiler=

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.version=3.11.0-15-generic

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.name=root

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.home=/root

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop

14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=m1:2181,m2:2181,s1:2181,s2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5990054a

14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Opening socket connection to server m1/192.168.1.50:2181. Will not attempt to authenticate using SASL (unknown error)

14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Socket connection established to m1/192.168.1.50:2181, initiating session

14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Session establishment complete on server m1/192.168.1.50:2181, sessionid = 0x147737cd5d30001, negotiated timeout = 5000

===============================================

The configured parent znode /hadoop-ha/mycluster already exists.

Are you sure you want to clear all failover information from

ZooKeeper?

WARNING: Before proceeding, ensure that all HDFS services and

failover controllers are stopped!

===============================================

Proceed formatting /hadoop-ha/mycluster? (Y or N) 14/07/27 00:32:00 INFO ha.ActiveStandbyElector: Session connected.

y

14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...

14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.

14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.

14/07/27 00:32:13 INFO zookeeper.ClientCnxn: EventThread shut down

14/07/27 00:32:13 INFO zookeeper.ZooKeeper: Session: 0x147737cd5d30001 closed

root@m1:/home/hadoop#

5) Verify that the zkfc formatting succeeded: a hadoop-ha node should now exist alongside zookeeper.

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[hadoop-ha, zookeeper]

[zk: localhost:2181(CONNECTED) 1]

3.2 Start the JournalNode cluster

1) Run the following on m1, m2, s1, and s2 in turn:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start journalnode

starting journalnode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-journalnode-m1.out

root@m1:/home/hadoop# jps

2884 JournalNode

2553 QuorumPeerMain

2922 Jps

root@m1:/home/hadoop#

2) Format one NameNode of the cluster (m1). There are two methods; I used the first.

Method 1:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format

Method 2:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format -clusterId m1

3) Start the freshly formatted namenode on m1:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

After the command completes, browse http://m1:50070/dfshealth.jsp to see m1's status.

4) On m2, copy m1's metadata over by running:

root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -bootstrapStandby

5) Start the namenode on m2:

root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

Then browse http://m2:50070/dfshealth.jsp to see m2's status. At this point both m1 and m2 report standby, because the failover controllers have not been started yet.

6) Start all the datanodes; on m1 run:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemons.sh start datanode

s2: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s2.out

s1: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s1.out

root@m1:/home/hadoop#

7) Start YARN; on m1 run:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-resourcemanager-m1.out

s1: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s1.out

s2: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s2.out

root@m1:/home/hadoop#

Then browse http://m1:8088/cluster to see the cluster overview.

8) Start the ZooKeeperFailoverController on m1 and then m2 with the command below. Browse port 50070 again afterwards: m1 has become active while m2 remains standby.

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start zkfc

starting zkfc, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-zkfc-m1.out

root@m1:/home/hadoop#

9) Test that HDFS works:

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /

Found 2 items

drwx------ - root supergroup 0 2014-07-17 23:54 /tmp

drwxr-xr-x - lion supergroup 0 2014-07-21 00:40 /user

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -mkdir /input

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /

Found 3 items

drwxr-xr-x - root supergroup 0 2014-07-27 01:20 /input

drwx------ - root supergroup 0 2014-07-17 23:54 /tmp

drwxr-xr-x - lion supergroup 0 2014-07-21 00:40 /user

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -put hadoop.cmd /input

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input

Found 1 items

-rw-r--r-- 3 root supergroup 7530 2014-07-27 01:20 /input/hadoop.cmd

root@m1:/home/hadoop/hadoop-2.2.0/bin#

10) Test that YARN works with the classic example: count the word frequencies in the hadoop.cmd file we just put under /input.

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hadoop jar /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output

14/07/27 01:22:41 INFO client.RMProxy: Connecting to ResourceManager at m1/192.168.1.50:8032

14/07/27 01:22:43 INFO input.FileInputFormat: Total input paths to process : 1

14/07/27 01:22:44 INFO mapreduce.JobSubmitter: number of splits:1

14/07/27 01:22:44 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class

14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class

14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name

14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class

14/07/27 01:22:44 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir

14/07/27 01:22:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406394452186_0001

14/07/27 01:22:46 INFO impl.YarnClientImpl: Submitted application application_1406394452186_0001 to ResourceManager at m1/192.168.1.50:8032

14/07/27 01:22:46 INFO mapreduce.Job: The url to track the job: http://m1:8088/proxy/application_1406394452186_0001/

14/07/27 01:22:46 INFO mapreduce.Job: Running job: job_1406394452186_0001

14/07/27 01:23:10 INFO mapreduce.Job: Job job_1406394452186_0001 running in uber mode : false

14/07/27 01:23:10 INFO mapreduce.Job: map 0% reduce 0%

14/07/27 01:23:31 INFO mapreduce.Job: map 100% reduce 0%

14/07/27 01:23:48 INFO mapreduce.Job: map 100% reduce 100%

14/07/27 01:23:48 INFO mapreduce.Job: Job job_1406394452186_0001 completed successfully

14/07/27 01:23:49 INFO mapreduce.Job: Counters: 43

File System Counters

FILE: Number of bytes read=6574

FILE: Number of bytes written=175057

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=7628

HDFS: Number of bytes written=5088

HDFS: Number of read operations=6

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Launched reduce tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=18062

Total time spent by all reduces in occupied slots (ms)=14807

Map-Reduce Framework

Map input records=240

Map output records=827

Map output bytes=9965

Map output materialized bytes=6574

Input split bytes=98

Combine input records=827

Combine output records=373

Reduce input groups=373

Reduce shuffle bytes=6574

Reduce input records=373

Reduce output records=373

Spilled Records=746

Shuffled Maps =1

Failed Shuffles=0

Merged Map outputs=1

GC time elapsed (ms)=335

CPU time spent (ms)=2960

Physical memory (bytes) snapshot=270057472

Virtual memory (bytes) snapshot=1990762496

Total committed heap usage (bytes)=136450048

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=7530

File Output Format Counters

Bytes Written=5088

root@m1:/home/hadoop/hadoop-2.2.0/bin#
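A quick consistency check on the counters above: in a job with a combiner, combine input should equal map output records, and reduce input should equal combine output records. This sketch (not part of the original run) checks that over the captured counter lines with plain awk, so it needs no cluster:

```shell
# Cross-check the job counters: the combiner's input must equal the map
# output, and the reducer's input must equal the combiner's output
# (sample values copied from the log above).
counters='Map output records=827
Combine input records=827
Combine output records=373
Reduce input records=373'
get() { printf '%s\n' "$counters" | awk -F= -v k="$1" '$1 == k {print $2}'; }
[ "$(get 'Map output records')" = "$(get 'Combine input records')" ] && echo "combine input OK"
[ "$(get 'Combine output records')" = "$(get 'Reduce input records')" ] && echo "reduce input OK"
```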

11)、Verify HA failover. A moment ago we opened port 50070 on m1 and m2 in the browser; m1 showed as active and m2 as standby.

a) Kill the NameNode process on m1:

root@m1:/home/hadoop/hadoop-2.2.0/bin# jps

5492 Jps

2884 JournalNode

4375 DFSZKFailoverController

2553 QuorumPeerMain

3898 NameNode

4075 ResourceManager

root@m1:/home/hadoop/hadoop-2.2.0/bin# kill -9 3898

root@m1:/home/hadoop/hadoop-2.2.0/bin# jps

2884 JournalNode

4375 DFSZKFailoverController

2553 QuorumPeerMain

4075 ResourceManager

5627 Jps

root@m1:/home/hadoop/hadoop-2.2.0/bin#

b) Browse port 50070 on m1 and m2 again: m1 no longer responds, while m2 is now in the active state.

HDFS and MapReduce continue to run normally through m2: even though the NameNode process on m1 has been killed, service is unaffected. That is the point of automatic failover!
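The before/after checks can be scripted. This sketch works on captured jps output (the sample strings below stand in for actually running `jps` on m1) and simply confirms the NameNode process has disappeared:

```shell
# Extract the NameNode PID from jps-style output; an empty result means
# the process is gone and the standby NameNode on m2 should take over.
jps_before='2884 JournalNode
4375 DFSZKFailoverController
3898 NameNode
4075 ResourceManager'
jps_after='2884 JournalNode
4375 DFSZKFailoverController
4075 ResourceManager'
nn_pid() { printf '%s\n' "$1" | awk '$2 == "NameNode" {print $1}'; }
echo "NameNode pid before kill: $(nn_pid "$jps_before")"
[ -z "$(nn_pid "$jps_after")" ] && echo "NameNode is down; failover to m2 expected"
```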

3.3 Hbase-0.96.2-hadoop2

This sets up a dual-HMaster configuration: m1 runs the primary HMaster and m2 the backup HMaster.

1)、Edit hbase-env.sh, mainly to set JAVA_HOME and HBASE_MANAGES_ZK:

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-env.sh

#

#/**

# * Copyright 2007 The Apache Software Foundation

# *

# * Licensed to the Apache Software Foundation (ASF) under one

# * or more contributor license agreements. See the NOTICE file

# * distributed with this work for additional information

# * regarding copyright ownership. The ASF licenses this file

# * to you under the Apache License, Version 2.0 (the

# * "License"); you may not use this file except in compliance

# * with the License. You may obtain a copy of the License at

# *

# * http://www.apache.org/licenses/LICENSE-2.0

# *

# * Unless required by applicable law or agreed to in writing, software

# * distributed under the License is distributed on an "AS IS" BASIS,

# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# * See the License for the specific language governing permissions and

# * limitations under the License.

# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,

# so try to keep things idempotent unless you want to take an even deeper look

# into the startup scripts (bin/hbase, etc.)

# The java implementation to use. Java 1.6 required.

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Extra Java CLASSPATH elements. Optional.

# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.

# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.

# Below are what we set by default. May only work with SUN JVM.

# For more on why as well as other possible settings,

# see http://wiki.apache.org/hadoop/PerformanceTuning

export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc: -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc: -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.

# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="

# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.

# Uncomment and adjust to enable JMX exporting

# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.

# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

#

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"

# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.

# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident

#HBASE_REGIONSERVER_MLOCK=true

#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.

# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options. Empty by default.

# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored. $HBASE_HOME/logs by default.

# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.

# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.

# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.

# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands. Unset by default. This

# can be useful in large clusters, where, e.g., slave rsyncs can

# otherwise arrive faster than the master can service them.

# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.

export HBASE_MANAGES_ZK=false

# When this is false, HBase uses an external, standalone ZooKeeper; when true, HBase manages its own bundled ZooKeeper.

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the

# RFA appender. Please refer to the log4j.properties file to see more details on this appender.

# In case one needs to do log rolling on a date change, one should set the environment property

# HBASE_ROOT_LOGGER to ",DRFA".

# For example:

# HBASE_ROOT_LOGGER=INFO,DRFA

# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as

# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
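The two edits above (JAVA_HOME and HBASE_MANAGES_ZK) can also be applied non-interactively with sed. This sketch runs on a scratch copy of the relevant lines so it is safe to try anywhere; on a real node, point the same seds at /home/hadoop/hbase-0.96.2-hadoop2/conf/hbase-env.sh:

```shell
# Demonstrate the two hbase-env.sh edits on a scratch copy (the two
# commented lines mimic the stock template).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# export JAVA_HOME=/usr/java/jdk1.6.0/
# export HBASE_MANAGES_ZK=true
EOF
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-oracle|' "$conf"
sed -i 's|^# export HBASE_MANAGES_ZK=.*|export HBASE_MANAGES_ZK=false|' "$conf"
out=$(grep '^export' "$conf")
printf '%s\n' "$out"
```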

2)、Edit hbase-site.xml:

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/hadoop/hbase-0.96.2-hadoop2/tmp</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>m1,m2,s1,s2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper-3.4.5/data</value>
  </property>
</configuration>

3)、Edit the regionservers file

The machines that run the HMaster usually do not double as region servers, so the two slave nodes, s1 and s2, act as the HBase region servers:

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi regionservers

s1

s2

4)、Create a symlink to Hadoop's hdfs-site.xml in HBase's conf directory:

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll

total 40

drwxr-xr-x 2 root root 4096 Jul 27 09:15 ./

drwxr-xr-x 9 root root 4096 Jul 20 21:40 ../

-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties

-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd

-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh

-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml

-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml

-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties

-rw-r--r-- 1 root staff 6 Jul 20 21:38 regionservers

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ln -s /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml hdfs-site.xml

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll

total 40

drwxr-xr-x 2 root root 4096 Jul 27 09:16 ./

drwxr-xr-x 9 root root 4096 Jul 20 21:40 ../

-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties

-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd

-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh

-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml

-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml

lrwxrwxrwx 1 root root 50 Jul 27 09:16 hdfs-site.xml -> /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml*

-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties

-rw-r--r-- 1 root staff 6 Jul 20 21:38 regionservers

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf#
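The symlink matters because hbase.rootdir uses the logical nameservice "mycluster", which HBase can only resolve by reading Hadoop's hdfs-site.xml. A scratch demonstration of the link mechanics (temp directories only, nothing cluster-specific):

```shell
# HBase follows the symlink to Hadoop's hdfs-site.xml; that is how the
# logical "mycluster" nameservice in hbase.rootdir gets resolved.
src_dir=$(mktemp -d); dst_dir=$(mktemp -d)
echo '<configuration/>' > "$src_dir/hdfs-site.xml"
ln -s "$src_dir/hdfs-site.xml" "$dst_dir/hdfs-site.xml"
readlink "$dst_dir/hdfs-site.xml"   # where the link points
cat "$dst_dir/hdfs-site.xml"        # reading the link yields the real file
```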

5)、With HBase 0.96.2 there is no need to swap in Hadoop jars; the official hbase-0.96.2-hadoop2 build already bundles the Hadoop 2.2.0 jars:

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# ls | grep hadoop

hadoop-annotations-2.2.0.jar

hadoop-auth-2.2.0.jar

hadoop-client-2.2.0.jar

hadoop-common-2.2.0.jar

hadoop-hdfs-2.2.0.jar

hadoop-hdfs-2.2.0-tests.jar

hadoop-mapreduce-client-app-2.2.0.jar

hadoop-mapreduce-client-common-2.2.0.jar

hadoop-mapreduce-client-core-2.2.0.jar

hadoop-mapreduce-client-jobclient-2.2.0.jar

hadoop-mapreduce-client-jobclient-2.2.0-tests.jar

hadoop-mapreduce-client-shuffle-2.2.0.jar

hadoop-yarn-api-2.2.0.jar

hadoop-yarn-client-2.2.0.jar

hadoop-yarn-common-2.2.0.jar

hadoop-yarn-server-common-2.2.0.jar

hadoop-yarn-server-nodemanager-2.2.0.jar

hbase-client-0.96.2-hadoop2.jar

hbase-common-0.96.2-hadoop2.jar

hbase-common-0.96.2-hadoop2-tests.jar

hbase-examples-0.96.2-hadoop2.jar

hbase-hadoop2-compat-0.96.2-hadoop2.jar

hbase-hadoop-compat-0.96.2-hadoop2.jar

hbase-it-0.96.2-hadoop2.jar

hbase-it-0.96.2-hadoop2-tests.jar

hbase-prefix-tree-0.96.2-hadoop2.jar

hbase-protocol-0.96.2-hadoop2.jar

hbase-server-0.96.2-hadoop2.jar

hbase-server-0.96.2-hadoop2-tests.jar

hbase-shell-0.96.2-hadoop2.jar

hbase-testing-util-0.96.2-hadoop2.jar

hbase-thrift-0.96.2-hadoop2.jar

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib#

6)、Copy the hbase-0.96.2-hadoop2 directory from m1 to the same path on m2, s1, and s2:

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@m2:/home/hadoop

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s1:/home/hadoop

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s2:/home/hadoop
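The three copies can be collapsed into a loop. The `echo` below makes it a dry run that only prints the commands; drop the `echo` to actually copy (this relies on the passwordless SSH configured earlier in this guide):

```shell
# Dry-run loop over the three target hosts; remove "echo" to really copy.
out=$(for host in m2 s1 s2; do
  echo "scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@${host}:/home/hadoop"
done)
printf '%s\n' "$out"
```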

7)、Start HBase on m1:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/start-hbase.sh

starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m1.out

s1: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s1.out

s2: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s2.out

root@m1:/home/hadoop# jps

6688 NameNode

7540 HMaster

2884 JournalNode

4375 DFSZKFailoverController

2553 QuorumPeerMain

7769 Jps

4075 ResourceManager

root@m1:/home/hadoop#

Once started, the status page is available at http://m1:60010/master-status

8)、Test the connection with the HBase shell on m1:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell

2014-07-27 09:31:07,601 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list

TABLE

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

0 row(s) in 2.8030 seconds

=> []

hbase(main):002:0> version

0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):003:0> status

2 servers, 0 dead, 1.0000 average load

hbase(main):004:0> create 'test_idoall_org','uid','name'

0 row(s) in 0.5800 seconds

=> Hbase::Table - test_idoall_org

hbase(main):005:0> list

TABLE

test_idoall_org

1 row(s) in 0.0320 seconds

=> ["test_idoall_org"]

hbase(main):006:0> put 'test_idoall_org','10086','name:idoall','idoallvalue'

0 row(s) in 0.1090 seconds

hbase(main):009:0> get 'test_idoall_org','10086'

COLUMN CELL

name:idoall timestamp=1406424831473, value=idoallvalue

1 row(s) in 0.0450 seconds

hbase(main):010:0> scan 'test_idoall_org'

ROW COLUMN+CELL

10086 column=name:idoall, timestamp=1406424831473, value=idoallvalue

1 row(s) in 0.0620 seconds

hbase(main):011:0>

9)、Start the backup HMaster on m2 with the corresponding command:

root@m2:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase-daemon.sh start master

starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m2.out

root@m2:/home/hadoop#

After this, m2's HBase status is likewise visible in the browser at http://m2:60010/master-status

10)、Test master/backup failover between m1 and m2

a) Open http://m1:60010/master-status and http://m2:60010/master-status in a browser to confirm the current master/backup status.

b) Kill the HMaster process on m1:

root@m1:/home/hadoop# jps

6688 NameNode

7540 HMaster

2884 JournalNode

8645 Jps

4375 DFSZKFailoverController

2553 QuorumPeerMain

4075 ResourceManager

root@m1:/home/hadoop# kill -9 7540

root@m1:/home/hadoop# jps

6688 NameNode

2884 JournalNode

4375 DFSZKFailoverController

2553 QuorumPeerMain

4075 ResourceManager

8655 HMaster

8719 Jps

root@m1:/home/hadoop#

Reload the status pages: m1 no longer responds, and m2's cluster status shows it has taken over as the active HMaster.

At this point HBase is fully configured, and master failover works.

3.4 Install MySQL 5.5.x on m1 (Ubuntu 12.04)

1)、apt-get install mysql-server mysql-client mysql-common

During installation a prompt asks for the MySQL root password; here it is set to 123456.

After installation, test the connection with: mysql -uroot -p123456

Use service mysql start / service mysql stop to start and stop MySQL.

2)、Grant remote access to MySQL:

root@m1:/home/hadoop# mysql -uroot -p123456

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 36

Server version: 5.5.22-0ubuntu1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all on *.* to 'root'@'%' identified by '123456' WITH GRANT OPTION;

Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;

Query OK, 0 rows affected (0.00 sec)

mysql> quit

Bye

3)、If remote connections still fail, edit /etc/mysql/my.cnf, change bind-address = 127.0.0.1 to the machine's own IP, and restart MySQL.
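A non-interactive form of the my.cnf change, shown on a scratch copy so it is safe to run anywhere (192.168.1.50 is m1's IP as seen in the job log earlier; substitute your own). On a real node, run the sed against /etc/mysql/my.cnf and then restart MySQL:

```shell
# Demonstrate the bind-address change on a scratch file.
cnf=$(mktemp)
echo 'bind-address = 127.0.0.1' > "$cnf"
sed -i 's/^bind-address.*/bind-address = 192.168.1.50/' "$cnf"
cat "$cnf"
```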

3.5 Install Hive 0.13.1 (on m1)

1)、Extract apache-hive-0.13.1-bin.tar.gz to /home/hadoop/hive-0.13.1.

2)、In Hive's conf directory, copy the templates to create the working config files:

root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-env.sh.template hive-env.sh

root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-default.xml.template hive-site.xml

3)、Edit hive-env.sh to point at the Hadoop installation:

root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-env.sh

HADOOP_HOME=/home/hadoop/hadoop-2.2.0

4)、Edit hive-site.xml:

root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-site.xml

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://mycluster/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://mycluster/user/hive/scratchdir</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/home/hadoop/hive-0.13.1/logs</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://m1:3306/hiveMeta?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hive-hbase-handler-0.13.1.jar,file:///home/hadoop/hive-0.13.1/lib/protobuf-java-2.5.0.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-client-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-common-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-server-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/zookeeper-3.4.5.jar,file:///home/hadoop/hive-0.13.1/lib/guava-11.0.2.jar,file:///home/hadoop/hive-0.13.1/lib/htrace-core-2.04.jar</value>
  </property>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>m1,m2,s1,s2</value>
    <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
  </property>
</configuration>
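The long hive.aux.jars.path value is easy to mistype; it can be generated from the jar list instead. This sketch uses the paths from this guide and just prints the value to paste into hive-site.xml:

```shell
# Build the comma-separated hive.aux.jars.path value from a jar list.
libdir=/home/hadoop/hive-0.13.1/lib
jars='hbase-hadoop-compat-0.96.2-hadoop2.jar hbase-hadoop2-compat-0.96.2-hadoop2.jar
hive-hbase-handler-0.13.1.jar protobuf-java-2.5.0.jar hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar zookeeper-3.4.5.jar guava-11.0.2.jar htrace-core-2.04.jar'
aux=''
for j in $jars; do aux="${aux:+$aux,}file://$libdir/$j"; done
echo "$aux"
```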

5)、Of the jars listed in hive.aux.jars.path, hive-hbase-handler-0.13.1.jar and guava-11.0.2.jar ship with Hive; copy the rest over from the HBase and ZooKeeper installations:

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/hive-0.13.1/lib

root@m1:/home/hadoop# cp /home/hadoop/zookeeper-3.4.5/dist-maven/zookeeper-3.4.5.jar /home/hadoop/hive-0.13.1/lib

6)、Download the MySQL JDBC driver (Connector/J) from http://dev.mysql.com/downloads/connector/j/, extract it, and copy mysql-connector-java-5.1.31-bin.jar to /home/hadoop/hive-0.13.1/lib.

7)、Create test data, and create the warehouse directory on HDFS:

root@m1:/home/hadoop/hive-0.13.1/conf# vi /home/hadoop/hive-0.13.1/testdata001.dat

12306,mname,yname

10086,myidoall,youidoall

/home/hadoop/hadoop-2.2.0/bin/hadoop fs -mkdir -p /user/hive/warehouse
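Before loading, a quick sanity check that the data file matches the comma delimiter the table will declare. This runs on a scratch copy of the two rows with plain awk, so no cluster is needed:

```shell
# Every row should split into exactly three comma-separated fields,
# matching "fields terminated by ','" in the table definition.
f=$(mktemp)
printf '12306,mname,yname\n10086,myidoall,youidoall\n' > "$f"
out=$(awk -F, 'NF != 3 {bad=1} END {print (bad ? "malformed" : "ok: " NR " rows")}' "$f")
echo "$out"
```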

8)、Test Hive from the shell:

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize

14/07/27 11:17:35 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties

hive> show databases;

OK

default

Time taken: 0.464 seconds, Fetched: 1 row(s)

hive> create database testidoall;

OK

Time taken: 0.279 seconds

hive> show databases;

OK

default

testidoall

Time taken: 0.021 seconds, Fetched: 2 row(s)

hive> use testidoall;

OK

Time taken: 0.039 seconds

hive> create external table testtable(uid int,myname string,youname string) row format delimited fields terminated by ',' location '/user/hive/warehouse/testtable';

OK

Time taken: 0.205 seconds

hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive-0.13.1/testdata001.dat' OVERWRITE INTO TABLE testtable;

Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat

Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat

Loading data to table testidoall.testtable

rmr: DEPRECATED: Please use 'rm -r' instead.

Deleted hdfs://mycluster/user/hive/warehouse/testtable

Table testidoall.testtable stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]

OK

Time taken: 0.77 seconds

hive> select * from testtable;

OK

12306 mname yname

10086 myidoall youidoall

Time taken: 0.279 seconds, Fetched: 2 row(s)

hive>

At this point Hive is installed.

3.6 Hive to HBase (importing Hive table data into HBase)

1)、Create a Hive table backed by HBase:

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize

14/07/27 11:33:53 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties

hive> show databases;

OK

default

testidoall

Time taken: 0.45 seconds, Fetched: 2 row(s)

hive> use testidoall;

OK

Time taken: 0.021 seconds

hive> show tables;

OK

testtable

Time taken: 0.032 seconds, Fetched: 1 row(s)

hive> CREATE TABLE hive2hbase_idoall(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hive2hbase_idoall");

OK

Time taken: 2.332 seconds

hive> show tables;

OK

hive2hbase_idoall

testtable

Time taken: 0.036 seconds, Fetched: 2 row(s)

hive>
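To make the ":key,cf1:val" mapping concrete, this sketch simulates how the rows that will flow through this table end up laid out in HBase: the first Hive column becomes the row key, the second the cell cf1:val. It is plain awk on sample data, not a real HBase call:

```shell
# Simulate the ":key,cf1:val" column mapping on two sample rows.
out=$(printf '12306,mname\n10086,myidoall\n' |
  awk -F, '{printf "row=%s column=cf1:val value=%s\n", $1, $2}')
printf '%s\n' "$out"
```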

2)、Create a plain Hive table to stage the data before it is inserted into HBase, in effect an intermediate table, and load the earlier test data into it:

hive> create table hive2hbase_idoall_middle(foo int,bar string)row format delimited fields terminated by ',';

OK

Time taken: 0.086 seconds

hive> show tables;

OK

hive2hbase_idoall

hive2hbase_idoall_middle

testtable

Time taken: 0.03 seconds, Fetched: 3 row(s)

hive> load data local inpath '/home/hadoop/hive-0.13.1/testdata001.dat' overwrite into table hive2hbase_idoall_middle;

Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat

Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat

Loading data to table testidoall.hive2hbase_idoall_middle

rmr: DEPRECATED: Please use 'rm -r' instead.

Deleted hdfs://mycluster/user/hive/warehouse/testidoall.db/hive2hbase_idoall_middle

Table testidoall.hive2hbase_idoall_middle stats: [numFiles=1, numRows=0, totalSize=43, rawDataSize=0]

OK

Time taken: 0.683 seconds

hive>

3)、Insert from the intermediate table (hive2hbase_idoall_middle) into hive2hbase_idoall; the data is written straight through to HBase:

hive> insert overwrite table hive2hbase_idoall select * from hive2hbase_idoall_middle;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1406394452186_0002, Tracking URL = http://m1:8088/proxy/application_1406394452186_0002/

Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1406394452186_0002

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2014-07-27 11:44:11,491 Stage-0 map = 0%, reduce = 0%

2014-07-27 11:44:22,684 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 1.51 sec

MapReduce Total cumulative CPU time: 1 seconds 510 msec

Ended Job = job_1406394452186_0002

MapReduce Jobs Launched:

Job 0: Map: 1 Cumulative CPU: 1.51 sec HDFS Read: 288 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 510 msec

OK

Time taken: 25.613 seconds

hive> select * from hive2hbase_idoall;

OK

10086 myidoall

12306 mname

Time taken: 0.179 seconds, Fetched: 2 row(s)

hive> select * from hive2hbase_idoall_middle;

OK

12306 mname

10086 myidoall

Time taken: 0.088 seconds, Fetched: 2 row(s)

hive>

4)、Connect with the HBase shell and confirm the data from Hive has arrived:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell

2014-07-27 11:47:14,454 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list

TABLE

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

hive2hbase_idoall

test_idoall_org

2 row(s) in 2.9480 seconds

=> ["hive2hbase_idoall", "test_idoall_org"]

hbase(main):002:0> scan "hive2hbase_idoall"

ROW COLUMN+CELL

10086 column=cf1:val, timestamp=1406432660860, value=myidoall

12306 column=cf1:val, timestamp=1406432660860, value=mname

2 row(s) in 0.0540 seconds

hbase(main):003:0> get "hive2hbase_idoall",'12306'

COLUMN CELL

cf1:val timestamp=1406432660860, value=mname

1 row(s) in 0.0110 seconds

hbase(main):004:0>

At this point the Hive-to-HBase path works correctly.

3.7 HBase to Hive (importing HBase table data into Hive)

1)、Create the table hbase2hive_idoall in HBase:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell

2014-07-27 11:54:25,844 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> create 'hbase2hive_idoall','gid','info'

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

0 row(s) in 3.4970 seconds

=> Hbase::Table - hbase2hive_idoall

hbase(main):002:0> put 'hbase2hive_idoall','3344520','info:time','20140704'

0 row(s) in 0.1020 seconds

hbase(main):003:0> put 'hbase2hive_idoall','3344520','info:address','HK'

0 row(s) in 0.0090 seconds

hbase(main):004:0> scan 'hbase2hive_idoall'

ROW COLUMN+CELL

3344520 column=info:address, timestamp=1406433302317, value=HK

3344520 column=info:time, timestamp=1406433297567, value=20140704

1 row(s) in 0.0330 seconds

hbase(main):005:0>

2)、Create an external Hive table mapped onto the HBase table:

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize

14/07/27 11:57:20 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties

hive> show databases;

OK

default

testidoall

Time taken: 0.449 seconds, Fetched: 2 row(s)

hive> use testidoall;

OK

Time taken: 0.02 seconds

hive> show tables;

OK

hive2hbase_idoall

hive2hbase_idoall_middle

testtable

Time taken: 0.026 seconds, Fetched: 3 row(s)

hive> create external table hbase2hive_idoall (key string, gid map<string,string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:") TBLPROPERTIES ("hbase.table.name" = "hbase2hive_idoall");

OK

Time taken: 1.696 seconds

hive> show tables;

OK

hbase2hive_idoall

hive2hbase_idoall

hive2hbase_idoall_middle

testtable

Time taken: 0.034 seconds, Fetched: 4 row(s)

hive> select * from hbase2hive_idoall;

OK

3344520 {"address":"HK","time":"20140704"}

Time taken: 0.701 seconds, Fetched: 1 row(s)

hive>
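With a whole column family ("info:") mapped to a Hive map&lt;string,string&gt; column, every qualifier under info: becomes one map entry. This sketch simulates that transformation on the two puts above with plain awk, no cluster needed:

```shell
# Simulate the family-to-map mapping: info:qualifier=value cells become
# "qualifier":"value" entries of the Hive map column.
cells='info:time=20140704
info:address=HK'
out=$(printf '%s\n' "$cells" | awk -F'[:=]' '{printf "\"%s\":\"%s\"\n", $2, $3}')
printf '%s\n' "$out"
```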

At this point the ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 distributed environment described in the title has been fully tested. A few pitfalls hit along the way are covered under Common Problems below. I hope these notes help others.

4. Common Problems

1. If Hadoop (NameNode/DataNode/YARN), HBase, or Hive fails to start, always inspect the relevant logs with tail -n 100 *.log; they usually contain the key clues. The following commands also help trace errors from the command line.

1)、Make Hadoop print debug output to the console; after running this, start the NameNode, DataNode, or YARN to see the effect:

root@m1:/home/hadoop# export HADOOP_ROOT_LOGGER=DEBUG,console

2)、Make Hive print debug output to the console:

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive --hiveconf hive.root.logger=DEBUG,console

2. If MySQL fails to start with "job failed to start", a clean reinstall fixes it:

rm /var/lib/mysql/ -R

rm /etc/mysql/ -R

apt-get autoremove mysql* --purge

apt-get remove apparmor

apt-get install mysql-server mysql-client mysql-common

3. If dpkg was interrupted and asks you to run "sudo dpkg --configure -a" manually, resolve it with:

sudo rm /var/lib/dpkg/updates/*

sudo apt-get update

sudo apt-get upgrade

5. References

_00018 Hadoop-2.2.0 + Hbase-0.96.2 + Hive-0.13.1 distributed environment integration, with Hadoop-2.X in HA mode

Compiling Hadoop 2.2.0 from source

Hadoop 2.1.0 build-and-install tutorial

Building Hadoop 2.2.0 on CentOS 6.4
