Apache HBase
Author: Lijb
Email: lijb1121@163.com
WeChat: ljb1121
HBase is a distributed, column-oriented open-source database. The technology originates from the Google paper by Fay Chang, "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a typical relational database, HBase is designed for storing unstructured data; another difference is that HBase uses a column-based rather than a row-based model.
What is the relationship between HBase and HDFS?
Although HDFS supports storing massive amounts of data, it is not good at managing individual records efficiently (query, update, delete, insert) and does not support random reads and writes over that data. HBase is a NoSQL database built on top of HDFS that manages the data stored on HDFS efficiently, providing random reads and writes over massive data sets and row-level data management.
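For intuition, this is what row-level random access looks like through the HBase client API (a minimal sketch only; the client setup, table name, and column names here are illustrative and are covered in detail in the Java API section later in this document):
~~~java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomAccessDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "centos"); //ZooKeeper address, see the setup below
        try (Connection conn = ConnectionFactory.createConnection(config);
             Table table = conn.getTable(TableName.valueOf("t_user"))) {
            //write (or overwrite) a single column of a single row, addressed by row key
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
            table.put(put);
            //read that one row back by key, without touching the rest of the data set
            Result r = table.get(new Get(Bytes.toBytes("row-001")));
            System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("name"))));
        }
    }
}
~~~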
Characteristics of HBase
- Big: a single HBase table typically scales to hundreds of millions of rows by millions of columns, and each cell can keep thousands of versions.
- Sparse: HBase has no fixed table schema; a row can contain any number of columns (which improves disk utilization).
- No data types: every value is stored as a byte array (see the sketch after this list).
- Column-oriented: the biggest difference from a conventional database is how records are managed on disk. Most databases use row-oriented storage, which leads to poor I/O utilization; HBase stores data column-family by column-family, which greatly improves I/O utilization.
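Because HBase itself has no notion of data types, the client converts values to and from byte arrays, typically with the Bytes utility class from hbase-common. A minimal sketch (the values are illustrative):
~~~java
import org.apache.hadoop.hbase.util.Bytes;

public class BytesDemo {
    public static void main(String[] args) {
        //every cell value stored in HBase is just a byte[]; the "schema" lives in client code
        byte[] name = Bytes.toBytes("zhangsan"); //String -> byte[]
        byte[] age  = Bytes.toBytes(18);         //int -> 4-byte big-endian byte[]

        //when reading a cell back, the client must know which type to decode it as
        System.out.println(Bytes.toString(name)); //zhangsan
        System.out.println(Bytes.toInt(age));     //18
    }
}
~~~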
Row storage vs. column storage
Row storage
Column storage
HBase Environment Setup
- Make sure Hadoop (HDFS) is running properly; HADOOP_HOME must be configured
- Install ZooKeeper (used to coordinate the HBase services)
[root@CentOS ~]# tar -zxf zookeeper-3.4.6.tar.gz -C /usr/
[root@CentOS ~]# vi /usr/zookeeper-3.4.6/conf/zoo.cfg
tickTime=2000
dataDir=/root/zkdata
clientPort=2181
[root@CentOS ~]# mkdir /root/zkdata
[root@CentOS zookeeper-3.4.6]# ./bin/zkServer.sh start zoo.cfg
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@CentOS zookeeper-3.4.6]# ./bin/zkServer.sh status zoo.cfg
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: standalone
[root@centos ~]# jps
1612 SecondaryNameNode
1348 NameNode
1742 QuorumPeerMain //zookeeper
1437 DataNode
- Install HBase
~~~bash
[root@centos ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
[root@centos ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml
~~~
~~~xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://CentOS:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>CentOS</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
~~~
~~~bash
[root@centos ~]# vi /usr/hbase-1.2.4/conf/regionservers
centos
[root@centos ~]# vi .bashrc
HBASE_MANAGES_ZK=false
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
export HBASE_MANAGES_ZK
~~~
- Start HBase
[root@centos ~]# start-hbase.sh
[root@centos ~]# jps
1612 SecondaryNameNode
2102 HRegionServer //handles the actual reads and writes of table data
1348 NameNode
2365 Jps
1978 HMaster //similar to the NameNode: manages table metadata and the RegionServers
1742 QuorumPeerMain
1437 DataNode
The web UI can now be accessed at http://centos:16010
HBase Shell Commands
- Connect to HBase
[root@centos ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
hbase(main):001:0>
- Check the cluster status
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
- Check the HBase version
hbase(main):006:0> version
1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
Namespace operations (databases)
- List namespaces
hbase(main):003:0> list_namespace
NAMESPACE
default
hbase
- Create a namespace
hbase(main):006:0> create_namespace 'baizhi',{'author'=>'zs'}
0 row(s) in 0.3260 seconds
- List the tables in a namespace
hbase(main):004:0> list_namespace_tables 'hbase'
TABLE
meta
namespace
- Describe a namespace
hbase(main):008:0> describe_namespace 'baizhi'
DESCRIPTION
{NAME => 'baizhi', author => 'zs'}
1 row(s) in 0.0550 seconds
- Modify a namespace
hbase(main):010:0> alter_namespace 'baizhi',{METHOD => 'set','author'=> 'wangwu'}
0 row(s) in 0.2520 seconds
hbase(main):011:0> describe_namespace 'baizhi'
DESCRIPTION
{NAME => 'baizhi', author => 'wangwu'}
1 row(s) in 0.0030 seconds
hbase(main):012:0> alter_namespace 'baizhi',{METHOD => 'unset',NAME => 'author'}
0 row(s) in 0.0550 seconds
hbase(main):013:0> describe_namespace 'baizhi'
DESCRIPTION
{NAME => 'baizhi'}
1 row(s) in 0.0080 seconds
- Drop a namespace
hbase(main):020:0> drop_namespace 'baizhi'
0 row(s) in 0.0730 seconds
HBase does not allow dropping a namespace that still contains tables.
Table operations (DDL)
- Create a table
hbase(main):023:0> create 't_user','cf1','cf2'
0 row(s) in 1.2880 seconds
=> Hbase::Table - t_user
hbase(main):024:0> create 'baizhi:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',TTL=>3600}
0 row(s) in 1.2610 seconds
=> Hbase::Table - baizhi:t_user
- Describe a table
hbase(main):026:0> describe 'baizhi:t_user'
Table baizhi:t_user is ENABLED
baizhi:t_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =
> 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', B
LOCKCACHE => 'true'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =
> 'NONE', MIN_VERSIONS => '0', TTL => '3600 SECONDS (1 HOUR)', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0240 seconds
- Check whether a table exists
hbase(main):030:0> exists 't_user'
Table t_user does exist
0 row(s) in 0.0250 seconds
- enable/is_enabled/enable_all (disable, disable_all, and is_disabled work analogously)
hbase(main):036:0> enable 't_user'
0 row(s) in 0.0220 seconds
hbase(main):037:0> is_enabled 't_user'
true
0 row(s) in 0.0090 seconds
hbase(main):035:0> enable_all 't_.*'
t_user
Enable the above 1 tables (y/n)?
y
1 tables successfully enabled
- Drop a table (it must be disabled first)
hbase(main):038:0> disable 't_user'
0 row(s) in 2.2930 seconds
hbase(main):039:0> drop 't_user'
0 row(s) in 1.2670 seconds
- List all user tables (tables under the system namespace hbase are not shown)
hbase(main):042:0> list 'baizhi:.*'
TABLE
baizhi:t_user
1 row(s) in 0.0050 seconds
=> ["baizhi:t_user"]
hbase(main):043:0> list
TABLE
baizhi:t_user
1 row(s) in 0.0050 seconds
- Get a reference to a table
hbase(main):002:0> t=get_table 'baizhi:t_user'
0 row(s) in 0.0440 seconds
- Alter table parameters
hbase(main):008:0> alter 'baizhi:t_user',{ NAME => 'cf2', TTL => 60 }
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.7000 seconds
Table DML operations
- The put command
hbase(main):010:0> put 'baizhi:t_user',1,'cf1:name','zhangsan'
0 row(s) in 0.2060 seconds
hbase(main):011:0> t = get_table 'baizhi:t_user'
0 row(s) in 0.0010 seconds
hbase(main):012:0> t.put 1,'cf1:age','18'
0 row(s) in 0.0500 seconds
- The get command
hbase(main):017:0> get 'baizhi:t_user',1
COLUMN CELL
cf1:age timestamp=1536996547967, value=21
cf1:name timestamp=1536996337398, value=zhangsan
2 row(s) in 0.0680 seconds
hbase(main):019:0> get 'baizhi:t_user',1,{COLUMN =>'cf1', VERSIONS=>10}
COLUMN CELL
cf1:age timestamp=1536996547967, value=21
cf1:age timestamp=1536996542980, value=20
cf1:age timestamp=1536996375890, value=18
cf1:name timestamp=1536996337398, value=zhangsan
4 row(s) in 0.0440 seconds
hbase(main):020:0> get 'baizhi:t_user',1,{COLUMN =>'cf1:age', VERSIONS=>10}
COLUMN CELL
cf1:age timestamp=1536996547967, value=21
cf1:age timestamp=1536996542980, value=20
cf1:age timestamp=1536996375890, value=18
3 row(s) in 0.0760 seconds
hbase(main):021:0> get 'baizhi:t_user',1,{COLUMN =>'cf1:age', TIMESTAMP => 1536996542980 }
COLUMN CELL
cf1:age timestamp=1536996542980, value=20
1 row(s) in 0.0260 seconds
hbase(main):025:0> get 'baizhi:t_user',1,{TIMERANGE => [1536996375890,1536996547967]}
COLUMN CELL
cf1:age timestamp=1536996542980, value=20
1 row(s) in 0.0480 seconds
hbase(main):026:0> get 'baizhi:t_user',1,{TIMERANGE => [1536996375890,1536996547967],VERSIONS=>10}
COLUMN CELL
cf1:age timestamp=1536996542980, value=20
cf1:age timestamp=1536996375890, value=18
2 row(s) in 0.0160 seconds
- scan
hbase(main):004:0> scan 'baizhi:t_user'
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1536996547967, value=21
1 column=cf1:height, timestamp=1536997284682, value=170
1 column=cf1:name, timestamp=1536996337398, value=zhangsan
1 column=cf1:salary, timestamp=1536997158586, value=15000
1 column=cf1:weight, timestamp=1536997311001, value=\x00\x00\x00\x00\x00\x00\x00\x05
2 column=cf1:age, timestamp=1536997566506, value=18
2 column=cf1:name, timestamp=1536997556491, value=lisi
2 row(s) in 0.0470 seconds
hbase(main):009:0> scan 'baizhi:t_user', {STARTROW => '1',LIMIT=>1}
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1536996547967, value=21
1 column=cf1:height, timestamp=1536997284682, value=170
1 column=cf1:name, timestamp=1536996337398, value=zhangsan
1 column=cf1:salary, timestamp=1536997158586, value=15000
1 column=cf1:weight, timestamp=1536997311001, value=\x00\x00\x00\x00\x00\x00\x00\x05
1 row(s) in 0.0280 seconds
hbase(main):011:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',TIMESTAMP=>1536996542980}
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1536996542980, value=20
1 row(s) in 0.0330 seconds
- delete/deleteall
hbase(main):013:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',VERSIONS=>3}
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1536996547967, value=21
1 column=cf1:age, timestamp=1536996542980, value=20
1 column=cf1:age, timestamp=1536996375890, value=18
2 column=cf1:age, timestamp=1536997566506, value=18
2 row(s) in 0.0150 seconds
hbase(main):014:0> delete 'baizhi:t_user',1,'cf1:age',1536996542980
0 row(s) in 0.0920 seconds
hbase(main):015:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',VERSIONS=>3}
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1536996547967, value=21
2 column=cf1:age, timestamp=1536997566506, value=18
2 row(s) in 0.0140 seconds
hbase(main):016:0> delete 'baizhi:t_user',1,'cf1:age'
0 row(s) in 0.0170 seconds
hbase(main):017:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',VERSIONS=>3}
ROW COLUMN+CELL
2 column=cf1:age, timestamp=1536997566506, value=18
1 row(s) in 0.0170 seconds
hbase(main):019:0> deleteall 'baizhi:t_user',1
0 row(s) in 0.0200 seconds
hbase(main):020:0> get 'baizhi:t_user',1
COLUMN CELL
0 row(s) in 0.0200 seconds
- truncate
hbase(main):022:0> truncate 'baizhi:t_user'
Truncating 'baizhi:t_user' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 4.0040 seconds
HBase Java API
- Maven dependencies
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.4</version>
</dependency>
- Create the Connection and Admin objects
//imports used by this and the following snippets
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.junit.After;
import org.junit.Before;

private Connection conn;
private Admin admin;
@Before
public void before() throws IOException {
Configuration config= HBaseConfiguration.create();
//the HMaster and the RegionServers register themselves in ZooKeeper, so the client only needs the ZooKeeper quorum
config.set("hbase.zookeeper.quorum","centos");
conn= ConnectionFactory.createConnection(config);
admin=conn.getAdmin();
}
@After
public void after() throws IOException {
admin.close();
conn.close();
}
- Create a namespace
NamespaceDescriptor nd=NamespaceDescriptor.create("zpark")
.addConfiguration("author","zhangsan")
.build();
admin.createNamespace(nd);
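As noted in the shell section, a namespace can only be dropped once it no longer contains any tables. A minimal clean-up sketch, reusing the admin object from the setup above and assuming the zpark namespace from this example:
~~~java
//disable and delete every table in the namespace, then drop the namespace itself
for (TableName t : admin.listTableNamesByNamespace("zpark")) {
    if (admin.isTableEnabled(t)) {
        admin.disableTable(t);
    }
    admin.deleteTable(t);
}
admin.deleteNamespace("zpark");
~~~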
- Create a table
//shell equivalent: create 'zpark:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',TTL=>10}
TableName tname=TableName.valueOf("zpark:t_user");
//build the table descriptor
HTableDescriptor t_user=new HTableDescriptor(tname);
//define the column families
HColumnDescriptor cf1=new HColumnDescriptor("cf1");
cf1.setMaxVersions(3);
HColumnDescriptor cf2=new HColumnDescriptor("cf2");
cf2.setTimeToLive(10);
//add the column families to the table descriptor
t_user.addFamily(cf1);
t_user.addFamily(cf2);
admin.createTable(t_user);
- Insert data
TableName tname=TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
String[] company={"www.baizhi.com","www.sina.com"};
for(int i=0;i<1000;i++){
String com=company[new Random().nextInt(2)];
String rowKey=com;
if(i<10){
rowKey+=":00"+i;
}else if(i<100){
rowKey+=":0"+i;
}else if(i<1000){
rowKey+=":"+i;
}
Put put=new Put(rowKey.getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),("user"+i).getBytes());
put.addColumn("cf1".getBytes(),"age".getBytes(), Bytes.toBytes(i));
put.addColumn("cf1".getBytes(),"salary".getBytes(),Bytes.toBytes(5000+1000*i));
put.addColumn("cf1".getBytes(),"company".getBytes(),com.getBytes());
t_user.put(put);
}
t_user.close();
- Batch insert with BufferedMutator
TableName tname=TableName.valueOf("zpark:t_user");
String[] company={"www.baizhi.com","www.sina.com"};
BufferedMutator mutator=conn.getBufferedMutator(tname);
for(int i=0;i<1000;i++){
String com=company[new Random().nextInt(2)];
String rowKey=com;
if(i<10){
rowKey+=":00"+i;
}else if(i<100){
rowKey+=":0"+i;
}else if(i<1000){
rowKey+=":"+i;
}
Put put=new Put(rowKey.getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),("user"+i).getBytes());
put.addColumn("cf1".getBytes(),"age".getBytes(), Bytes.toBytes(i));
put.addColumn("cf1".getBytes(),"salary".getBytes(),Bytes.toBytes(5000+1000*i));
put.addColumn("cf1".getBytes(),"company".getBytes(),com.getBytes());
mutator.mutate(put);
}
mutator.close();
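If you need control over how much data the client buffers before flushing to the RegionServers, BufferedMutatorParams lets you set the write buffer size explicitly. A short sketch reusing conn from the setup above (the 4 MB buffer size is just an example):
~~~java
//same batch-write pattern as above, but with an explicit client-side write buffer
BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("zpark:t_user"))
        .writeBufferSize(4 * 1024 * 1024);
BufferedMutator mutator = conn.getBufferedMutator(params);
Put put = new Put(Bytes.toBytes("www.baizhi.com:999"));
put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("user999"));
mutator.mutate(put);
mutator.flush(); //force any buffered mutations out to the RegionServers
mutator.close();
~~~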
- Update data (a put on an existing row key adds a new cell version)
TableName tname=TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
Put put=new Put("www.baizhi.com:000".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),("zhangsan").getBytes());
t_user.put(put);
t_user.close();
- Get a single row
TableName tname=TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
Get get=new Get("www.sina.com:002".getBytes());
//a Result represents one row and contains all of its cells
Result result = t_user.get(get);
String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
Integer age = Bytes.toInt(result.getValue("cf1".getBytes(), "age".getBytes()));
Integer salary = Bytes.toInt(result.getValue("cf1".getBytes(), "salary".getBytes()));
System.out.println(name+","+age+","+salary);
t_user.close();
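The shell examples above also read multiple versions of a cell; the same is possible through the Java API, since cf1 of zpark:t_user was created with three versions. A sketch reusing conn from the setup above:
~~~java
TableName tname = TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
Get get = new Get("www.sina.com:002".getBytes());
get.addColumn("cf1".getBytes(), "age".getBytes());
get.setMaxVersions(3); //return up to 3 versions of each requested cell
Result result = t_user.get(get);
//each version comes back as its own Cell, newest first
for (Cell cell : result.getColumnCells("cf1".getBytes(), "age".getBytes())) {
    System.out.println(cell.getTimestamp() + " => " + Bytes.toInt(CellUtil.cloneValue(cell)));
}
t_user.close();
~~~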
- Scan multiple rows (MUST_PASS_ONE combines the filters with a logical OR)
TableName tname=TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
Scan scan=new Scan();
// scan.setStartRow("www.baizhi.com:000".getBytes());
// scan.setStopRow("www.taizhi.com:020".getBytes());
Filter filter1=new PrefixFilter("www.baizhi.com:00".getBytes());
Filter filter2=new PrefixFilter("www.sina.com:00".getBytes());
FilterList filter=new FilterList(FilterList.Operator.MUST_PASS_ONE,filter1,filter2);
scan.setFilter(filter);
ResultScanner resultScanner = t_user.getScanner(scan);
for (Result result : resultScanner) {
String rowKey=Bytes.toString(result.getRow());
String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
Integer age = Bytes.toInt(result.getValue("cf1".getBytes(), "age".getBytes()));
Integer salary = Bytes.toInt(result.getValue("cf1".getBytes(), "salary".getBytes()));
System.out.println(rowKey+" => "+ name+","+age+","+salary);
}
t_user.close();
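The commented-out setStartRow/setStopRow calls above show the other common way to bound a scan: by row-key range rather than by filter. A sketch reusing conn from the setup above (the stop row is exclusive):
~~~java
TableName tname = TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
Scan scan = new Scan();
scan.setStartRow("www.baizhi.com:000".getBytes()); //inclusive
scan.setStopRow("www.baizhi.com:010".getBytes());  //exclusive
ResultScanner resultScanner = t_user.getScanner(scan);
for (Result result : resultScanner) {
    System.out.println(Bytes.toString(result.getRow()));
}
resultScanner.close();
t_user.close();
~~~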
- Scan multiple rows with a FilterList using MUST_PASS_ALL (a logical AND). Note that ANDing two disjoint prefix filters, as below, matches no rows; in practice MUST_PASS_ALL is combined with filters on different dimensions (for example a prefix filter plus a value filter).
TableName tname=TableName.valueOf("zpark:t_user");
Table t_user = conn.getTable(tname);
Scan scan=new Scan();
// scan.setStartRow("www.baizhi.com:000".getBytes());
// scan.setStopRow("www.taizhi.com:020".getBytes());
Filter filter1=new PrefixFilter("www.baizhi.com:00".getBytes());
Filter filter2=new PrefixFilter("www.sina.com:00".getBytes());
FilterList filter=new FilterList(FilterList.Operator.MUST_PASS_ALL,filter1,filter2);
scan.setFilter(filter);
ResultScanner resultScanner = t_user.getScanner(scan);
for (Result result : resultScanner) {
String rowKey=Bytes.toString(result.getRow());
String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
Integer age = Bytes.toInt(result.getValue("cf1".getBytes(), "age".getBytes()));
Integer salary = Bytes.toInt(result.getValue("cf1".getBytes(), "salary".getBytes()));
System.out.println(rowKey+" => "+ name+","+age+","+salary);
}
t_user.close();
HBase and MapReduce Integration
public class CustomJobSubmiter extends Configured implements Tool {
public int run(String[] args) throws Exception {
//create the Job
Configuration config = HBaseConfiguration.create(getConf());
Job job = Job.getInstance(config);
job.setJarByClass(CustomJobSubmiter.class); // class that contains mapper
//set the input and output formats
job.setInputFormatClass(TableInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);
TableMapReduceUtil.initTableMapperJob(
"zpark:t_user",
new Scan(),
UserMapper.class,
Text.class,
CountWritable.class, //must match UserMapper's map output value type
job
);
TableMapReduceUtil.initTableReducerJob(
"zpark:t_user_count",
UserReducer.class,
job
);
job.setCombinerClass(UserCombiner.class);
job.waitForCompletion(true);
return 0;
}
public static void main(String[] args) throws Exception {
ToolRunner.run(new CustomJobSubmiter(),args);
}
public static class UserMapper extends TableMapper<Text, CountWritable>{
private Text k=new Text();
@Override
protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
String company= Bytes.toString(value.getValue("cf1".getBytes(),"company".getBytes()));
Double salary= Bytes.toInt(value.getValue("cf1".getBytes(),"salary".getBytes()))*1.0;
k.set(company);
context.write(k,new CountWritable(1,salary,salary,salary));
}
}
public static class UserCombiner extends Reducer<Text, CountWritable,Text, CountWritable>{
@Override
protected void reduce(Text key, Iterable<CountWritable> values, Context context) throws IOException, InterruptedException {
int total=0;
double tatalSalary=0.0;
double avgSalary=0.0;
double maxSalary=0.0;
double minSalary=Integer.MAX_VALUE;
for (CountWritable value : values) {
tatalSalary+=value.getTatalSalary();
total+=value.getTotal();
if(minSalary>value.getMinSalary()){
minSalary=value.getMinSalary();
}
if(maxSalary<value.getMaxSalary()){
maxSalary=value.getMaxSalary();
}
}
context.write(key,new CountWritable(total,tatalSalary,maxSalary,minSalary));
}
}
public static class UserReducer extends TableReducer<Text, CountWritable, NullWritable>{
@Override
protected void reduce(Text key, Iterable<CountWritable> values, Context context) throws IOException, InterruptedException {
int total=0;
double tatalSalary=0.0;
double avgSalary=0.0;
double maxSalary=0.0;
double minSalary=Integer.MAX_VALUE;
for (CountWritable value : values) {
tatalSalary+=value.getTatalSalary();
total+=value.getTotal();
if(minSalary>value.getMinSalary()){
minSalary=value.getMinSalary();
}
if(maxSalary<value.getMaxSalary()){
maxSalary=value.getMaxSalary();
}
}
avgSalary=tatalSalary/total;
Put put=new Put(key.copyBytes()); //copyBytes() trims the Text buffer to its actual length
put.addColumn("cf1".getBytes(),"total".getBytes(),(total+"").getBytes());
put.addColumn("cf1".getBytes(),"tatalSalary".getBytes(),(tatalSalary+"").getBytes());
put.addColumn("cf1".getBytes(),"maxSalary".getBytes(),(maxSalary+"").getBytes());
put.addColumn("cf1".getBytes(),"minSalary".getBytes(),(minSalary+"").getBytes());
put.addColumn("cf1".getBytes(),"avgSalary".getBytes(),(avgSalary+"").getBytes());
context.write(null,put);
}
}
}
public class CountWritable implements Writable {
int total=0;
double tatalSalary=0.0;
double maxSalary=0.0;
double minSalary=Integer.MAX_VALUE;
public CountWritable(int total, double tatalSalary, double maxSalary, double minSalary) {
this.total = total;
this.tatalSalary = tatalSalary;
this.maxSalary = maxSalary;
this.minSalary = minSalary;
}
public CountWritable() {
}
public void write(DataOutput out) throws IOException {
out.writeInt(total);
out.writeDouble(tatalSalary);
out.writeDouble(maxSalary);
out.writeDouble(minSalary);
}
public void readFields(DataInput in) throws IOException {
total=in.readInt();
tatalSalary=in.readDouble();
maxSalary=in.readDouble();
minSalary=in.readDouble();
}
//....
}
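Note that initTableReducerJob writes its output into zpark:t_user_count, and this table must exist before the job is submitted. A minimal sketch creating it with the same Admin API used earlier (table and family names follow this example):
~~~java
TableName out = TableName.valueOf("zpark:t_user_count");
if (!admin.tableExists(out)) {
    HTableDescriptor t_user_count = new HTableDescriptor(out);
    t_user_count.addFamily(new HColumnDescriptor("cf1")); //the reducer writes all statistics into cf1
    admin.createTable(t_user_count);
}
~~~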
HBase Architecture
HBase high-level architecture
RegionServer architecture
Region architecture diagram
References: http://www.blogjava.net/DLevin/archive/2015/08/22/426877.html
http://www.blogjava.net/DLevin/archive/2015/08/22/426950.html
HBase Cluster Setup
- Make sure HDFS is running properly (HDFS HA)
- Install and configure HBase
[root@CentOSX ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
[root@CentOSX ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://mycluster/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>CentOSA,CentOSB,CentOSC</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
[root@CentOSX ~]# vi /usr/hbase-1.2.4/conf/regionservers
CentOSA
CentOSB
CentOSC
[root@CentOSX ~]# vi .bashrc
HBASE_MANAGES_ZK=false
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
export HBASE_MANAGES_ZK
[root@CentOSX ~]# source .bashrc
- Start HBase
[root@CentOSX ~]# hbase-daemon.sh start master
[root@CentOSX ~]# hbase-daemon.sh start regionserver