
Hive Study Notes 02

Author: Internet

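1. Basic Hive Operations

  a. DML operations

-- Load a file from the local filesystem into the table tb_load1
load data local inpath 'path' into table tb_load1;

-- Overwrite stu_buck with the rows of student, distributed and sorted by Sno
insert overwrite table stu_buck
select * from student cluster by(Sno);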

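2. Hive Join

Hive supports equi-joins. Older Hive releases do not support non-equi joins (join conditions other than equality), because such joins are hard to translate into MapReduce jobs.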

https://www.cnblogs.com/yiwanfan/p/9628235.html
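
To see why an equality condition fits MapReduce well, here is a minimal in-memory sketch of a reduce-side join: rows are grouped by join key (the shuffle), and rows that share a key are paired (the reduce). The class name EquiJoinSketch, the two-column row layout, and the sample score rows are assumptions made for illustration; this is not Hive's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a reduce-side equi-join; not Hive's real code.
class EquiJoinSketch {

    // "Map + shuffle": group every row under its join key (column 0).
    static Map<String, List<String[]>> shuffle(List<String[]> rows) {
        Map<String, List<String[]>> byKey = new TreeMap<>();
        for (String[] row : rows) {
            byKey.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        return byKey;
    }

    // "Reduce": for each key present on both sides, pair up the rows (inner join).
    static List<String> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> l = shuffle(left);
        Map<String, List<String[]>> r = shuffle(right);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : l.entrySet()) {
            List<String[]> matches = r.get(e.getKey());
            if (matches == null) {
                continue; // no row with this key on the right side
            }
            for (String[] a : e.getValue()) {
                for (String[] b : matches) {
                    out.add(e.getKey() + "," + a[1] + "," + b[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> students = Arrays.asList(new String[]{"1", "allen"}, new String[]{"2", "kobe"});
        List<String[]> scores = Arrays.asList(new String[]{"1", "90"}, new String[]{"3", "75"});
        System.out.println(join(students, scores)); // prints [1,allen,90]
    }
}
```

A condition such as a.id < b.id gives no single key to group on, so every row of one table would have to be compared with every row of the other.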

3. Introduction to Hive Functions

  a. Built-in functions

  https://www.cnblogs.com/kimbo/p/6288516.html

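  b. User-defined functions

When the built-in functions cannot meet a business requirement, consider writing a user-defined function.

User-defined functions come in three kinds:

UDF (user-defined function): one row in, one row out.
UDAF (user-defined aggregate function): many rows in, one row out.
UDTF (user-defined table-generating function): one row in, many rows out.

UDF development example:

Create a Java project and add hive-exec-1.2.1.jar and hadoop-common-2.7.4.jar as dependencies.

Create a class that extends UDF and overload evaluate, implementing the business logic inside it.

Package the project as a jar.

Add the jar to Hive's classpath: hive> add jar /home/hadoop/udf.jar;

Create a temporary function bound to the compiled Java class (the quoted value must be that class's fully qualified name): create temporary function tolowercase as 'cn.itcast.bigdata.udf.ToProvince';

The function can then be used in SQL: select tolowercase(name), age from t_test;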
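
The "extend UDF and overload evaluate" step might look roughly like the sketch below. To keep the file runnable without hive-exec on the classpath, the base class is left out and only the evaluate logic is shown; the class name ToLowerCaseSketch is hypothetical, and a real UDF would normally use Hadoop types such as Text rather than String.

```java
import java.util.Locale;

// Sketch of the evaluate logic only. In a real project the class would be declared as
//   public class ToLowerCase extends org.apache.hadoop.hive.ql.exec.UDF
// and compiled against hive-exec; the base class is omitted here (assumption)
// so that this file compiles and runs on its own.
class ToLowerCaseSketch {

    // Hive finds evaluate by reflection and calls it once per input row.
    public String evaluate(String input) {
        if (input == null) {
            return null; // a NULL column value stays NULL
        }
        return input.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ToLowerCaseSketch udf = new ToLowerCaseSketch();
        System.out.println(udf.evaluate("ALLEN")); // prints allen
    }
}
```

Because evaluate is resolved by its argument types, several overloads (for example one per input type) can live in the same class.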

4. Advanced Hive Function Features

a. UDTF: the explode function

explode is a UDTF built into Hive that expands a field of type map or array. For an array, each element becomes one row; for a map, each key-value pair becomes one row, with the key in one column and the value in another.

-- Data
001,allen,usa|china|japan,1|3|7
002,kobe,usa|england|japan,2|3|5
-- Create the table
create table test_message(id int,name string,location array<string>,city array<int>) row format delimited fields terminated by ","
collection items terminated by '|';
-- Load the data
load data local inpath "/root/hivedata/test_message.txt" into table test_message;
-- explode
select explode(location) from test_message;
select name,explode(location) from test_message; -- fails
-- When a UDTF appears in the select list, Hive allows access only to the exploded field; no other column may be selected alongside it.

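b. lateral view

A lateral view is used together with a UDTF to split one row into multiple rows. Without lateral view, a UDTF can only extract and split a single field, and the result cannot be joined back to the original table. With lateral view, the exploded values are associated with the rows of the original table.

select subview.* from test_message lateral view explode(location) subview;
-- lateral view explode acts like a virtual table built by splitting the location field, which is then joined with the original table.
select name,subview.* from test_message lateral view explode(location) subview as lc;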
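
The "virtual table joined with the original row" idea can be modelled in plain Java as follows. This is only an illustrative sketch of the semantics; the class and method names are made up, and rows are represented as comma-joined strings for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "lateral view explode(location)"; names are illustrative.
class LateralViewSketch {

    // explode: one array value becomes one output row per element.
    static List<String> explode(List<String> array) {
        return new ArrayList<>(array);
    }

    // lateral view: pair the source row's other column with every exploded element.
    static List<String> lateralView(String name, List<String> locations) {
        List<String> rows = new ArrayList<>();
        for (String lc : explode(locations)) {
            rows.add(name + "," + lc);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Mirrors: select name, subview.* from test_message
        //          lateral view explode(location) subview as lc;
        List<String> out = new ArrayList<>();
        out.addAll(lateralView("allen", Arrays.asList("usa", "china", "japan")));
        out.addAll(lateralView("kobe", Arrays.asList("usa", "england", "japan")));
        out.forEach(System.out::println); // six lines, from allen,usa to kobe,japan
    }
}
```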

5. Row and Column Conversion

a. Multiple rows to a single column

+-----------------+-----------------+-----------------+--+
| row2col_1.col1  | row2col_1.col2  | row2col_1.col3  |
+-----------------+-----------------+-----------------+--+
| a               | b               | 1               |
| a               | b               | 2               |
| a               | b               | 3               |
| c               | d               | 4               |
| c               | d               | 5               |
| c               | d               | 6               |
+-----------------+-----------------+-----------------+--+
6 rows selected (0.096 seconds)
0: jdbc:hive2://hadoop01:10000> select col1, col2, concat_ws('|', collect_set(cast(col3 as string))) as col3
. . . . . . . . . . . . . . . > from row2col_1
. . . . . . . . . . . . . . . > group by col1, col2;
+-------+-------+--------+--+
| col1  | col2  |  col3  |
+-------+-------+--------+--+
| a     | b     | 1|2|3  |
| c     | d     | 4|5|6  |
+-------+-------+--------+--+
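
As a rough model of what the query above computes, the sketch below groups rows by (col1, col2), collects the distinct col3 values, and joins them with '|'. The class name and the array-based row layout are assumptions. Insertion-ordered collections are used to keep the output deterministic, whereas Hive does not guarantee the element order of collect_set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of: concat_ws('|', collect_set(col3)) ... group by col1, col2
class RowsToColumnSketch {

    // Each row is {col1, col2, col3}.
    static List<String> collapse(List<String[]> rows) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String[] r : rows) {
            String key = r[0] + "," + r[1]; // group by col1, col2
            // collect_set: keep each distinct col3 value once per group
            groups.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(r[2]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            // concat_ws('|', ...): join the collected values with a separator
            out.add(e.getKey() + "," + String.join("|", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"a", "b", "1"}, new String[]{"a", "b", "2"}, new String[]{"a", "b", "3"},
            new String[]{"c", "d", "4"}, new String[]{"c", "d", "5"}, new String[]{"c", "d", "6"});
        System.out.println(collapse(rows)); // prints [a,b,1|2|3, c,d,4|5|6]
    }
}
```

If duplicates should be kept, Hive's collect_list plays the role that a plain list would play in this sketch.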

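b. Single column to multiple rows

This uses the UDTF (table-generating function) explode(), which accepts an array (or map) argument and does the opposite of collect_set: it expands the array elements into separate rows. Combined with lateral view, explode splits one column's value into multiple rows.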

+-----------------+-----------------+-----------------+--+
| col2row_2.col1  | col2row_2.col2  | col2row_2.col3  |
+-----------------+-----------------+-----------------+--+
| a               | b               | ["1","2","3"]   |
| c               | d               | ["4","5","6"]   |
+-----------------+-----------------+-----------------+--+
2 rows selected (0.075 seconds)
0: jdbc:hive2://hadoop01:10000> select col1, col2, lv.col3 as col3
. . . . . . . . . . . . . . . > from col2row_2
. . . . . . . . . . . . . . . > lateral view explode(col3) lv as col3;
+-------+-------+-------+--+
| col1  | col2  | col3  |
+-------+-------+-------+--+
| a     | b     | 1     |
| a     | b     | 2     |
| a     | b     | 3     |
| c     | d     | 4     |
| c     | d     | 5     |
| c     | d     | 6     |
+-------+-------+-------+--+

c. Multiple rows to multiple columns

+---------------+---------------+---------------+--+
| row2col.col1  | row2col.col2  | row2col.col3  |
+---------------+---------------+---------------+--+
| a             | c             | 1             |
| a             | d             | 2             |
| a             | e             | 3             |
| b             | c             | 4             |
| b             | d             | 5             |
| b             | e             | 6             |
+---------------+---------------+---------------+--+
6 rows selected (0.092 seconds)
0: jdbc:hive2://hadoop01:10000> select col1,
. . . . . . . . . . . . . . . > max(case col2 when 'c' then col3 else 0 end) as c,
. . . . . . . . . . . . . . . > max(case col2 when 'd' then col3 else 0 end) as d,
. . . . . . . . . . . . . . . > max(case col2 when 'e' then col3 else 0 end) as e
. . . . . . . . . . . . . . . > from row2col
. . . . . . . . . . . . . . . > group by col1;
+-------+----+----+----+--+
| col1  | c  | d  | e  |
+-------+----+----+----+--+
| a     | 1  | 2  | 3  |
| b     | 4  | 5  | 6  |
+-------+----+----+----+--+
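
The max(case ... else 0 end) pattern can be read as: for every row in a group, each output column receives col3 if col2 matches that column's label and 0 otherwise, and the largest value per column is kept. A small Java sketch of that logic follows, with a hypothetical class name and array-based rows as assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of: max(case col2 when '<label>' then col3 else 0 end) ... group by col1
class PivotSketch {

    // Each row is {col1, col2, col3}; the result maps col1 to one cell per label.
    static Map<String, int[]> pivot(List<String[]> rows, List<String> labels) {
        final int n = labels.size();
        Map<String, int[]> result = new TreeMap<>();
        for (String[] r : rows) {
            int[] cells = result.computeIfAbsent(r[0], k -> {
                int[] init = new int[n];
                Arrays.fill(init, Integer.MIN_VALUE); // identity element for max
                return init;
            });
            for (int i = 0; i < n; i++) {
                // case col2 when label then col3 else 0 end
                int candidate = labels.get(i).equals(r[1]) ? Integer.parseInt(r[2]) : 0;
                cells[i] = Math.max(cells[i], candidate); // max(...) over the group
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"a", "c", "1"}, new String[]{"a", "d", "2"}, new String[]{"a", "e", "3"},
            new String[]{"b", "c", "4"}, new String[]{"b", "d", "5"}, new String[]{"b", "e", "6"});
        for (Map.Entry<String, int[]> e : pivot(rows, Arrays.asList("c", "d", "e")).entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        }
        // prints:
        // a [1, 2, 3]
        // b [4, 5, 6]
    }
}
```

The "else 0" default means a missing combination shows up as 0 rather than NULL, which is only safe when 0 cannot be confused with a real value.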

6. The reflect Function

The reflect function lets a SQL statement call methods that ship with Java, so many simple UDFs do not need to be written at all.

+----------------+----------------+--+
| test_udf.col1  | test_udf.col2  |
+----------------+----------------+--+
| 1              | 2              |
| 4              | 3              |
| 6              | 4              |
| 7              | 5              |
| 5              | 6              |
+----------------+----------------+--+
5 rows selected (0.061 seconds)
0: jdbc:hive2://hadoop01:10000> select reflect("java.lang.Math","max",col1,col2) from test_udf;
+------+--+
| _c0  |
+------+--+
| 2    |
| 4    |
| 6    |
| 7    |
| 6    |
+------+--+
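
Under the hood this kind of call relies on Java reflection: look up a class by name, find a method matching the argument types, and invoke it. The sketch below shows that mechanism for the two-int case only; the class and method names are hypothetical, and Hive's own implementation handles more argument types than this.

```java
import java.lang.reflect.Method;

// Hypothetical model of reflect("java.lang.Math", "max", col1, col2) for int columns.
class ReflectSketch {

    static int reflectInt(String className, String methodName, int a, int b) {
        try {
            Class<?> cls = Class.forName(className);                     // resolve the class by name
            Method m = cls.getMethod(methodName, int.class, int.class); // match the (int, int) signature
            return (Integer) m.invoke(null, a, b);                       // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection call failed", e);
        }
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {4, 3}, {6, 4}, {7, 5}, {5, 6}}; // the test_udf rows above
        for (int[] r : rows) {
            System.out.println(reflectInt("java.lang.Math", "max", r[0], r[1]));
        }
        // prints 2, 4, 6, 7, 6 on separate lines
    }
}
```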

 

 

Source: https://www.cnblogs.com/qidi/p/11641130.html