Hive中的explode使用全解(附图片详解)
作者:互联网
数据表:
表名:default.class_info,分别是班级,姓名,成绩 3列
单列Explode:
需求:将student这一列中数据由一行变为多行(使用split和explode,并结合lateral view函数实现)
select
class,student_name
from
default.class_info
lateral view explode(split(student,',')) t as student_name
结果如下:
单列Posexplode
需求:想要给每班的每个同学按照顺序来一个编号(使用posexplode函数)
select
class,student_index + 1 as student_index,student_name
from
default.class_info
lateral view posexplode(split(student,',')) t as student_index,student_name;
注意:student_index+1 是因为index是从0开始的
结果如下:
多列Explode
需求:基于学生姓名和分数使其两两匹配,期望得到如下效果。
尝试: 先对两列进行explode
select
class,student_name,student_score
from
default.class_info
lateral view explode(split(student,',')) sn as student_name
lateral view explode(split(score,',')) sc as student_score
结果如下:
不太符合预期,如果对两列都进行explode的话,假设每列都有3个值,最终会变为3*3=9行,但我们实际只想要3行
解决办法:
我们进行两次posexplode,姓名和成绩都保留对应的序号,即便是变成了9行,通过where筛选只保留行号相同的index即可。
select
class,student_name,student_score
from
default.class_info
lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
where
student_index_sn = student_index_sc;
结果如下:
需求:假设我们又想对同学的成绩进行一下排名(借助rank( ) 函数 )
select
class,
student_name,
student_score,
rank() over(partition by class order by student_score desc) as student_rank
from
default.class_info
lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
where
student_index_sn = student_index_sc
order by class,student_rank;
结果如下:
标签:index,name,lateral,Hive,explode,附图片,score,student,class 来源: https://blog.csdn.net/weixin_40983094/article/details/110172155