大数据之sparksql常用函数
作者:互联网
1、桥接+子查询+交叉连接
select
p.project_name as project_name
from
ods_pms_t_project_work_1217 w
left join ods_pms_t_project_user_1217 u on w.user_id = u.id
and w.work_hour > 0
left join (
select
t1.project_id as project_id,
t1.project_name as project_name
from
ods_pms_t_project_1217 t1
) p on p.project_id = u.project_id
cross join ods_pms_t_project_user_1217 u1 on 1=1
2、when ... case语句
SELECT CASE t1.STATUS WHEN 'Y' THEN t1.STATUS ELSE t2.city END
两种写法:
SELECT CASE name WHEN 'shanmin' THEN name when 'zhourong' then concat(name, '-',address) ELSE address END as test111111 from liuyue_test;
SELECT CASE WHEN name='shanmin' THEN name when name='zhourong' then concat(name, '-',address) ELSE address END as test111111 from liuyue_test;
3、split语句
4、多级子查询
select t3.id from (
select t2.id, t2.name,t2.address, t2.band, t2.sex, t2.number, t2.phone_num from (
select t1.id, t1.name,t1.address, t1.band, t1.sex, t1.number, t1.phone_num from
(select * from ods_alm_liuyue_test3) t1) t2) t3;
select t1.name as name2 from ods_alm_liuyue_test3 cross join (select t0.sex from ods_alm_liuyue_test3 t0 where t0.name = 'Zhangsan' ) t1 on 1 = 1
4、join、cross join、inner join、full join是一样的?
5、cast语句:
select cast(cast(t1.id as decimal) as string) as typecast from ods_alm_liuyue_0329_1 t1
8、在会话中设置用户变量:
select @var_name := task_id from de_task_conf where task_name='acc_edw_dim_test_inre_slowly_051404';
之后在当前会话中就能引用var_name
9、往spark表中插入数据,需要mr账号执行权限:
set hive.exec.dynamic.partition.mode=nonstrict;
insert into `zxvmax.edw_dim_test_incre_slowly_0816_backup` PARTITION (`p_date`) values('10000000012','10000000012','1012','zhourong_prod','Y','2019-07-16 10:35:26',
'2019-07-16 10:35:26','ods_alm_liuyue_test_0716','1900-01-01','3010-12-31','2019-07-16');
10、sparksql中使用时间、日期函数,及日期函数加减运算:
select now(),current_timestamp(),date_sub(current_date(),1),date_add(current_date(),1);
11、spark任务使用split函数会运行失败
可以用substr()、concat()替换。
12、在代码编辑中使用系统变量的日期运算
‘$cur_date#-1$’ $[date]-1
13、查看建表语句、表结构
desc table zxvmax.edw_test_change_part_09104;
Show create table zxvmax.edw_test_change_part_09104;
14、insert语句
MaxCompute的insert语法与通常使用的MySQL或Oracle的insert语法有差别,在insert overwrite/into后需要加入table关键字,而非直接使用tablename。insert into与insert overwrite的区别是:insert into会向表或表的分区中追加数据,而insert overwrite会在向表或分区中插入数据前清空表中的原有数据。
15、lead()函数
LEAD(scd_start_date,n=1,default=NULL) OVER ( PARTITION BY id ORDER BY TO_DATE(scd_start_date)) AS scd_end_date
表示将下一行的scd_start_date的值,作为当前行的scd_end_date值,根据id分组,按scd_start_date正序排列
16、Lag()函数
lag(scd_start_date,n=1,default=NULL) OVER ( PARTITION BY id ORDER BY TO_DATE(scd_start_date)) AS scd_end_date
表示将上一行的scd_start_date的值,作为当前行的scd_end_date值,根据id分组,按scd_start_date正序排列
17、Row_num()函数
row_number() OVER (PARTITION BY COL1 ORDER BY COL2) as order_num 表示根据COL1分组,在分组内部根据 COL2排序,而此函数计算的值就表示每组内部排序后的顺序编号(组内连续的唯一的)
18、Explain语句
explain select * from ods_alm_liuyue_test_0905
19、Compute语句
20、compute stats ods_alm_liuyue_test_0905
标签:常用,name,scd,t1,sparksql,date,id,select,函数 来源: https://www.cnblogs.com/yahutiaotiao/p/12631807.html