首页 > 其他分享> > hive实战12--时间占比和上一小时逻辑

hive实战12--时间占比和上一小时逻辑

2021-06-27 19:32:41 作者：互联网

一：已知表结构和字段

二：需求1：时间占比

三：需求2--目前小时与上一小时比较

四：使用窗口函数lag（）解决需求三

五：只存储电脑每天最后一次更新的时间对应的信息

一：已知表结构和字段

某电脑售卖数据表computer,从左至右依次为location，店铺ID，某天，某小时，某分钟，商品是否到期，商品的存量，商品卖出的数量,商品进货数量/

location

pt_day

pt_hour

pt_minute

good_computer

computer_store

computer_out

computer_in

update_time

0到23

0到59

1代表正常

0代表坏

存量

销售量

进货量

二：需求1：时间占比

查询北京东城区的所有店铺在2021-01-11全天缺正常电脑时段占比，全天（除去凌晨0:00-6:00）缺正常电脑时段占比，上午（07:30-10:00）缺正常电脑占比。

提示：分子/分母

分子：每个店铺id在某天的正常电脑数量数为0时的分钟数有多少个（eg. 全天24h内，有X个分钟时段的存量车数为0，则分子为X）分母：全天总分钟数（24 * 60）

思路解析：

7:30--10:00如何写逻辑：方法1: 分情况，当pt_hour = 7的时候再进一步判断pt_minute的大小；方法2:把小时*60换算成分钟再加上分钟的数据形成一个数，然后再进行判断。
统计中有多个指标，且约束互补相同，使用case when进行统计。
sum(case when ... then 1 else 0 end)和count(case when ... then 1 else 0 end)的区别：sum（case when ..）才是正确的用法，他会把case when中为1的累加起来，而count(case when..)不管case when的情况是1还是0都会被count一次，所以不起作用。
注意有很对关键字在命名或者select的时候要注意：例如minute, all这些字段。

代码：注意表里面的粒度都已经到分钟了。

select id ,
       count(1)/(24*60) as a1,   -- 24*60这个整体用括号括起来，细心一点
       sum(case when(pt_hour>=0 and pt_hour<6) then 1 else 0 end)/(6*60) as a2,
       sum(case when pt_hour*7+pt_minute > 7*60+30 and pt_hour<10 then 1 else 0 end) as a3
       -- sum(case when (pt_hour = 7 and pt_minute <30) or (pt_hour >=8 and pt_hour<10) then 1 else 0 end ) as a3
from computer
where pt_day = '2021-01-11'
    and good_computer = 1
    and location = "北京市东城区"
group by id
order by id

三：需求2--目前小时与上一小时比较

查询北京东城区的所有店铺在2021-01-11全天每个店铺，以每1小时为分时段的销售正常电脑量，进货正常电脑量，存量，现在时段销售订单/上一分钟时段存量车数。

思路解析：

当前小时和上一小时的逻辑
当天的凌晨第一个小时的上一个小时要去前一天的第23小时以后取。所以我们可以单独考虑，最后使用union all即可。
表和自身表做连接，千万要注意：每个表都要用时间去约束，千万不要取全量，不仅速度满，结果还是错误的。

代码：

select c5.id, 
       c5.computer_out,
       c5.computer_in,
       c5.computer_store,
       c5.ration
(
select c1.id as id,
     c1.computer_out as computer_out, 
     c1.computer_in as computer_in , 
     c1.computer_store as computer_store, 
     c1.computer_out / c2.store as ration
from computer c1
left outer join computer c2
on c1.id = c2.id
where c1.pt_day = '2021-01-11'
   and c2.pt_day = '2021-01-11'   -- 此条逻辑别忘了
   and c1.pt_hour= c2.pt.hour+1
   and location = ' 北京市东城区'
   
union all

select c3.id as id,
     c3.computer_out as computer_out, 
     c3.computer_in as computer_in , 
     c3.computer_store as computer_store, 
     c3.computer_out / c4.store as ration
from computer c3
left outer join computer c4
on c3.id = c4.id
where c3.pt_day = '2021-01-11'
   and c4.pt_day = '2021-01-10'   -- 此条逻辑别忘了
   and c3.pt_hour= 0
   and c4.pt_hour = 23
   and c3.location = ' 北京市东城区'
   and c4.location = '北京市东城区' --此条也需要吧！！
) c5
group by c5.id 
order by c5.id, c5.pt_hour;

四：使用窗口函数lag（）解决需求三

ing

五：只存储电脑每天最后一次更新的时间对应的信息

思路：

最后一次更新如何求：把更新时间排序一下，注意从大到小排序，然后外层循环中每次取第一名。

select 
food_id,
otehr,
update_time,
dt
from
(
	select
	id,
	row_number() over(partition by food_id order by update_time desc) as r,
    other,
	update_time,
	dt
	from computer
	where pt_day = '2021-01-11'
) temp_table 
where r = 1;

标签：12,hour,pt,--,hive,computer,c3,id,c5
来源： https://blog.csdn.net/yezonghui/article/details/118275329