其他分享
首页 > 其他分享> > hive实战12--时间占比和上一小时逻辑

hive实战12--时间占比和上一小时逻辑

作者:互联网

目录

一:已知表结构和字段

二:需求1:时间占比

三:需求2--目前小时与上一小时比较

四:使用窗口函数lag()解决需求三

五:只存储电脑每天最后一次更新的时间对应的信息


一:已知表结构和字段

    某电脑售卖数据表computer,从左至右依次为location,店铺ID,某天,某小时,某分钟,商品是否到期,商品的存量,商品卖出的数量,商品进货数量/

locationIDpt_daypt_hourpt_minutegood_computercomputer_storecomputer_outcomputer_inupdate_time
0到230到59

1代表正常

0代表坏

存量销售量进货量

二:需求1:时间占比

查询北京东城区的所有店铺在2021-01-11全天缺正常电脑时段占比,全天(除去凌晨0:00-6:00)缺正常电脑时段占比,上午(07:30-10:00)缺正常电脑占比。

提示:分子/分母

分子:每个店铺id在某天的正常电脑数量数为0时的分钟数有多少个(eg. 全天24h内,有X个分钟时段的存量车数为0,则分子为X)分母:全天总分钟数(24 * 60)

思路解析:

代码:注意表里面的粒度都已经到分钟了。

select id ,
       count(1)/(24*60) as a1,   -- 24*60这个整体用括号括起来,细心一点
       sum(case when(pt_hour>=0 and pt_hour<6) then 1 else 0 end)/(6*60) as a2,
       sum(case when pt_hour*7+pt_minute > 7*60+30 and pt_hour<10 then 1 else 0 end) as a3
       -- sum(case when (pt_hour = 7 and pt_minute <30) or (pt_hour >=8 and pt_hour<10) then 1 else 0 end ) as a3
from computer
where pt_day = '2021-01-11'
    and good_computer = 1
    and location = "北京市东城区"
group by id
order by id

三:需求2--目前小时与上一小时比较

查询北京东城区的所有店铺在2021-01-11全天每个店铺,以每1小时为分时段的销售正常电脑量,进货正常电脑量,存量,现在时段销售订单/上一分钟时段存量车数。

思路解析:

代码:

select c5.id, 
       c5.computer_out,
       c5.computer_in,
       c5.computer_store,
       c5.ration
(
select c1.id as id,
     c1.computer_out as computer_out, 
     c1.computer_in as computer_in , 
     c1.computer_store as computer_store, 
     c1.computer_out / c2.store as ration
from computer c1
left outer join computer c2
on c1.id = c2.id
where c1.pt_day = '2021-01-11'
   and c2.pt_day = '2021-01-11'   -- 此条逻辑别忘了
   and c1.pt_hour= c2.pt.hour+1
   and location = ' 北京市东城区'
   
union all

select c3.id as id,
     c3.computer_out as computer_out, 
     c3.computer_in as computer_in , 
     c3.computer_store as computer_store, 
     c3.computer_out / c4.store as ration
from computer c3
left outer join computer c4
on c3.id = c4.id
where c3.pt_day = '2021-01-11'
   and c4.pt_day = '2021-01-10'   -- 此条逻辑别忘了
   and c3.pt_hour= 0
   and c4.pt_hour = 23
   and c3.location = ' 北京市东城区'
   and c4.location = '北京市东城区' --此条也需要吧!!
) c5
group by c5.id 
order by c5.id, c5.pt_hour;

四:使用窗口函数lag()解决需求三

ing 

五:只存储电脑每天最后一次更新的时间对应的信息

 思路:

select 
food_id,
otehr,
update_time,
dt
from
(
	select
	id,
	row_number() over(partition by food_id order by update_time desc) as r,
    other,
	update_time,
	dt
	from computer
	where pt_day = '2021-01-11'
) temp_table 
where r = 1;

标签:12,hour,pt,--,hive,computer,c3,id,c5
来源: https://blog.csdn.net/yezonghui/article/details/118275329