Big Data in Practice (39): E-commerce Data Warehouse (32), User Behavior Data Warehouse (18): Cumulative Visit Count per User
Author: reposted from the Internet
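0 Cumulative visit count per user
The expected result looks like this. Subtotal is the user's visit count on that day, and Total is the user's cumulative visit count up to and including that day: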
User    Date        Subtotal  Total
mid1    2019-12-14  10        10
mid1    2019-02-11  12        22
mid2    2019-12-14  15        15
mid2    2019-02-11  12        27
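To make the relationship between the two columns concrete, here is a minimal bash/awk sketch (not part of the original Hive pipeline) that derives Total as a running sum of Subtotal per user, ordered by date. The function name `running_total`, the tab-separated input layout, and the sample rows are all hypothetical.

```bash
#!/bin/bash
# Sketch only: derive the cumulative "total" column from daily subtotals.
# Input (hypothetical layout): mid_id<TAB>dt<TAB>subtotal, one row per user per day.
running_total() {
  # Sort by mid_id, then by dt; yyyy-MM-dd dates sort correctly as plain text.
  LC_ALL=C sort -t $'\t' -k1,1 -k2,2 |
    awk -F'\t' -v OFS='\t' '
      $1 != prev { sum = 0; prev = $1 }    # new user: restart the running sum
      { sum += $3; print $1, $2, $3, sum } # emit mid_id, dt, subtotal, total
    '
}

# Hypothetical sample rows
printf 'mid1\t2020-02-17\t10\nmid1\t2020-02-18\t12\nmid2\t2020-02-17\t15\nmid2\t2020-02-18\t12\n' |
  running_total
```

For these sample rows the last column comes out as 10, 22, 15, 27: each user's total restarts from that user's first day and accumulates forward in date order.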
1 DWS layer
1.1 Table DDL
hive (gmall)> drop table if exists dws_user_total_count_day;
create external table dws_user_total_count_day(
    `mid_id` string COMMENT 'device id',
    `subtotal` bigint COMMENT 'daily login subtotal'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/dws/dws_user_total_count_day';
1.2 Loading data
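-----------------------------Requirement 9: cumulative visit count per user-----------------------
Insert data into dws_user_total_count_day
-----------------------------Related table---------------------
dwd_start_log (start-up log table)
-----------------------------Approach-----------------------
Every time a user opens the app, one start-up log record is produced.
So, for the given day, query the start-up log table, group by user (mid_id),
and count the start-up log records of each user (count).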
-----------------------------SQL------------------------
insert overwrite table dws_user_total_count_day PARTITION(dt='2020-02-18')
SELECT
mid_id,
count(*) subtotal
FROM dwd_start_log
where dt='2020-02-18'
GROUP by mid_id;
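The GROUP BY above can be checked on a small local sample. This is a minimal bash/awk sketch, not the article's method: the function name `daily_subtotal` is hypothetical, and it assumes a hypothetical tab-separated export of dwd_start_log with mid_id in the first column and dt in the second.

```bash
#!/bin/bash
# Local sketch of: SELECT mid_id, count(*) subtotal FROM dwd_start_log
#                  WHERE dt='<day>' GROUP BY mid_id
# Assumed (hypothetical) input layout: one start-up log record per line, mid_id<TAB>dt
daily_subtotal() {
  local day=$1
  awk -F'\t' -v OFS='\t' -v day="$day" '
    $2 == day { cnt[$1]++ }               # WHERE dt = day, count rows per mid_id
    END { for (m in cnt) print m, cnt[m] } # one output row per mid_id
  ' | LC_ALL=C sort                        # awk array iteration order is unspecified
}
```

For example, three records for 2020-02-18 (two from mid1, one from mid2) plus one mid1 record from 2020-02-17 yield `mid1 2` and `mid2 1` for `daily_subtotal 2020-02-18`; the record from the other day is filtered out, just as the partition filter `where dt=...` does in Hive.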
1.3 Data loading script
dws_user_total_count_day.sh
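#!/bin/bash

# Use the date passed as the first argument; default to yesterday
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d yesterday +%F)
fi

echo "=== Log date: $do_date ==="

# use gmall; is needed because hive -e starts in the default database
sql="
use gmall;
insert overwrite table dws_user_total_count_day PARTITION(dt='$do_date')
SELECT
    mid_id,
    count(*) subtotal
FROM dwd_start_log
where dt='$do_date'
GROUP by mid_id;
"

hive -e "$sql"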
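The script's only logic outside Hive is choosing the date and substituting it into the SQL string. The sketch below factors those two steps into hypothetical helper functions, `resolve_do_date` and `build_dws_sql`, so the generated SQL can be printed for a dry run instead of being sent to `hive -e`. Like the script, it assumes GNU `date` (the `-d yesterday` syntax).

```bash
#!/bin/bash
# Same rule as the script: use $1 if given, otherwise yesterday (GNU date syntax).
resolve_do_date() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    date -d yesterday +%F
  fi
}

# Print the SQL the script would pass to hive -e, without running Hive.
build_dws_sql() {
  local do_date
  do_date=$(resolve_do_date "$1")
  cat <<EOF
use gmall;
insert overwrite table dws_user_total_count_day PARTITION(dt='$do_date')
SELECT mid_id, count(*) subtotal
FROM dwd_start_log
where dt='$do_date'
GROUP by mid_id;
EOF
}
```

Running `build_dws_sql 2020-02-18` prints the statement with both the partition spec and the filter set to 2020-02-18, which matches the hand-written SQL in 1.2. In the real script the same effect comes from calling `dws_user_total_count_day.sh 2020-02-18`, or from calling it with no argument to load yesterday.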
2 ADS layer
2.1 Table DDL
drop table if exists ads_user_total_count;
create external table ads_user_total_count(
    `mid_id` string COMMENT 'device id',
    `subtotal` bigint COMMENT 'daily login subtotal',
    `total` bigint COMMENT 'cumulative login count'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_total_count';
2.2 Loading data
-----------------------------Requirement: ADS layer, cumulative visit count per user-----------------------
-----------------------------Related table---------------------
dws_user_total_count_day
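-----------------------------Approach-----------------------
From dws_user_total_count_day, take each user's login count for the given day (t1),
then sum each user's daily login counts over all days up to and including that day (t2).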
-----------------------------SQL------------------------
insert overwrite table ads_user_total_count PARTITION(dt='2020-02-18')
SELECT
t1.mid_id,
t1.subtotal,
t2.total
from
(select mid_id,subtotal
from dws_user_total_count_day
where dt='2020-02-18') t1
JOIN
(select mid_id,sum(subtotal) total
FROM dws_user_total_count_day
where dt<='2020-02-18'
GROUP by mid_id) t2
on t1.mid_id=t2.mid_id;
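The two subqueries and the join can be mirrored on a small local sample to see exactly which rows end up in the ADS partition. This is a minimal bash/awk sketch, not a Hive job: the function name `cumulative_for_day` is hypothetical, and the input is assumed to be a tab-separated export of dws_user_total_count_day in its column order (mid_id, subtotal, dt).

```bash
#!/bin/bash
# Local sketch of the ADS query's logic:
#   t1 = rows of dws_user_total_count_day where dt = day
#   t2 = per-mid_id sum(subtotal) over rows where dt <= day
#   inner JOIN on mid_id -> mid_id, subtotal (that day), total
# Assumed (hypothetical) input layout: mid_id<TAB>subtotal<TAB>dt
cumulative_for_day() {
  local day=$1
  awk -F'\t' -v OFS='\t' -v day="$day" '
    $3 <= day { total[$1] += $2 }   # t2: yyyy-MM-dd strings compare correctly as text
    $3 == day { today[$1] = $2 }    # t1: the given day only
    END { for (m in today) print m, today[m], total[m] }  # inner join keeps t1 users only
  ' | LC_ALL=C sort
}
```

Given rows for mid1 (10 on 2020-02-17, 12 on 2020-02-18), mid2 (15 on 2020-02-17) and mid3 (5 on 2020-02-19), `cumulative_for_day 2020-02-18` outputs only `mid1 12 22`. mid2 is dropped because the inner JOIN keeps only users present in t1, that is, users who visited on the target day, and mid3's later row is excluded by `dt<=`.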
2.3 Data loading script
ads_user_total_count.sh
#!/bin/bash

# Use the date passed as the first argument; default to yesterday
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d yesterday +%F)
fi

echo "=== Log date: $do_date ==="

sql="
use gmall;
insert overwrite table ads_user_total_count PARTITION(dt='$do_date')
SELECT
    t1.mid_id,
    t1.subtotal,
    t2.total
from
(select mid_id,subtotal
from dws_user_total_count_day
where dt='$do_date') t1
JOIN
(select mid_id,sum(subtotal) total
FROM dws_user_total_count_day
where dt<='$do_date'
GROUP by mid_id) t2
on t1.mid_id=t2.mid_id;
"

hive -e "$sql"
Source: https://www.cnblogs.com/qiu-hua/p/13543260.html