Big Data in Practice (39): E-commerce Data Warehouse (32), User Behavior Data Warehouse (18): Cumulative Visit Count per User

Author: Internet (互联网)

0 Cumulative visit count per user

The expected result looks like this:

User  Date        Subtotal  Total
mid1  2019-12-14  10        10
mid1  2019-02-11  12        22
mid2  2019-12-14  15        15
mid2  2019-02-11  12        27

1 DWS layer

1.1 Table creation statement

hive (gmall)>
drop table if exists dws_user_total_count_day;
create external table dws_user_total_count_day(
    `mid_id` string COMMENT 'device id',
    `subtotal` bigint COMMENT 'daily login subtotal'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/dws/dws_user_total_count_day';

1.2 Loading data

-----------------------------Requirement 9: cumulative visit count per user-----------------------
-- Insert data into dws_user_total_count_day
-----------------------------Related tables---------------------
-- dwd_start_log (startup log table)
-----------------------------Approach-----------------------
-- Each time a user opens the app, one startup log record is produced.
-- Query the startup log table for the target day, group by user (mid_id),
-- and count the startup log records of each user (count).
-----------------------------SQL------------------------
insert overwrite table dws_user_total_count_day PARTITION(dt='2020-02-18')
SELECT
    mid_id,
    count(*) subtotal
FROM dwd_start_log
where dt='2020-02-18'
GROUP by mid_id;
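To make the grouping step concrete, here is a minimal shell sketch that imitates the same filter-and-count on a few tab-separated rows. It is only an illustration of the logic, not how Hive executes the query; the helper names `count_by_mid` and `sample_logs` and the sample rows are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: count startup records per mid_id for one day.
# Input: tab-separated "mid_id<TAB>dt" lines on stdin; $1 is the target date.
count_by_mid() {
    local target_date="$1"
    awk -F'\t' -v d="$target_date" '
        $2 == d { n[$1]++ }                             # like: where dt = target date
        END { for (m in n) printf "%s\t%d\n", m, n[m] } # like: group by mid_id, count(*)
    ' | sort
}

# Made-up sample: mid1 opens the app twice on 2020-02-18, mid2 once.
sample_logs() {
    printf 'mid1\t2020-02-18\n'
    printf 'mid2\t2020-02-18\n'
    printf 'mid1\t2020-02-18\n'
    printf 'mid1\t2020-02-17\n'
}

sample_logs | count_by_mid 2020-02-18
```

For the sample rows this prints one line per device: mid1 with a subtotal of 2 and mid2 with a subtotal of 1. The row from 2020-02-17 is filtered out, just as the partition predicate does in the Hive query.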

1.3 Data loading script

dws_user_total_count_day.sh

#!/bin/bash
# Use the date passed as the first argument; default to yesterday.
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d yesterday +%F)
fi

echo "===Log date: $do_date==="

sql="
use gmall;
insert overwrite table dws_user_total_count_day PARTITION(dt='$do_date')
SELECT
    mid_id,
    count(*) subtotal
FROM dwd_start_log
where dt='$do_date'
GROUP by mid_id;
"
hive -e "$sql"
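The script trusts whatever is passed as the first argument, so a typo ends up as a partition name. One possible guard is sketched below. It assumes GNU `date` (which the script already relies on for `date -d yesterday`), and the function name `resolve_do_date` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the date to process in YYYY-MM-DD form.
# With no argument it falls back to yesterday, like the loader script.
resolve_do_date() {
    if [ -n "$1" ]; then
        # GNU date exits non-zero on a string it cannot parse.
        date -d "$1" +%F 2>/dev/null || {
            echo "invalid date: $1" >&2
            return 1
        }
    else
        date -d yesterday +%F
    fi
}

resolve_do_date 2020-02-18                     # prints 2020-02-18
resolve_do_date 2020-13-45 || echo "rejected"  # month 13 does not parse
```

In the loader script this could replace the `if` block with something like `do_date=$(resolve_do_date "$1") || exit 1`, so that Hive is never started with a malformed partition value.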

2 ADS layer

2.1 Table creation statement

drop table if exists ads_user_total_count;
create external table ads_user_total_count(
    `mid_id` string COMMENT 'device id',
    `subtotal` bigint COMMENT 'daily login subtotal',
    `total` bigint COMMENT 'cumulative login total'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_total_count';

2.2 Loading data

-----------------------------Requirement: cumulative visit count per user at the ADS layer-----------------------
-----------------------------Related tables---------------------
-- dws_user_total_count_day
-----------------------------Approach-----------------------
-- From dws_user_total_count_day, take each user's login count for the target day (t1),
-- then take the sum of each user's daily login counts up to and including that day (t2),
-- and join the two on mid_id.
-----------------------------SQL------------------------
insert overwrite table ads_user_total_count PARTITION(dt='2020-02-18')
SELECT
    t1.mid_id,
    t1.subtotal,
    t2.total
from
(select mid_id,subtotal
from dws_user_total_count_day
where dt='2020-02-18') t1
JOIN
(select mid_id,sum(subtotal) total
FROM dws_user_total_count_day
where dt<='2020-02-18'
GROUP by mid_id) t2
on t1.mid_id=t2.mid_id;
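The join of the day's subtotal (t1) with the running sum (t2) can be checked by hand on a few rows. The following shell sketch mirrors that logic with `awk`. It is an illustration under assumptions, not part of the original pipeline: the helper names `cumulative_totals` and `sample_dws` and the sample rows are invented.

```shell
#!/bin/bash
# Hypothetical helper mirroring the t1 JOIN t2 query.
# Input: tab-separated "mid_id<TAB>dt<TAB>subtotal" lines on stdin; $1 is the target date.
# Output: "mid_id<TAB>subtotal<TAB>total" for users active on the target date.
cumulative_totals() {
    local target_date="$1"
    awk -F'\t' -v d="$target_date" '
        $2 <= d { total[$1] += $3 }   # t2: sum of subtotals up to and including d
        $2 == d { today[$1] = $3 }    # t1: subtotal on d itself
        END {
            for (m in today)          # inner join: only users present in t1
                printf "%s\t%d\t%d\n", m, today[m], total[m]
        }
    ' | sort
}

# Made-up daily subtotals; ISO dates compare correctly as plain strings.
sample_dws() {
    printf 'mid1\t2020-02-17\t10\n'
    printf 'mid1\t2020-02-18\t12\n'
    printf 'mid2\t2020-02-17\t15\n'
    printf 'mid2\t2020-02-18\t12\n'
    printf 'mid3\t2020-02-17\t7\n'
    printf 'mid1\t2020-02-19\t5\n'
}

sample_dws | cumulative_totals 2020-02-18
```

With these rows, mid1 gets subtotal 12 and total 22, and mid2 gets subtotal 12 and total 27. mid3 does not appear because it has no row on the target day, which is the effect of the inner join, and the 2020-02-19 row is ignored by the `dt <=` filter.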

2.3 Data loading script

ads_user_total_count.sh

#!/bin/bash
# Use the date passed as the first argument; default to yesterday.
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d yesterday +%F)
fi

echo "===Log date: $do_date==="

sql="
use gmall;
insert overwrite table ads_user_total_count PARTITION(dt='$do_date')
SELECT
    t1.mid_id,
    t1.subtotal,
    t2.total
from
(select mid_id,subtotal
from dws_user_total_count_day
where dt='$do_date') t1
JOIN
(select mid_id,sum(subtotal) total
FROM dws_user_total_count_day
where dt<='$do_date'
GROUP by mid_id) t2
on t1.mid_id=t2.mid_id;
"
hive -e "$sql"
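Because the total for a given day sums every earlier DWS partition, loading history means running the two scripts day by day in ascending order. A possible backfill helper is sketched below as a dry run that only prints the commands. It assumes GNU `date`, that both scripts sit in the current directory, and the function name `date_range` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every date from $1 to $2 inclusive, one per line.
date_range() {
    local current end
    current=$(date -d "$1" +%F) || return 1
    end=$(date -d "$2" +%F) || return 1
    # Compare as epoch seconds so the loop stops after the end date.
    while [ "$(date -d "$current" +%s)" -le "$(date -d "$end" +%s)" ]; do
        echo "$current"
        current=$(date -d "$current + 1 day" +%F)
    done
}

# Dry run: show the commands instead of starting Hive.
for d in $(date_range 2020-02-27 2020-03-01); do
    echo "./dws_user_total_count_day.sh $d && ./ads_user_total_count.sh $d"
done
```

Dropping the `echo` in the loop body would run the scripts for real. Running the DWS script before the ADS script for each date keeps every partition that the `dt <=` filter reads in place before it is summed.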

 

Source: https://www.cnblogs.com/qiu-hua/p/13543260.html