Big Data in Practice (34): E-commerce Data Warehouse (27), User Behavior Data Warehouse (13): User Retention Topic

Author: Internet

1 Requirements

1.1 The concept of user retention

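A retained user is a device that was first added on some date (create_date) and is active again a given number of days later (retention_day). The retention rate is the retained count divided by the number of devices added on that date.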

1.2 Requirement description

User retention analysis

 

 

 

2 DWS Layer

2.1 DWS layer (daily retained-user detail table)

 

 

1) Table creation statement

hive (gmall)>
drop table if exists dws_user_retention_day;
create external table dws_user_retention_day 
(
    `mid_id` string COMMENT 'unique device ID',
    `user_id` string COMMENT 'user ID', 
    `version_code` string COMMENT 'app version code', 
    `version_name` string COMMENT 'app version name', 
    `lang` string COMMENT 'system language', 
    `source` string COMMENT 'channel ID', 
    `os` string COMMENT 'Android OS version', 
    `area` string COMMENT 'region', 
    `model` string COMMENT 'phone model', 
    `brand` string COMMENT 'phone brand', 
    `sdk_version` string COMMENT 'sdkVersion', 
    `gmail` string COMMENT 'gmail', 
    `height_width` string COMMENT 'screen height and width',
    `app_time` string COMMENT 'time the log was generated on the client',
    `network` string COMMENT 'network mode',
    `lng` string COMMENT 'longitude',
    `lat` string COMMENT 'latitude',
    `create_date`    string  comment 'date the device was added',
    `retention_day`  int comment 'days retained as of the current date'
)  COMMENT 'daily user retention details'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_retention_day/';

 

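2) Analysis and SQL

=============================Retention topic=========================
-----------------------------Requirement 1. DWS-layer daily retained-user detail table-----------------------

-----------------------------Related tables---------------------
dws_new_mid_day: daily new-device table
dws_uv_detail_day: daily active-device table
-----------------------------Approach-----------------------
Detail columns: taken from dws_uv_detail_day (the daily active table)
create_date: the date the device was added (the day it became a new user),
looked up in dws_new_mid_day by mid_id
retention_day: the number of days retained as of the current date

dt (date of the daily active data) = create_date + retention_day
-----------------------------SQL------------------------
-- 1-day retention

-- Filtering each table first and joining afterwards is the better order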
SELECT
t1.mid_id,
t1.user_id,
t1.version_code,
t1.version_name,
t1.lang,
t1.source,
t1.os,
t1.area,
t1.model,
t1.brand,
t1.sdk_version,
t1.gmail,
t1.height_width,
t1.app_time,
t1.network,
t1.lng,
t1.lat,
t2.create_date,
1 retention_day,
'2020-02-15' dt
FROM
(SELECT * from gmall.dws_uv_detail_day where dt='2020-02-15') t1
JOIN
(select mid_id,create_date from gmall.dws_new_mid_day where create_date=date_sub('2020-02-15',1)) t2
on t1.mid_id=t2.mid_id;
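The 1-day retention join can be tried out locally before running it on the cluster. The following is only a minimal sketch: it uses Python's built-in sqlite3 as a stand-in for Hive, trims both tables down to a few columns, and fills them with made-up rows. SQLite has no date_sub, so date(:dt, '-1 day') is substituted here; that substitution and the sample data are assumptions, not part of the original job.

```python
import sqlite3

# In-memory stand-in for the two Hive tables (columns trimmed, rows invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dws_uv_detail_day (mid_id text, user_id text, dt text);
create table dws_new_mid_day  (mid_id text, create_date text);
insert into dws_new_mid_day values
  ('m1', '2020-02-14'), ('m2', '2020-02-14'), ('m3', '2020-02-13');
insert into dws_uv_detail_day values
  ('m1', 'u1', '2020-02-15'),   -- added 02-14, active 02-15: 1-day retained
  ('m3', 'u3', '2020-02-15'),   -- added 02-13: not a 1-day case
  ('m2', 'u2', '2020-02-14');   -- not active on 02-15
""")

# Filter each side first, then join, mirroring the Hive query.
# date(:dt, '-1 day') plays the role of Hive's date_sub(dt, 1).
one_day = conn.execute("""
select t1.mid_id, t2.create_date, 1 as retention_day
from (select * from dws_uv_detail_day where dt = :dt) t1
join (select mid_id, create_date from dws_new_mid_day
      where create_date = date(:dt, '-1 day')) t2
on t1.mid_id = t2.mid_id
""", {"dt": "2020-02-15"}).fetchall()

print(one_day)
```

Only m1 satisfies both filters, which matches the rule dt = create_date + retention_day with retention_day = 1.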


----------------------Retention details for days 1, 2, 3, ..., n----------------------------
insert overwrite TABLE dws_user_retention_day PARTITION(dt='2020-02-15')
SELECT
t1.mid_id,
t1.user_id,
t1.version_code,
t1.version_name,
t1.lang,
t1.source,
t1.os,
t1.area,
t1.model,
t1.brand,
t1.sdk_version,
t1.gmail,
t1.height_width,
t1.app_time,
t1.network,
t1.lng,
t1.lat,
t2.create_date,
1 retention_day
FROM
(SELECT * from gmall.dws_uv_detail_day where dt='2020-02-15') t1
JOIN
(select mid_id,create_date from gmall.dws_new_mid_day where create_date=date_sub('2020-02-15',1)) t2
on t1.mid_id=t2.mid_id
UNION all
SELECT
t1.mid_id,
t1.user_id,
t1.version_code,
t1.version_name,
t1.lang,
t1.source,
t1.os,
t1.area,
t1.model,
t1.brand,
t1.sdk_version,
t1.gmail,
t1.height_width,
t1.app_time,
t1.network,
t1.lng,
t1.lat,
t2.create_date,
2 retention_day
FROM
(SELECT * from gmall.dws_uv_detail_day where dt='2020-02-15') t1
JOIN
(select mid_id,create_date from gmall.dws_new_mid_day where create_date=date_sub('2020-02-15',2)) t2
on t1.mid_id=t2.mid_id
UNION all
SELECT
t1.mid_id,
t1.user_id,
t1.version_code,
t1.version_name,
t1.lang,
t1.source,
t1.os,
t1.area,
t1.model,
t1.brand,
t1.sdk_version,
t1.gmail,
t1.height_width,
t1.app_time,
t1.network,
t1.lng,
t1.lat,
t2.create_date,
3 retention_day
FROM
(SELECT * from gmall.dws_uv_detail_day where dt='2020-02-15') t1
JOIN
(select mid_id,create_date from gmall.dws_new_mid_day where create_date=date_sub('2020-02-15',3)) t2
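on t1.mid_id=t2.mid_id;

-- When using UNION ALL, the SELECT statements being combined must have the same number of columns, with matching types!
-- UNION vs. UNION ALL: UNION removes duplicates, UNION ALL does not!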
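The three branches of the statement differ only in the day offset, so the text can be generated instead of copied by hand. The sketch below is one possible way to do that in Python; the helper names retention_branch and build_retention_insert are hypothetical, and the date is assumed to come from a trusted scheduler because it is interpolated directly into the SQL text.

```python
# Columns copied from dws_uv_detail_day as used in the statement above.
COLUMNS = [
    "mid_id", "user_id", "version_code", "version_name", "lang", "source",
    "os", "area", "model", "brand", "sdk_version", "gmail", "height_width",
    "app_time", "network", "lng", "lat",
]


def retention_branch(dt: str, n: int) -> str:
    """Build one SELECT branch: devices active on dt that were added n days earlier."""
    cols = ",\n".join(f"t1.{c}" for c in COLUMNS)
    return (
        f"SELECT\n{cols},\nt2.create_date,\n{n} retention_day\n"
        "FROM\n"
        f"(SELECT * from gmall.dws_uv_detail_day where dt='{dt}') t1\n"
        "JOIN\n"
        "(select mid_id,create_date from gmall.dws_new_mid_day "
        f"where create_date=date_sub('{dt}',{n})) t2\n"
        "on t1.mid_id=t2.mid_id"
    )


def build_retention_insert(dt: str, max_day: int) -> str:
    """Join branches 1..max_day with UNION ALL under a single INSERT OVERWRITE."""
    header = f"insert overwrite TABLE dws_user_retention_day PARTITION(dt='{dt}')\n"
    body = "\nUNION all\n".join(
        retention_branch(dt, n) for n in range(1, max_day + 1)
    )
    return header + body + ";"


sql = build_retention_insert("2020-02-15", 3)
print(sql)
```

Every branch is produced from the same column list, so the UNION ALL requirement of identical column count and types holds by construction.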

 

 

 

2.3 Difference between UNION and UNION ALL

1) Prepare two tables

tableA                      tableB
id  name  score             id  name  score
1   a     80                1   d     48
2   b     79                2   e     23
3   c     68                3   c     86

2) Query with UNION

select name from tableA
union
select name from tableB

Query result (row order is not guaranteed)

name
a
d
b
e
c

 

3) Query with UNION ALL

select name from tableA
union all
select name from tableB

Query result

name
a
b
c
d
e
c

 

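4) Summary

(1) UNION deduplicates the combined result set, which costs extra work, so it is slower than UNION ALL.

(2) UNION ALL does not deduplicate the result set, so it is faster.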
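The difference is easy to reproduce with the two sample tables. The following sketch loads them into an in-memory SQLite database (used here only as a convenient stand-in for Hive; the deduplication semantics of UNION and UNION ALL are standard SQL) and runs both queries. Results are sorted before printing because neither query guarantees row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tableA (id integer, name text, score integer);
create table tableB (id integer, name text, score integer);
insert into tableA values (1, 'a', 80), (2, 'b', 79), (3, 'c', 68);
insert into tableB values (1, 'd', 48), (2, 'e', 23), (3, 'c', 86);
""")

# UNION: the name 'c' appears in both tables but is returned once.
union_names = [row[0] for row in conn.execute(
    "select name from tableA union select name from tableB")]

# UNION ALL: every row from both inputs is kept, so 'c' appears twice.
union_all_names = [row[0] for row in conn.execute(
    "select name from tableA union all select name from tableB")]

print(sorted(union_names))
print(sorted(union_all_names))
```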

 

 

 

3 ADS Layer

3.1 Retained user count

1) Table creation statement

hive (gmall)>
drop table if exists ads_user_retention_day_count;
create external table ads_user_retention_day_count 
(
   `create_date`       string  comment 'date the device was added',
   `retention_day`     int comment 'days retained as of the current date',
   `retention_count`    bigint comment  'retained count'
)  COMMENT 'daily user retention counts'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_retention_day_count/';

 

2) Analysis and SQL

-----------------------------Requirement 2. Populate ads_user_retention_day_count with the daily retained-user counts-----------------------
-----------------------------Related tables---------------------
dws_user_retention_day
-----------------------------Approach-----------------------
create_date: queried from dws_user_retention_day
retention_day: queried from dws_user_retention_day
retention_count: computed with count(*)

First filter on create_date to get the device details for the chosen creation date,
then group by retention_day and apply count(*).
-----------------------------SQL------------------------
insert into table gmall.ads_user_retention_day_count
SELECT
'2020-02-14',
retention_day,
count(*)
FROM gmall.dws_user_retention_day
where create_date='2020-02-14'
group by retention_day;
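The filter-then-group step can be checked on a handful of rows. This is a minimal sketch using Python's sqlite3 as a stand-in for Hive; the table is reduced to four columns and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dws_user_retention_day
  (mid_id text, create_date text, retention_day integer, dt text);
insert into dws_user_retention_day values
  ('m1', '2020-02-14', 1, '2020-02-15'),
  ('m2', '2020-02-14', 1, '2020-02-15'),
  ('m1', '2020-02-14', 2, '2020-02-16'),
  ('m9', '2020-02-13', 2, '2020-02-15');  -- other creation date, filtered out
""")

# Same shape as the Hive statement: filter on create_date, group by retention_day.
counts = conn.execute("""
select '2020-02-14' as create_date, retention_day, count(*) as retention_count
from dws_user_retention_day
where create_date = '2020-02-14'
group by retention_day
order by retention_day
""").fetchall()

print(counts)
```

Two devices added on 2020-02-14 came back after one day and one came back after two days, so the query yields one row per retention_day value.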

3.2 Retention rate

1) Table creation statement

hive (gmall)>
drop table if exists ads_user_retention_day_rate;
create external table ads_user_retention_day_rate 
(
     `stat_date`          string comment 'statistics date',
     `create_date`       string  comment 'date the device was added',
     `retention_day`     int comment 'days retained as of the current date',
     `retention_count`    bigint comment  'retained count',
     `new_mid_count`     bigint comment 'number of devices added on that date',
     `retention_ratio`   decimal(10,2) comment 'retention rate'
)  COMMENT 'daily user retention rates'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_retention_day_rate/';

 

2) Analysis and SQL

-----------------------------Requirement 3. Compute the retention rate-----------------------
-----------------------------Related tables---------------------
ads_user_retention_day_count
ads_new_mid_count

Both tables describe the devices added on the same date, so the device creation date is the join column
-----------------------------Approach-----------------------
`stat_date` : usually the day being reported on or the day after; never earlier than the date of the data being reported
`create_date` : taken from ads_user_retention_day_count
`retention_day` : taken from ads_user_retention_day_count
`retention_count` : taken from ads_user_retention_day_count
`new_mid_count` : the number of devices added on that date, taken from ads_new_mid_count
`retention_ratio` : retention_count/new_mid_count

-----------------------------SQL------------------------
insert into table ads_user_retention_day_rate
SELECT
'2020-02-16',
ur.create_date,
ur.retention_day,
ur.retention_count,
nm.new_mid_count,
cast (ur.retention_count / nm.new_mid_count as decimal(10,2))
FROM
ads_user_retention_day_count ur
JOIN
ads_new_mid_count nm
on ur.create_date=nm.create_date
where date_add(ur.create_date,ur.retention_day)='2020-02-16';
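To see which rows the date_add filter keeps and how the ratio comes out, the join can be replayed on sample numbers. The sketch below again uses sqlite3 as a stand-in for Hive, with invented counts. Two substitutions are assumptions made for SQLite: date(create_date, '+N day') replaces date_add, and the count is multiplied by 1.0 because SQLite truncates integer division, whereas Hive's / operator returns a double.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table ads_user_retention_day_count
  (create_date text, retention_day integer, retention_count integer);
create table ads_new_mid_count (create_date text, new_mid_count integer);
insert into ads_user_retention_day_count values
  ('2020-02-15', 1, 30),
  ('2020-02-14', 2, 20),
  ('2020-02-13', 3, 10),
  ('2020-02-14', 1, 50);   -- 02-14 + 1 day = 02-15, not the statistics date
insert into ads_new_mid_count values
  ('2020-02-15', 100), ('2020-02-14', 80), ('2020-02-13', 40);
""")

# Keep only rows whose create_date + retention_day equals the statistics date,
# then divide the retained count by the new-device count of that create_date.
rates = conn.execute("""
select :stat_date, ur.create_date, ur.retention_day, ur.retention_count,
       nm.new_mid_count,
       round(ur.retention_count * 1.0 / nm.new_mid_count, 2) as retention_ratio
from ads_user_retention_day_count ur
join ads_new_mid_count nm on ur.create_date = nm.create_date
where date(ur.create_date, '+' || ur.retention_day || ' day') = :stat_date
order by ur.retention_day
""", {"stat_date": "2020-02-16"}).fetchall()

for row in rates:
    print(row)
```

Each kept row pairs one creation date with exactly one retention_day, so a single statistics date produces the 1-day, 2-day and 3-day rates for three different cohorts.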

 

Source: https://www.cnblogs.com/qiu-hua/p/13542762.html