AARRR: 2.4
Author: compiled from the internet
Sources:
https://blog.csdn.net/qq_22790151/article/details/109700735
https://blog.csdn.net/fei347795790/article/details/98620124
https://zhuanlan.zhihu.com/p/285676746
import pandas as pd
from datetime import datetime  # needed later for the RF date window

df = pd.read_csv('user_behavior.csv')  # assumed columns: user_id, item_id, category_id, behavior, timestamps
df['timestamps'] = pd.to_datetime(df['timestamps'], unit='s')  # Unix seconds -> datetime
df['dates'] = df['timestamps'].dt.date   # used by the daily purchase-rate code below
df['hour'] = df['timestamps'].dt.hour    # used by the hourly purchase-rate code below
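As a quick sanity check after loading, a minimal sketch (assuming the load above; in this dataset the behavior values are typically pv, cart, fav, and buy):
print(df.head())                      # eyeball the parsed columns
print(df['behavior'].value_counts())  # expected: pv / cart / fav / buy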
1. Top 10 purchased items and their counts:
(1) MySQL:
SELECT *, row_number() over (order by t.`购买次数` desc) as 排序
from (SELECT item_id, count(*) as 购买次数
      from userbehavior
      where behavior = 'buy'
      GROUP BY item_id) t
order by 排序 limit 10;
Tied ranking is not used here, because the purchase counts all fall between 1 and 4, so rank() would put huge groups at the same rank.
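If tied ranks were wanted instead, pandas can produce them directly; a minimal sketch (buy_counts and buy_ranks are new names, not from the original):
buy_counts = df[df['behavior'] == 'buy']['item_id'].value_counts()
buy_ranks = buy_counts.rank(method='min', ascending=False).astype(int)  # ties share a rank, like SQL rank()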
The same approach gives the click (pv) ranking:
SELECT *, row_number() over (order by t.`点击次数` desc) as 排序
from (SELECT item_id, count(*) as 点击次数
      from userbehavior
      where behavior = 'pv'
      GROUP BY item_id) t
order by 排序 limit 10;
(2) Python
df2 = pd.DataFrame(columns=['购买前10类目', '购买前10数量'])
buy_count = df[df["behavior"] == 'buy']["category_id"].value_counts().head(10)  # top 10 by buy count
df2['购买前10类目'] = buy_count.index
df2['购买前10数量'] = buy_count.values
df2
Note: the code above counts categories; replace category_id with item_id to get the items the heading asks for, as in the sketch below.
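Applied literally, the item-level version would read as follows (a sketch mirroring the code above; df3 and its column labels are new names):
df3 = pd.DataFrame(columns=['购买前10商品', '购买前10数量'])
item_count = df[df["behavior"] == 'buy']["item_id"].value_counts().head(10)
df3['购买前10商品'] = item_count.index
df3['购买前10数量'] = item_count.values
df3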
Similarly for clicks:
df1=pd.DataFrame(columns=['点击前10类目','点击前10数量'])
pv_count= df[df["behavior"]=='pv']["category_id"].value_counts().head(10)
df1['点击前10类目']=pv_count.index
df1['点击前10数量']=pv_count.values
2. Purchase rate
# 1. Daily purchase rate (unique buyers / unique active users per day)
day_buy_user_num = df[df.behavior == 'buy'].drop_duplicates(['user_id', 'dates']).groupby('dates')['user_id'].count()
day_active_user_num = df.drop_duplicates(['user_id', 'dates']).groupby('dates')['user_id'].count()
day_buy_rate = day_buy_user_num / day_active_user_num
# 2. Hourly purchase rate (unique buyers / unique active users per hour of day)
hour_buy_user_num = df[df.behavior == 'buy'].drop_duplicates(['user_id', 'hour']).groupby('hour')['user_id'].count()
hour_active_user_num = df.drop_duplicates(['user_id', 'hour']).groupby('hour')['user_id'].count()
hour_buy_rate = hour_buy_user_num / hour_active_user_num
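To eyeball both series, a quick plot helps; a minimal sketch, assuming matplotlib is available:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
day_buy_rate.plot(ax=axes[0], title='daily buy rate')    # buyers / active users per day
hour_buy_rate.plot(ax=axes[1], title='hourly buy rate')  # buyers / active users per hour of day
plt.show()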
3. Repurchase rate
(1) MySQL
The original article's method:
SELECT count(t.user_id) as 购买人数,
       count(case when t.`购买次数` > 1 then t.user_id else null end) as 复购人数,
       CONCAT(round(100 * count(case when t.`购买次数` > 1 then t.user_id else null end) / count(t.user_id), 2), '%') as 复购率
from (SELECT user_id, count(item_id) as 购买次数
      from userbehavior
      where behavior = 'buy'
      GROUP BY user_id) t;
My method:
select sum(case when t2.`购买次数` > 0 then 1 else 0 end) as 购买人数,
       sum(case when t2.`购买次数` > 1 then 1 else 0 end) as 复购人数,
       CONCAT(round(100 * sum(case when t2.`购买次数` > 1 then 1 else 0 end) / count(t2.user_id), 2), '%') as 复购率
from (select user_id, count(date1) as 购买次数
      from (SELECT user_id, date1
            from userbehavior
            where behavior = 'buy'
            GROUP BY user_id, date1) t1
      group by user_id) t2;
The gap comes from the metric definition, i.e. how a "purchase count" is computed. My version counts by date: after the first purchase day, one or more purchases on any later day count as a repurchase. The original author counts purchase records per user, so buying item A and then item B counts as a repurchase even within the same day. If a morning purchase followed by an afternoon purchase on the same day should count as repurchase, the metric must be computed the author's way instead.
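For reference, the original article's record-based caliber is easy to reproduce in pandas; a minimal sketch (not in the original post):
# Every buy record counts, so two buys on the same day already make a repurchaser
buys_per_user = df[df['behavior'] == 'buy'].groupby('user_id')['item_id'].count()
buyers = buys_per_user.count()
rebuyers = (buys_per_user > 1).sum()
print("repurchase rate (record-based):", format(rebuyers / buyers, ".2%"))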
(2) Python
Method 1:
# Repurchase rate over the 9-day window
data_user_buy_all = df[df["behavior"] == 'buy'].groupby("user_id")["dates"].nunique()  # distinct purchase days per user
first = data_user_buy_all.count()                          # users who bought at least once
again = data_user_buy_all[data_user_buy_all >= 2].count()  # users who bought on 2+ days
print("repurchase rate:", format(again / first, ".2%"))
Method 2:
# Same 9-day repurchase rate via drop_duplicates instead of nunique
df_rebuy = df[df.behavior == 'buy'].drop_duplicates(['user_id', 'dates']).groupby('user_id')['dates'].count()
First = df_rebuy.count()
Again = df_rebuy[df_rebuy >= 2].count()
print("repurchase rate:", format(Again / First, ".2%"))  # should match method 1
4. RF analysis
Since the dataset records no purchase amounts, the M in the RFM model is dropped; customers are segmented by purchase frequency (F) and recency of the last purchase (R).
Build the R score: the closer the last purchase is to 2017-12-04, the higher the R value.
Build the F score: the more purchase days within the window, the higher the F value.
df4 = df.copy()  # work on a copy so df is left untouched
df4['日期'] = df4['timestamps'].dt.normalize()  # date with the time stripped, still datetime dtype
df4 = df4[(df4['timestamps'] >= datetime(2017, 11, 25)) & (df4['timestamps'] < datetime(2017, 12, 4)) & (df4['behavior'] == 'buy')]
# R: days from the last purchase date to 2017-12-04 (smaller diff = more recent)
df4['diff'] = (datetime(2017, 12, 4) - df4['日期']).dt.days
r = df4.groupby('user_id')['diff'].min().reset_index()
# F: distinct purchase days per user (several buys on one day count once)
f = df4.groupby('user_id')['日期'].nunique().reset_index()
# Merge the r and f tables on user_id
df5 = pd.merge(r, f, on='user_id', how='inner')[['user_id', 'diff', '日期']]
df5.columns = ['user_id', 'r', 'f']
Scoring: first bin each metric into 5 levels, then collapse each score to 2 classes (above vs. below the mean).
# Score on business breakpoints (intervals are left-open, right-closed)
df5['r_score'] = pd.cut(df5['r'], bins=[0, 1, 3, 5, 7, 100], labels=[5, 4, 3, 2, 1]).astype(int)  # fewer days since last buy -> higher score
df5['f_score'] = pd.cut(df5['f'], bins=[0, 1, 3, 5, 7, 100], labels=[1, 2, 3, 4, 5]).astype(int)  # more purchase days -> higher score
# Compare each score with its mean to get a 0/1 flag
df5['r是否大于平均值'] = (df5['r_score'] > df5['r_score'].mean()) * 1  # mean r_score ≈ 3.53
df5['f是否大于平均值'] = (df5['f_score'] > df5['f_score'].mean()) * 1  # mean f_score ≈ 1.69
df5['rfm'] = 10 * df5['r是否大于平均值'] + 1 * df5['f是否大于平均值']  # 11, 10, 1 or 0
Then attach a business label to each of the resulting classes, as sketched below:
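The original post's label table is missing here; the conventional RFM quadrant names map onto the rfm code like this (a minimal sketch, label wording assumed):
# 11 = recent & frequent, 10 = recent only, 1 = frequent only, 0 = neither
labels = {11: 'important value', 10: 'important development', 1: 'important keep', 0: 'important win-back'}
df5['label'] = df5['rfm'].map(labels)
df5['label'].value_counts()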