33. Three Ways to Compute Year-over-Year and Month-over-Month Metrics with Pandas

Author: Internet

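Year-over-year (同比, "tongbi" in the column names below) and month-over-month (环比, "huanbi") comparisons both describe how a statistic changes over time: month-over-month compares a period with the one immediately before it, while year-over-year compares it with the same period one year earlier.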
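
As a minimal sketch of the two definitions, assuming monthly values held in a plain Python list: the helper name `growth_rate` and the sample numbers are hypothetical and only illustrate the arithmetic (new - old) / old.

```python
def growth_rate(values, lag):
    """Return (new - old) / old for each position, where old is the value
    `lag` positions earlier. Positions without an earlier value yield None."""
    rates = []
    for i, new in enumerate(values):
        if i < lag:
            rates.append(None)
        else:
            old = values[i - lag]
            rates.append((new - old) / old)
    return rates


# Hypothetical values for 13 consecutive months (January through the next January)
monthly = [100, 110, 121, 100, 100, 100, 100, 100, 100, 100, 100, 100, 150]

mom = growth_rate(monthly, lag=1)   # month-over-month: compare with the previous month
yoy = growth_rate(monthly, lag=12)  # year-over-year: compare with the same month last year

print(mom[1], yoy[12])
```

The sketch does not guard against an earlier value of 0, which would raise ZeroDivisionError here.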

Demo steps:
0. Load three consecutive years of weather data

  1. Method 1: pandas.Series.pct_change
  2. Method 2: pandas.Series.shift
  3. Method 3: pandas.Series.diff

pct_change, shift, and diff each compute a result from values that sit several rows apart.
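
To make "several rows apart" concrete, here is a small sketch on a toy Series (the values are hypothetical, not taken from the weather data) that places the three results side by side with periods=2:

```python
import pandas as pd

# Hypothetical toy values, not taken from the weather data
s = pd.Series([10.0, 20.0, 30.0, 60.0])

overview = pd.DataFrame({
    "value": s,
    "shift_2": s.shift(periods=2),            # the value two rows earlier
    "diff_2": s.diff(periods=2),              # value minus the value two rows earlier
    "pct_change_2": s.pct_change(periods=2),  # diff_2 divided by shift_2
})
print(overview)
```

The first two rows of each derived column are NaN because no value exists two rows earlier.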

0. Load three consecutive years of weather data

import pandas as pd
%matplotlib inline
fpath = "./datas/beijing_tianqi/beijing_tianqi_2017-2019.csv"
df = pd.read_csv(fpath, index_col="ymd", parse_dates=True)
df.head(3)
           bWendu yWendu tianqi fengxiang fengli  aqi aqiInfo  aqiLevel
ymd
2017-01-01     5℃    -3℃    霾~晴        南风   1-2级  450    严重污染         6
2017-01-02     7℃    -6℃    晴~霾        南风   1-2级  246    重度污染         5
2017-01-03     5℃    -5℃               南风   1-2级  320    严重污染         6
# Strip the ℃ suffix from the temperature column and convert it to integers
df["bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32')
df.head(3)
            bWendu yWendu tianqi fengxiang fengli  aqi aqiInfo  aqiLevel
ymd
2017-01-01       5    -3℃    霾~晴        南风   1-2级  450    严重污染         6
2017-01-02       7    -6℃    晴~霾        南风   1-2级  246    重度污染         5
2017-01-03       5    -5℃               南风   1-2级  320    严重污染         6
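
The suffix-stripping step can be tried in isolation. This is a minimal sketch on a hypothetical three-value Series. Passing `regex=False` is an addition here, not part of the original code; it makes the literal (non-regex) replacement explicit.

```python
import pandas as pd

# Hypothetical raw temperature strings in the same format as the bWendu column
raw = pd.Series(["5℃", "-3℃", "7℃"])

# Remove the unit suffix literally, then convert the remaining text to integers
cleaned = raw.str.replace("℃", "", regex=False).astype("int32")
print(cleaned.tolist())
```

Note that `astype` raises an error if any value is still not a valid integer after the replacement, so unexpected formats surface immediately rather than silently.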
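# New df holding the mean daily high temperature for each month
# (pandas 2.2 and later deprecate the "M" alias here in favor of "ME")
df = df[["bWendu"]].resample("M").mean()
# Sort the index by date in ascending order
df.sort_index(ascending=True, inplace=True)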
df.head()
               bWendu
ymd
2017-01-31   3.322581
2017-02-28   7.642857
2017-03-31  14.129032
2017-04-30  23.700000
2017-05-31  29.774194
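
The monthly aggregation can be sketched on synthetic data. The daily values below are hypothetical, and the sketch passes a `pd.offsets.MonthEnd()` object instead of a string alias, an assumption made only so the same code runs regardless of which month-end alias a given pandas version accepts.

```python
import pandas as pd

# Hypothetical daily series covering January and February 2017 (values 0, 1, 2, ...)
dates = pd.date_range("2017-01-01", "2017-02-28", freq="D")
daily = pd.Series(range(len(dates)), index=dates, dtype="float64")

# One row per month, labeled with the month-end date, holding that month's mean
monthly = daily.resample(pd.offsets.MonthEnd()).mean()
print(monthly)
```

Each output row is labeled with the last day of its month, which matches the month-end index shown in the table above.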
df.index
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31',
               '2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
               '2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31',
               '2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
               '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],
              dtype='datetime64[ns]', name='ymd', freq='M')
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x13d8d77dc48>

[Figure: line plot of the monthly mean bWendu, 2017-2019 (output_11_1.png)]

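Method 1: pandas.Series.pct_change

pct_change computes (new - old) / old directly. The result is a fraction rather than a percentage, so 0.25 means a 25% increase.

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html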

df["bWendu_way1_huanbi"] = df["bWendu"].pct_change(periods=1)
df["bWendu_way1_tongbi"] = df["bWendu"].pct_change(periods=12)
df.head(15)
               bWendu  bWendu_way1_huanbi  bWendu_way1_tongbi
ymd
2017-01-31   3.322581                 NaN                 NaN
2017-02-28   7.642857            1.300277                 NaN
2017-03-31  14.129032            0.848658                 NaN
2017-04-30  23.700000            0.677397                 NaN
2017-05-31  29.774194            0.256295                 NaN
2017-06-30  30.966667            0.040051                 NaN
2017-07-31  31.612903            0.020869                 NaN
2017-08-31  30.129032           -0.046939                 NaN
2017-09-30  27.866667           -0.075089                 NaN
2017-10-31  17.225806           -0.381849                 NaN
2017-11-30   9.566667           -0.444632                 NaN
2017-12-31   4.483871           -0.531303                 NaN
2018-01-31   1.322581           -0.705036           -0.601942
2018-02-28   4.892857            2.699477           -0.359813
2018-03-31  14.129032            1.887685            0.000000
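
One point worth spelling out: `periods` counts rows, not calendar time, so `periods=12` means "one year earlier" only when the series has exactly one row per month with no gaps. A minimal sketch on hypothetical data, where the second year is exactly double the first:

```python
import pandas as pd

# Hypothetical 24 consecutive months; the second year doubles the first year's values
months = pd.period_range("2017-01", periods=24, freq="M")
first_year = [float(v) for v in range(1, 13)]
second_year = [v * 2 for v in first_year]
s = pd.Series(first_year + second_year, index=months)

# Compare each month with the row 12 positions earlier (the same month last year)
yoy = s.pct_change(periods=12)

# Optional: express the fraction as a percentage for display
yoy_percent = (yoy * 100).round(1)
print(yoy_percent.tail(3))
```

The first 12 rows are NaN because they have no row 12 positions earlier, which is the same pattern seen in the bWendu_way1_tongbi column.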

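Method 2: pandas.Series.shift

shift moves the data by the given number of rows while leaving the index unchanged.

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html

# See what shift actually does
# Use pd.concat to combine a list of Series into one wider df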
pd.concat(
    [df["bWendu"], 
     df["bWendu"].shift(periods=1), 
     df["bWendu"].shift(periods=12)],
    axis=1
).head(15)
               bWendu     bWendu     bWendu
ymd
2017-01-31   3.322581        NaN        NaN
2017-02-28   7.642857   3.322581        NaN
2017-03-31  14.129032   7.642857        NaN
2017-04-30  23.700000  14.129032        NaN
2017-05-31  29.774194  23.700000        NaN
2017-06-30  30.966667  29.774194        NaN
2017-07-31  31.612903  30.966667        NaN
2017-08-31  30.129032  31.612903        NaN
2017-09-30  27.866667  30.129032        NaN
2017-10-31  17.225806  27.866667        NaN
2017-11-30   9.566667  17.225806        NaN
2017-12-31   4.483871   9.566667        NaN
2018-01-31   1.322581   4.483871   3.322581
2018-02-28   4.892857   1.322581   7.642857
2018-03-31  14.129032   4.892857  14.129032
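# Month-over-month: compare with the value one row (one month) earlier
series_shift1 = df["bWendu"].shift(periods=1)
df["bWendu_way2_huanbi"] = (df["bWendu"]-series_shift1)/series_shift1

# Year-over-year: compare with the value 12 rows (12 months) earlier
series_shift2 = df["bWendu"].shift(periods=12)
df["bWendu_way2_tongbi"] = (df["bWendu"]-series_shift2)/series_shift2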
df.head(15)
               bWendu  bWendu_way1_huanbi  bWendu_way1_tongbi  bWendu_way2_huanbi  bWendu_way2_tongbi
ymd
2017-01-31   3.322581                 NaN                 NaN                 NaN                 NaN
2017-02-28   7.642857            1.300277                 NaN            1.300277                 NaN
2017-03-31  14.129032            0.848658                 NaN            0.848658                 NaN
2017-04-30  23.700000            0.677397                 NaN            0.677397                 NaN
2017-05-31  29.774194            0.256295                 NaN            0.256295                 NaN
2017-06-30  30.966667            0.040051                 NaN            0.040051                 NaN
2017-07-31  31.612903            0.020869                 NaN            0.020869                 NaN
2017-08-31  30.129032           -0.046939                 NaN           -0.046939                 NaN
2017-09-30  27.866667           -0.075089                 NaN           -0.075089                 NaN
2017-10-31  17.225806           -0.381849                 NaN           -0.381849                 NaN
2017-11-30   9.566667           -0.444632                 NaN           -0.444632                 NaN
2017-12-31   4.483871           -0.531303                 NaN           -0.531303                 NaN
2018-01-31   1.322581           -0.705036           -0.601942           -0.705036           -0.601942
2018-02-28   4.892857            2.699477           -0.359813            2.699477           -0.359813
2018-03-31  14.129032            1.887685            0.000000            1.887685            0.000000

Method 3: pandas.Series.diff

pandas.Series.diff subtracts the earlier value from the later one (new - old).

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.diff.html

pd.concat(
    [df["bWendu"], 
     df["bWendu"].diff(periods=1), 
     df["bWendu"].diff(periods=12)],
    axis=1
).head(15)
               bWendu      bWendu  bWendu
ymd
2017-01-31   3.322581         NaN     NaN
2017-02-28   7.642857    4.320276     NaN
2017-03-31  14.129032    6.486175     NaN
2017-04-30  23.700000    9.570968     NaN
2017-05-31  29.774194    6.074194     NaN
2017-06-30  30.966667    1.192473     NaN
2017-07-31  31.612903    0.646237     NaN
2017-08-31  30.129032   -1.483871     NaN
2017-09-30  27.866667   -2.262366     NaN
2017-10-31  17.225806  -10.640860     NaN
2017-11-30   9.566667   -7.659140     NaN
2017-12-31   4.483871   -5.082796     NaN
2018-01-31   1.322581   -3.161290   -2.00
2018-02-28   4.892857    3.570276   -2.75
2018-03-31  14.129032    9.236175    0.00
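
Because diff returns new - old, the earlier value can be recovered as new - diff. That identity is what allows a diff-based growth rate to use "current value minus diff" as its denominator. A small check on hypothetical values:

```python
import pandas as pd

# Hypothetical toy values chosen so the arithmetic is exact
s = pd.Series([4.0, 6.0, 9.0, 3.0, 12.0])
k = 2

# new - (new - old) gives back old, i.e. the value k rows earlier
recovered_old = s - s.diff(periods=k)

print(pd.concat([s, s.shift(periods=k), recovered_old], axis=1))
```

The last two columns of the printed frame match row for row, including the leading NaN values.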
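# Month-over-month: df["bWendu"] - series_diff1 recovers the previous month's value
series_diff1 = df["bWendu"].diff(periods=1)
df["bWendu_way3_huanbi"] = series_diff1/(df["bWendu"]-series_diff1)

# Year-over-year: df["bWendu"] - series_diff2 recovers the value 12 months earlier
series_diff2 = df["bWendu"].diff(periods=12)
df["bWendu_way3_tongbi"] = series_diff2/(df["bWendu"]-series_diff2)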
df.head(15)
               bWendu  bWendu_way1_huanbi  bWendu_way1_tongbi  bWendu_way2_huanbi  bWendu_way2_tongbi  bWendu_way3_huanbi  bWendu_way3_tongbi
ymd
2017-01-31   3.322581                 NaN                 NaN                 NaN                 NaN                 NaN                 NaN
2017-02-28   7.642857            1.300277                 NaN            1.300277                 NaN            1.300277                 NaN
2017-03-31  14.129032            0.848658                 NaN            0.848658                 NaN            0.848658                 NaN
2017-04-30  23.700000            0.677397                 NaN            0.677397                 NaN            0.677397                 NaN
2017-05-31  29.774194            0.256295                 NaN            0.256295                 NaN            0.256295                 NaN
2017-06-30  30.966667            0.040051                 NaN            0.040051                 NaN            0.040051                 NaN
2017-07-31  31.612903            0.020869                 NaN            0.020869                 NaN            0.020869                 NaN
2017-08-31  30.129032           -0.046939                 NaN           -0.046939                 NaN           -0.046939                 NaN
2017-09-30  27.866667           -0.075089                 NaN           -0.075089                 NaN           -0.075089                 NaN
2017-10-31  17.225806           -0.381849                 NaN           -0.381849                 NaN           -0.381849                 NaN
2017-11-30   9.566667           -0.444632                 NaN           -0.444632                 NaN           -0.444632                 NaN
2017-12-31   4.483871           -0.531303                 NaN           -0.531303                 NaN           -0.531303                 NaN
2018-01-31   1.322581           -0.705036           -0.601942           -0.705036           -0.601942           -0.705036           -0.601942
2018-02-28   4.892857            2.699477           -0.359813            2.699477           -0.359813            2.699477           -0.359813
2018-03-31  14.129032            1.887685            0.000000            1.887685            0.000000            1.887685            0.000000
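
The agreement between the three approaches can also be checked programmatically instead of by eye. This is a sketch under stated assumptions: the function names `rate_pct_change`, `rate_shift`, and `rate_diff` are hypothetical wrappers around the formulas used above, and the sample values are made up.

```python
import numpy as np
import pandas as pd


def rate_pct_change(s, periods):
    # Method 1: let pandas compute (new - old) / old
    return s.pct_change(periods=periods)


def rate_shift(s, periods):
    # Method 2: fetch the old value with shift, then apply the formula
    old = s.shift(periods=periods)
    return (s - old) / old


def rate_diff(s, periods):
    # Method 3: diff gives new - old, and new - diff gives back old
    delta = s.diff(periods=periods)
    return delta / (s - delta)


# Hypothetical monthly means, loosely shaped like the temperature data
s = pd.Series([3.3, 7.6, 14.1, 23.7, 29.8, 31.0])

for periods in (1, 3):
    a = rate_pct_change(s, periods)
    b = rate_shift(s, periods)
    c = rate_diff(s, periods)
    # Compare with a tolerance, treating NaN in the same position as equal
    print(periods, np.allclose(a, b, equal_nan=True), np.allclose(a, c, equal_nan=True))
```

A tolerance-based comparison is used because the three formulas perform the floating-point operations in a different order, so results may differ in the last few bits. All three share the same limitation: an earlier value of 0 leads to a division by zero and an infinite or NaN rate.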

Source: https://blog.csdn.net/lvlinjier/article/details/112853154