How to take multi-year averages in Python using Pandas

2019-10-28 02:57:44 Author: Internet

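I have a large dataset containing 80 years of data from multiple locations (latitude/longitude). I am trying to compute 10-year averages of columns a and b for each site over the whole time span. Below is a sample of the data table.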

     Lat       Long Year Month Day      a      b
46.90625 -115.46875 1950    01  01 0.0000 1.1335
46.90625 -115.46875 1950    01  02 0.0000 1.1276 
46.90625 -115.46875 1950    01  03 0.0000 1.1213

Here is an example of what I have tried, but I keep getting lost.

fname = output1
df = pandas.read_table(output1)  
lat_long_group = df.groupby(['Lat','Long','Year']).agg(['mean','count'])
monthly_average = lat_long_group.aggregate({'a':numpy.mean,
                                            'b': numpy.mean})

Solution:

First, create a column of pandas Timestamps:

import pandas as pd

df = df.dropna()
# Build one Timestamp per row from the Year, Month and Day columns.
df['date'] = df.apply(lambda x: pd.Timestamp('{year}-{month}-{day}'
                                .format(year=int(x.Year),
                                        month=int(x.Month),
                                        day=int(x.Day))),
                      axis=1)
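As a possible alternative to the row-by-row apply above, pd.to_datetime can assemble the whole column in one call when it is given a frame whose columns are named year, month and day. The sketch below is only an illustration: the small DataFrame is a stand-in built from the three sample rows in the question, and the lowercase rename is there to match the column names pd.to_datetime expects.

```python
import pandas as pd

# Stand-in for the question's table (values copied from the sample rows).
df = pd.DataFrame({
    "Lat": [46.90625] * 3,
    "Long": [-115.46875] * 3,
    "Year": [1950, 1950, 1950],
    "Month": [1, 1, 1],
    "Day": [1, 2, 3],
    "a": [0.0, 0.0, 0.0],
    "b": [1.1335, 1.1276, 1.1213],
})

# pd.to_datetime can assemble dates from year/month/day columns;
# rename to lowercase so the names match what it looks for.
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["date"] = pd.to_datetime(parts)
```

On a large table this usually avoids the per-row Python overhead of apply, though the result should be the same column of Timestamps.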

Next, set the location from the (latitude, longitude) tuple pairs.

# list() is needed on Python 3, where zip returns an iterator.
df['Location'] = list(zip(df.Lat, df.Long))

Now, drop the redundant data.

df.drop(['Year', 'Month', 'Day', 'Lat', 'Long'], axis=1, inplace=True)

Now we can pivot the data by date and location. The new DataFrame is indexed on date:

df2 = df.pivot(index='date', columns='Location')

Swap the levels of the new columns (so that the location sits above the values).

df2.columns = df2.columns.swaplevel('Location', None)
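To make the pivot and level swap easier to picture, here is a minimal sketch on made-up data. The labels "site_A" and "site_B" are hypothetical stand-ins for the (Lat, Long) tuples, and the numbers are invented; the levels are swapped by position (0, 1) rather than by name, which is assumed to be equivalent here.

```python
import pandas as pd

# Hypothetical two-site example with invented values.
long_df = pd.DataFrame({
    "date": pd.to_datetime(["1950-01-01", "1950-01-02"] * 2),
    "Location": ["site_A", "site_A", "site_B", "site_B"],
    "a": [0.0, 0.0, 1.0, 3.0],
    "b": [1.1335, 1.1276, 2.0, 4.0],
})

# One row per date; columns become a two-level index (value name, Location).
wide = long_df.pivot(index="date", columns="Location")

# Put Location on top, then sort so each site's columns sit together.
wide.columns = wide.columns.swaplevel(0, 1)
wide = wide.sort_index(axis=1)

# With Location as the top level, one site's a/b columns can be selected at once.
site_b = wide["site_B"]
```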

Finally, use resample to get the mean of the data over ten-year periods:

>>> df2.resample('10A').mean()  # 'A'=Annual, '10A'=TenYears
Location    (46.90625, -115.46875)          
                                 a         b
date                                        
1950-12-31                       0  1.127484
1960-12-31                       0  1.127467
1970-12-31                       0  1.127467
1980-12-31                       0  1.127467
1990-12-31                       0  1.127467
2000-12-31                       0  1.127467
2010-12-31                       0  1.127467
2020-12-31                       0  1.127467
2030-12-31                       0  1.127467
2040-12-31                       0  1.127452

I used the same data for all 30,000 rows (apart from the dates, of course), but you can see how the process works.

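Note that the data is split into even ten-year blocks, so you may have stubs at both ends of your data (for example, if your data starts in 1947, the first period will only cover 3-4 years).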
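If the stub periods matter, one way to keep an eye on them is to bin by calendar decade with a plain groupby and carry a row count next to each mean. This is a different technique from the resample call above, not the answer's method, and it is only a sketch: the rows are synthetic, and the names Decade, a_mean, b_mean and n_rows are chosen here for illustration.

```python
import pandas as pd

# Synthetic rows, not real measurements: one site, two calendar decades.
df = pd.DataFrame({
    "Lat": [46.90625] * 4,
    "Long": [-115.46875] * 4,
    "Year": [1947, 1949, 1950, 1959],
    "a": [1.0, 3.0, 10.0, 20.0],
    "b": [2.0, 4.0, 30.0, 50.0],
})

# Label each row with the first year of its calendar decade (1947 -> 1940).
df["Decade"] = (df["Year"] // 10) * 10

# Mean of a and b per site and decade, plus the number of rows behind each mean.
decadal = (
    df.groupby(["Lat", "Long", "Decade"])
      .agg(a_mean=("a", "mean"), b_mean=("b", "mean"), n_rows=("a", "count"))
      .reset_index()
)
```

A decade whose n_rows is far below the others is likely a stub. Note that these bins run 1940-1949, 1950-1959 and so on, which will not generally line up with the year-end-anchored bins that resample produces.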

Tags: pandas, time-series, python, numpy
Source: https://codeday.me/bug/20191028/1949376.html