Python Pandas: building a multi-variable pivot table that shows NaN and non-NaN counts

Author: Internet

I have a dataset from several weather stations that contains multiple variables (temperature, pressure, etc.):

stationID | Time | Temperature | Pressure |...
----------+------+-------------+----------+
123       |  1   |     30      |  1010.5  |   
123       |  2   |     31      |  1009.0  |
202       |  1   |     24      |  NaN     |
202       |  2   |     24.3    |  NaN     |
202       |  3   |     NaN     |  1000.3  |
...

I want to create a pivot table that shows the number of NaN and non-NaN values for each weather station, for example:

stationID | nanStatus | Temperature | Pressure |...
----------+-----------+-------------+----------+
123       |  NaN      |      0      |     0    |       
          |  nonNaN   |      2      |     2    |
202       |  NaN      |      1      |     2    |
          |  nonNaN   |      2      |     1    |
...

Below is what I have done so far. It works for Temperature (in a rather cumbersome way), but how can I get the same result for both variables, as shown above?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'stationID': [123, 123, 202, 202, 202],
    'Time': [1, 2, 1, 2, 3],
    'Temperature': [30, 31, 24, 24.3, np.nan],
    'Pressure': [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

# Boolean mask of missing values; restore the real station IDs and add a column to count
dfnull = df.isnull()
dfnull['stationID'] = df['stationID']
dfnull['tempValue'] = df['Temperature']
dfnull.pivot_table(values=["tempValue"], index=["stationID", "Temperature"], aggfunc=len, fill_value=0)

The output is:

----------------------------------
                         tempValue
stationID | Temperature           
123       | False                2
202       | False                2
          | True                 1

Solution:

Update (thanks to @root), using the nans and notnans helper functions defined in the original answer below:

In [16]: df.groupby('stationID')[['Temperature','Pressure']].agg([nans, notnans]).astype(int).stack(level=1)
Out[16]:
                   Temperature  Pressure
stationID
123       nans               0         0
          notnans            2         2
202       nans               1         2
          notnans            2         1
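
The one-liner above depends on the sample DataFrame from the question and on the two helper functions from the original answer. As a convenience, here is a minimal self-contained sketch that puts those pieces together; the variable names `wide` and `stacked` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'stationID': [123, 123, 202, 202, 202],
    'Time': [1, 2, 1, 2, 3],
    'Temperature': [30, 31, 24, 24.3, np.nan],
    'Pressure': [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

def nans(s):
    # Number of missing values in a Series
    return s.isnull().sum()

def notnans(s):
    # Number of non-missing values in a Series
    return s.notnull().sum()

# One (nans, notnans) column pair per variable, one row per station
wide = df.groupby('stationID')[['Temperature', 'Pressure']].agg([nans, notnans]).astype(int)

# Move the nans/notnans column level into the row index
stacked = wide.stack(level=1)
print(stacked)
```

The aggregated column labels come from the function names, so renaming the helpers changes the row labels in the stacked result.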

Original answer:

In [12]: %paste
def nans(s):
    return s.isnull().sum()

def notnans(s):
    return s.notnull().sum()
## -- End pasted text --

In [37]: df.groupby('stationID')[['Temperature','Pressure']].agg([nans, notnans]).astype(np.int8)
Out[37]:
          Temperature         Pressure
                 nans notnans     nans notnans
stationID
123                 0       2        0       2
202                 1       2        2       1
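
A possible variation, not part of the original answer, avoids custom aggregation functions by using only built-in pandas operations: summing the boolean `isnull()` mask per station for the NaN counts, and `groupby(...).count()` for the non-NaN counts. The labels `NaN` and `nonNaN` and the index name `nanStatus` are taken from the desired table in the question; the variable names are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'stationID': [123, 123, 202, 202, 202],
    'Time': [1, 2, 1, 2, 3],
    'Temperature': [30, 31, 24, 24.3, np.nan],
    'Pressure': [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

cols = ['Temperature', 'Pressure']

# Missing values per station: sum the boolean mask within each group
nan_counts = df[cols].isnull().groupby(df['stationID']).sum()

# Non-missing values per station: count() skips NaN
non_nan_counts = df.groupby('stationID')[cols].count()

# Stack the two tables under a new 'nanStatus' level, then order rows by station
result = (
    pd.concat({'NaN': nan_counts, 'nonNaN': non_nan_counts}, names=['nanStatus'])
    .swaplevel()
    .sort_index()
)
print(result)
```

Here the row labels are chosen explicitly through the dictionary keys rather than derived from function names, which may be easier to match to a required output layout.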

Tags: pandas, dataframe, nan, pivot-table, python
Source: https://codeday.me/bug/20191118/2027620.html