python - How to group and sort columns in a function given column values

Author: Internet

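I have a dataframe like the one below, and I need to write a function that gives me the following result.

Input parameters:

>Country, e.g. "INDIA"
>Age, e.g. "Student"

My input dataframe looks like this: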

   Card Name    Country      Age         Code  Amount
0        AAA      INDIA    Young        House     100
1        AAA  Australia      Old     Hardware     200
2        AAA      INDIA  Student        House     300
3        AAA         US    Young     Hardware     600
4        AAA      INDIA  Student  Electricity     200
5        BBB  Australia    Young  Electricity     100
6        BBB      INDIA  Student  Electricity     200
7        BBB  Australia    Young        House     450
8        BBB      INDIA      Old        House     150
9        CCC  Australia      Old     Hardware     200
10       CCC  Australia    Young        House     350
11       CCC      INDIA      Old  Electricity     400
12       CCC         US    Young        House     200

The expected output would be

          Code  Total Amount  Frequency  Average
0  Electricity           400          2      200
1        House           300          1      300

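That is, the top 10 codes (in our case we only get the top 2) for the given country (= INDIA) and age (= Student), ranked by the sum of Amount. It should also add a new column "Frequency" holding the number of records in each group, and an "Average" column equal to sum / frequency.

I tried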

df.groupby(['Country','Age','Code']).agg({'Amount': sum})['Amount'].groupby(level=0, group_keys=False).nlargest(10)

which produces

Country    Age      Code       
Australia  Young    House          800
           Old      Hardware       400
           Young    Electricity    100
INDIA      Old      Electricity    400
           Student  Electricity    400
                    House          300
           Old      House          150
           Young    House          100
US         Young    Hardware       600
                    House          200
Name: Amount, dtype: int64

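Unfortunately, this differs from the expected output: it groups by all three columns and takes the largest sums per country, without filtering on the given country and age or computing the count and the mean.

Solution:

Given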

>>> df                                                                                                                 
   Card Name    Country      Age         Code  Amount
0        AAA      INDIA    Young        House     100
1        AAA  Australia      Old     Hardware     200
2        AAA      INDIA  Student        House     300
3        AAA         US    Young     Hardware     600
4        AAA      INDIA  Student  Electricity     200
5        BBB  Australia    Young  Electricity     100
6        BBB      INDIA  Student  Electricity     200
7        BBB  Australia    Young        House     450
8        BBB      INDIA      Old        House     150
9        CCC  Australia      Old     Hardware     200
10       CCC  Australia    Young        House     350
11       CCC      INDIA      Old  Electricity     400
12       CCC         US    Young        House     200

you can first filter the dataframe:

>>> country = 'INDIA'                                                                                                  
>>> age = 'Student'                                                                                                    
>>> tmp = df[df.Country.eq(country) & df.Age.eq(age)].loc[:, ['Code', 'Amount']]                                       
>>> tmp                                                                                                                
          Code  Amount
2        House     300
4  Electricity     200
6  Electricity     200
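
The filtering step can also be written as a single `.loc` call with a boolean mask. The following is a minimal sketch, not part of the original answer; it rebuilds only the first seven rows of the dataframe above so that it runs on its own, and the variable name `mask` is chosen here for illustration.

```python
import pandas as pd

# First seven rows (index 0-6) of the dataframe shown above.
df = pd.DataFrame({
    "Card Name": ["AAA"] * 5 + ["BBB"] * 2,
    "Country": ["INDIA", "Australia", "INDIA", "US", "INDIA", "Australia", "INDIA"],
    "Age": ["Young", "Old", "Student", "Young", "Student", "Young", "Student"],
    "Code": ["House", "Hardware", "House", "Hardware",
             "Electricity", "Electricity", "Electricity"],
    "Amount": [100, 200, 300, 600, 200, 100, 200],
})

country = "INDIA"
age = "Student"

# `==` is equivalent to `.eq()` here; each comparison needs its own
# parentheses because `&` binds more tightly than `==`.
mask = (df["Country"] == country) & (df["Age"] == age)

# Select the matching rows and the two columns of interest in one step.
tmp = df.loc[mask, ["Code", "Amount"]]
print(tmp)
```

Selecting rows and columns in one `.loc` call avoids the chained `df[...].loc[...]` indexing and keeps the original row labels (2, 4 and 6).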

...and then group:

>>> result = tmp.groupby('Code')['Amount'].agg([['Total Amount', 'sum'], ['Frequency', 'size'], ['Average', 'mean']]).reset_index() 
>>> result                             
          Code  Total Amount  Frequency  Average
0  Electricity           400          2      200
1        House           300          1      300
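
As an alternative to the list of name/function pairs, the same aggregation can be expressed with pandas named aggregation. This is a sketch under the assumption of pandas 0.25 or later; it rebuilds the three filtered rows by hand so the block is self-contained. Note that `mean` yields a float column.

```python
import pandas as pd

# The three rows left after filtering for INDIA / Student.
tmp = pd.DataFrame(
    {"Code": ["House", "Electricity", "Electricity"],
     "Amount": [300, 200, 200]},
    index=[2, 4, 6],
)

# Named aggregation: each key is an output column name and each value is
# the aggregation applied to Amount. Names containing spaces cannot be
# written as keyword arguments, so they are passed by unpacking a dict.
result = (
    tmp.groupby("Code")["Amount"]
    .agg(**{"Total Amount": "sum", "Frequency": "size", "Average": "mean"})
    .reset_index()
)
print(result)
```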

If I understand the condition of filtering by total amount correctly, you can then issue

result.nlargest(10, 'Total Amount')
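
Since the question asks for a function, the three steps (filter, group, take the largest totals) could be wrapped up roughly as follows. This is only a sketch: the function name `top_codes` and its parameter `n` are hypothetical and do not come from the original answer.

```python
import pandas as pd


def top_codes(df, country, age, n=10):
    """Return up to n codes for one country and age, ranked by total amount.

    Hypothetical helper combining the filter, groupby and nlargest steps.
    """
    # Step 1: keep only the rows for the requested country and age.
    mask = df["Country"].eq(country) & df["Age"].eq(age)

    # Step 2: per code, compute the sum, the row count and the mean of Amount.
    summary = (
        df.loc[mask]
        .groupby("Code")["Amount"]
        .agg([("Total Amount", "sum"), ("Frequency", "size"), ("Average", "mean")])
        .reset_index()
    )

    # Step 3: keep the n largest totals, sorted in descending order.
    return summary.nlargest(n, "Total Amount").reset_index(drop=True)


# The full dataframe from the question.
df = pd.DataFrame({
    "Card Name": ["AAA"] * 5 + ["BBB"] * 4 + ["CCC"] * 4,
    "Country": ["INDIA", "Australia", "INDIA", "US", "INDIA", "Australia", "INDIA",
                "Australia", "INDIA", "Australia", "Australia", "INDIA", "US"],
    "Age": ["Young", "Old", "Student", "Young", "Student", "Young", "Student",
            "Young", "Old", "Old", "Young", "Old", "Young"],
    "Code": ["House", "Hardware", "House", "Hardware", "Electricity", "Electricity",
             "Electricity", "House", "House", "Hardware", "House", "Electricity",
             "House"],
    "Amount": [100, 200, 300, 600, 200, 100, 200, 450, 150, 200, 350, 400, 200],
})

out = top_codes(df, "INDIA", "Student")
print(out)
```

Because `nlargest` returns the rows already sorted by "Total Amount" in descending order, no separate sort is needed; the final `reset_index(drop=True)` only renumbers the rows.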

Tags: pandas-groupby, pandas, dataframe, python
Source: https://codeday.me/bug/20191108/2007226.html