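python - How to group and sort columns in a function, given column values
Author: Internet
I have a data frame like the one below, and I need to write a function that gives me the following result:
Input parameters:
> Country, e.g. "INDIA"
> Age, e.g. "Student"
My input data frame looks like this: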
    Card Name    Country      Age         Code  Amount
0         AAA      INDIA    Young        House     100
1         AAA  Australia      Old     Hardware     200
2         AAA      INDIA  Student        House     300
3         AAA         US    Young     Hardware     600
4         AAA      INDIA  Student  Electricity     200
5         BBB  Australia    Young  Electricity     100
6         BBB      INDIA  Student  Electricity     200
7         BBB  Australia    Young        House     450
8         BBB      INDIA      Old        House     150
9         CCC  Australia      Old     Hardware     200
10        CCC  Australia    Young        House     350
11        CCC      INDIA      Old  Electricity     400
12        CCC         US    Young        House     200
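The expected output would be:
          Code  Total Amount  Frequency  Average
0  Electricity           400          2      200
1        House           300          1      300
That is, the top 10 codes (in our case only the top 2 are available) for the given country (= INDIA) and age (= Student), ranked by the sum of Amount. It should also add a new column "Frequency" holding the number of records in each group, and an "Average" column equal to sum / frequency.
I tried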
df.groupby(['Country','Age','Code']).agg({'Amount': sum})['Amount'].groupby(level=0, group_keys=False).nlargest(10)
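which produces
Country    Age      Code
Australia  Young    House          800
           Old      Hardware       400
           Young    Electricity    100
INDIA      Old      Electricity    400
           Student  Electricity    400
                    House          300
           Old      House          150
           Young    House          100
US         Young    Hardware       600
                    House          200
Name: Amount, dtype: int64
Unfortunately, this differs from the expected output: it ranks the sums within every country instead of restricting them to one country and one age group, and it has no Frequency or Average column.
Solution:
Given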
>>> df
    Card Name    Country      Age         Code  Amount
0         AAA      INDIA    Young        House     100
1         AAA  Australia      Old     Hardware     200
2         AAA      INDIA  Student        House     300
3         AAA         US    Young     Hardware     600
4         AAA      INDIA  Student  Electricity     200
5         BBB  Australia    Young  Electricity     100
6         BBB      INDIA  Student  Electricity     200
7         BBB  Australia    Young        House     450
8         BBB      INDIA      Old        House     150
9         CCC  Australia      Old     Hardware     200
10        CCC  Australia    Young        House     350
11        CCC      INDIA      Old  Electricity     400
12        CCC         US    Young        House     200
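To follow along, the frame above can be rebuilt by hand. This is only a sketch: it assumes the column names are exactly as printed (including the space in `Card Name`), and the helper name `make_df` is made up for illustration.

```python
import pandas as pd

def make_df():
    """Rebuild the sample frame (values copied from the question; helper name is hypothetical)."""
    rows = [
        ("AAA", "INDIA", "Young", "House", 100),
        ("AAA", "Australia", "Old", "Hardware", 200),
        ("AAA", "INDIA", "Student", "House", 300),
        ("AAA", "US", "Young", "Hardware", 600),
        ("AAA", "INDIA", "Student", "Electricity", 200),
        ("BBB", "Australia", "Young", "Electricity", 100),
        ("BBB", "INDIA", "Student", "Electricity", 200),
        ("BBB", "Australia", "Young", "House", 450),
        ("BBB", "INDIA", "Old", "House", 150),
        ("CCC", "Australia", "Old", "Hardware", 200),
        ("CCC", "Australia", "Young", "House", 350),
        ("CCC", "INDIA", "Old", "Electricity", 400),
        ("CCC", "US", "Young", "House", 200),
    ]
    columns = ["Card Name", "Country", "Age", "Code", "Amount"]
    return pd.DataFrame(rows, columns=columns)

df = make_df()
```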
you can first filter the data frame:
>>> country = 'INDIA'
>>> age = 'Student'
>>> tmp = df[df.Country.eq(country) & df.Age.eq(age)].loc[:, ['Code', 'Amount']]
>>> tmp
          Code  Amount
2        House     300
4  Electricity     200
6  Electricity     200
...then group:
>>> result = tmp.groupby('Code')['Amount'].agg([['Total Amount', 'sum'], ['Frequency', 'size'], ['Average', 'mean']]).reset_index()
>>> result
          Code  Total Amount  Frequency  Average
0  Electricity           400          2    200.0
1        House           300          1    300.0
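The list of name/function pairs passed to `agg` above is terse. An equivalent spelling, assuming pandas 0.25 or later, is named aggregation, unpacked from a dict here because the output names contain spaces. The sketch rebuilds the three filtered rows inline so that it runs on its own; note that `mean` yields floats.

```python
import pandas as pd

# The three rows left after filtering for INDIA / Student (copied from tmp above).
tmp = pd.DataFrame(
    {"Code": ["House", "Electricity", "Electricity"], "Amount": [300, 200, 200]},
    index=[2, 4, 6],
)

# Named aggregation: each key is an output column, each value the function to apply.
# The dict is unpacked because "Total Amount" is not a valid Python keyword name.
result = (
    tmp.groupby("Code")["Amount"]
    .agg(**{"Total Amount": "sum", "Frequency": "size", "Average": "mean"})
    .reset_index()
)
print(result)
```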
If I understand the requirement of ranking by total amount correctly, you can then issue
result.nlargest(10, 'Total Amount')
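Since the question asks for a function, the filter, group and rank steps above could be wrapped as follows. This is a minimal sketch rather than the answerer's own code: the name `top_codes`, its `n` parameter, and the shortened sample frame are assumptions made for illustration.

```python
import pandas as pd

def top_codes(df, country, age, n=10):
    """Return the n codes with the largest total Amount for one country/age pair."""
    # Step 1: keep only the rows for the requested country and age group.
    mask = df["Country"].eq(country) & df["Age"].eq(age)
    # Step 2: per code, compute the sum, the row count and the mean of Amount.
    summary = (
        df.loc[mask]
        .groupby("Code")["Amount"]
        .agg(**{"Total Amount": "sum", "Frequency": "size", "Average": "mean"})
        .reset_index()
    )
    # Step 3: keep the n largest totals, highest first, with a fresh 0..k-1 index.
    return summary.nlargest(n, "Total Amount").reset_index(drop=True)

# Shortened sample: a subset of the rows from the question, not the full frame.
sample = pd.DataFrame(
    {
        "Card Name": ["AAA", "AAA", "AAA", "BBB", "BBB", "CCC"],
        "Country": ["INDIA", "INDIA", "INDIA", "INDIA", "Australia", "US"],
        "Age": ["Young", "Student", "Student", "Student", "Young", "Young"],
        "Code": ["House", "House", "Electricity", "Electricity", "House", "House"],
        "Amount": [100, 300, 200, 200, 450, 200],
    }
)

print(top_codes(sample, "INDIA", "Student"))
```

Passing a smaller `n`, for example `top_codes(sample, "INDIA", "Student", n=1)`, keeps only the code with the highest total.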
Tags: pandas-groupby, pandas, dataframe, python
Source: https://codeday.me/bug/20191108/2007226.html