Python Pandas: self-join (Cartesian product merge) to produce all combinations and their sums
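I am brand new to Python. It seems to offer a lot of flexibility and to be faster than a traditional RDBMS.
I am working on a very simple process to create random fantasy teams. I come from an RDBMS background (Oracle SQL), and that does not seem to be the best fit for this kind of data processing.
I used pandas to read a csv file into a dataframe, and now have a simple dataframe with two columns, Player and Salary: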
`            Name  Salary
0       Jason Day   11700
1  Dustin Johnson   11600
2    Rory McIlroy   11400
3   Jordan Spieth   11100
4  Henrik Stenson   10500
5  Phil Mickelson   10200
6     Justin Rose    9800
7      Adam Scott    9600
8   Sergio Garcia    9400
9   Rickie Fowler    9200`
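What I want to do in Python (pandas) is produce all combinations of 6 players whose combined salary falls within a given range, 45000 to 50000.
While looking at Python options, I found itertools combinations interesting, but it would produce a massive list of combinations without filtering on the sum of the salaries.
In traditional SQL, I would do a massive Cartesian self-join with SUM, but then I get the same players in different positions,
such as A, B, C and then C, B, A.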
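To put numbers on the ordering problem for the ten players shown above: a six-way self-join that only excludes equal names enumerates ordered arrangements, whereas a team is an unordered set. A quick sketch using Python's `math` module (Python 3.8 or later; the variable names are illustrative only):

```python
import math

n_players, team_size = 10, 6

# Ordered arrangements: what a self-join with only != filters enumerates
ordered = math.perm(n_players, team_size)

# Unordered teams: what itertools.combinations would yield
unordered = math.comb(n_players, team_size)

print(ordered, unordered, ordered // unordered)
# 151200 210 720
```

Each team therefore appears 720 (6!) times in the ordered version, once per arrangement of its six members.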
My traditional SQL, which does not perform well, looks like this:
`SELECT DISTINCT
    one.name   AS "1",
    two.name   AS "2",
    three.name AS "3",
    four.name  AS "4",
    five.name  AS "5",
    six.name   AS "6",
    SUM(one.salary + two.salary + three.salary + four.salary + five.salary + six.salary) AS salary
FROM
    nl.pgachamp2 one, nl.pgachamp2 two, nl.pgachamp2 three,
    nl.pgachamp2 four, nl.pgachamp2 five, nl.pgachamp2 six
WHERE one.name != two.name
  AND one.name != three.name
  AND one.name != four.name
  AND one.name != five.name
  AND two.name != three.name
  AND two.name != four.name
  AND two.name != five.name
  AND two.name != six.name
  AND three.name != four.name
  AND three.name != five.name
  AND three.name != six.name
  AND five.name != six.name
  AND four.name != six.name
  AND four.name != five.name
  AND one.name != six.name
GROUP BY one.name, two.name, three.name, four.name, five.name, six.name`
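Is there a way to do this in Pandas / Python?
Pointers to any relevant documentation would be great!
Solution:
Here is a non-Pandas solution that uses a simple algorithm. It generates combinations recursively from a list of players sorted by salary, which lets it skip generating combinations that exceed the salary cap.
As piRSquared mentions, no team of 6 falls within the salary limits stated in the question, so I chose limits that produce a small number of teams.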
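That observation is easy to check with a few lines: the cheapest possible team consists of the six lowest salaries, so if their sum already exceeds the upper limit, no team can fit. A minimal sketch, with the salaries copied from the question's table (the names `salaries`, `cheapest` and `priciest` are illustrative):

```python
salaries = [11700, 11600, 11400, 11100, 10500, 10200, 9800, 9600, 9400, 9200]

by_salary = sorted(salaries)
cheapest = sum(by_salary[:6])    # six lowest salaries
priciest = sum(by_salary[-6:])   # six highest salaries

print(cheapest, priciest)
# 58700 66500
```

Every possible team of six therefore costs between 58700 and 66500, which lies entirely above the 45000 to 50000 range.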
#!/usr/bin/env python3

''' Limited combinations

    Generate combinations of players whose combined salaries fall within given limits

    See https://stackoverflow.com/q/38636460/4014959

    Written by PM 2Ring 2016.07.28
'''

data = '''\
0 Jason Day 11700
1 Dustin Johnson 11600
2 Rory McIlroy 11400
3 Jordan Spieth 11100
4 Henrik Stenson 10500
5 Phil Mickelson 10200
6 Justin Rose 9800
7 Adam Scott 9600
8 Sergio Garcia 9400
9 Rickie Fowler 9200
'''

data = [s.split() for s in data.splitlines()]
all_players = [(' '.join(u[1:-1]), int(u[-1])) for u in data]
all_players.sort(key=lambda t: t[1])
for i, row in enumerate(all_players):
    print(i, row)
print('- '*40)

def choose_teams(free, num, team=(), value=0):
    num -= 1
    for i, p in enumerate(free):
        salary = all_players[p][1]
        newvalue = value + salary
        if newvalue <= hi:
            newteam = team + (p,)
            if num == 0:
                if newvalue >= lo:
                    yield newteam, newvalue
            else:
                yield from choose_teams(free[i+1:], num, newteam, newvalue)
        else:
            break

#Salary limits
lo, hi = 55000, 60500

#Indices of players that can be chosen for a team
free = tuple(range(len(all_players)))

for i, (t, s) in enumerate(choose_teams(free, 6), 1):
    team = [all_players[p] for p in t]
    names, sals = zip(*team)
    assert sum(sals) == s
    print(i, t, names, s)
Output
0 ('Rickie Fowler', 9200)
1 ('Sergio Garcia', 9400)
2 ('Adam Scott', 9600)
3 ('Justin Rose', 9800)
4 ('Phil Mickelson', 10200)
5 ('Henrik Stenson', 10500)
6 ('Jordan Spieth', 11100)
7 ('Rory McIlroy', 11400)
8 ('Dustin Johnson', 11600)
9 ('Jason Day', 11700)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 (0, 1, 2, 3, 4, 5) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson') 58700
2 (0, 1, 2, 3, 4, 6) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Jordan Spieth') 59300
3 (0, 1, 2, 3, 4, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Rory McIlroy') 59600
4 (0, 1, 2, 3, 4, 8) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Dustin Johnson') 59800
5 (0, 1, 2, 3, 4, 9) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Jason Day') 59900
6 (0, 1, 2, 3, 5, 6) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Jordan Spieth') 59600
7 (0, 1, 2, 3, 5, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Rory McIlroy') 59900
8 (0, 1, 2, 3, 5, 8) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Dustin Johnson') 60100
9 (0, 1, 2, 3, 5, 9) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Jason Day') 60200
10 (0, 1, 2, 3, 6, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Jordan Spieth', 'Rory McIlroy') 60500
11 (0, 1, 2, 4, 5, 6) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Phil Mickelson', 'Henrik Stenson', 'Jordan Spieth') 60000
12 (0, 1, 2, 4, 5, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Phil Mickelson', 'Henrik Stenson', 'Rory McIlroy') 60300
13 (0, 1, 2, 4, 5, 8) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Phil Mickelson', 'Henrik Stenson', 'Dustin Johnson') 60500
14 (0, 1, 3, 4, 5, 6) ('Rickie Fowler', 'Sergio Garcia', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson', 'Jordan Spieth') 60200
15 (0, 1, 3, 4, 5, 7) ('Rickie Fowler', 'Sergio Garcia', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson', 'Rory McIlroy') 60500
16 (0, 2, 3, 4, 5, 6) ('Rickie Fowler', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson', 'Jordan Spieth') 60400
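As a cross-check, the same teams can be found by brute force with `itertools.combinations`, which the question mentions. This sketch is not part of the original answer; it assumes the same ten players and the same limits, and the helper name `teams_within` is made up for illustration. `combinations` yields each group of players once, regardless of order, and filtering inside a generator avoids building the full list in memory.

```python
from itertools import combinations

players = [
    ('Jason Day', 11700), ('Dustin Johnson', 11600), ('Rory McIlroy', 11400),
    ('Jordan Spieth', 11100), ('Henrik Stenson', 10500), ('Phil Mickelson', 10200),
    ('Justin Rose', 9800), ('Adam Scott', 9600), ('Sergio Garcia', 9400),
    ('Rickie Fowler', 9200),
]

def teams_within(players, size, lo, hi):
    """Yield (names, total) for every team of `size` players with lo <= total <= hi."""
    for team in combinations(players, size):
        total = sum(salary for _, salary in team)
        if lo <= total <= hi:
            yield tuple(name for name, _ in team), total

found = list(teams_within(players, 6, 55000, 60500))
print(len(found))
# 16

print(len(list(teams_within(players, 6, 45000, 50000))))
# 0
```

Unlike the recursive generator, this version visits all 210 combinations before filtering. That is fine for ten players, but the count grows quickly with a larger pool, which is where pruning on the sorted list pays off.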
If you are using an older version of Python that does not support the yield from syntax (added in Python 3.3), you can replace
    yield from choose_teams(free[i+1:], num, newteam, newvalue)
with
    for t, v in choose_teams(free[i+1:], num, newteam, newvalue):
        yield t, v