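python – Pairwise subtraction of columns in a pandas DataFrame
Author: Internet
I work with large DataFrames (48K rows, up to a few dozen columns). At some point in their processing I need to do pairwise subtraction of column values, and I'd like to know whether there is a more efficient way to do this than what I am doing now (see below).
My current code: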
import itertools
import pandas

# matrix is the pandas DataFrame containing all the data;
# group1 and group2 are iterables of column names
comparison_df = pandas.DataFrame(index=matrix.index)
combinations = itertools.product(group1, group2)
for observed, reference in combinations:
    observed_data = matrix[observed]
    reference_data = matrix[reference]
    comparison = observed_data - reference_data
    name = observed + "_" + reference
    comparison_df[name] = comparison
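Since the data can be large (I also use this piece of code during permutation testing), I'd like to know whether it can be optimized.
Edit: As requested, here is a sample of a typical dataset: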
ID A1 A2 A3 B1 B2 B3
Ku8QhfS0n_hIOABXuE 6.343 6.304 6.410 6.287 6.403 6.279
fqPEquJRRlSVSfL.8A 6.752 6.681 6.680 6.677 6.525 6.739
ckiehnugOno9d7vf1Q 6.297 6.248 6.524 6.382 6.316 6.453
x57Vw5B5Fbt5JUnQkI 6.268 6.451 6.379 6.371 6.458 6.333
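A typical result, if the "A" columns are group1 and the "B" columns are group2, has one column for each generated pair (e.g. A1_B1, A2_B1, A3_B1, ...), and each of those columns holds the subtraction for every row ID.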
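To make the expected layout concrete, here is a minimal sketch that lists the column names the pairing would produce. The group1 and group2 lists are an assumption based on the sample columns above; the original post does not show how they are defined.

```python
import itertools

# Assumed grouping, taken from the sample columns above (not shown in the post)
group1 = ["A1", "A2", "A3"]
group2 = ["B1", "B2", "B3"]

# One output column per (observed, reference) pair
pair_names = [f"{obs}_{ref}" for obs, ref in itertools.product(group1, group2)]
print(pair_names)
```

With three columns in each group this gives nine pairs, one per output column.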
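Solution:
Using itertools.combinations() on the DataFrame columns
You can use itertools.combinations() to create the column pairs, then compute the subtraction and the new column name for each pair. Note that this pairs every column with every other column, so within-group pairs such as A1_A2 are included as well: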
import itertools
from io import StringIO

import pandas as pd

matrix = pd.read_csv(StringIO('''ID,A1,A2,A3,B1,B2,B3
Ku8QhfS0n_hIOABXuE,6.343,6.304,6.410,6.287,6.403,6.279
fqPEquJRRlSVSfL.8A,6.752,6.681,6.680,6.677,6.525,6.739
ckiehnugOno9d7vf1Q,6.297,6.248,6.524,6.382,6.316,6.453
x57Vw5B5Fbt5JUnQkI,6.268,6.451,6.379,6.371,6.458,6.333''')).set_index('ID')

print('Original DataFrame:')
print(matrix)
print()

# Create DataFrame to fill with combinations
comparison_df = pd.DataFrame(index=matrix.index)

# Create combinations of columns
for a, b in itertools.combinations(matrix.columns, 2):
    # Subtract column combinations
    comparison_df['{}_{}'.format(a, b)] = matrix[a] - matrix[b]

print('Combination DataFrame:')
print(comparison_df)
Original DataFrame:
A1 A2 A3 B1 B2 B3
ID
Ku8QhfS0n_hIOABXuE 6.343 6.304 6.410 6.287 6.403 6.279
fqPEquJRRlSVSfL.8A 6.752 6.681 6.680 6.677 6.525 6.739
ckiehnugOno9d7vf1Q 6.297 6.248 6.524 6.382 6.316 6.453
x57Vw5B5Fbt5JUnQkI 6.268 6.451 6.379 6.371 6.458 6.333
Combination DataFrame:
A1_A2 A1_A3 A1_B1 A1_B2 A1_B3 A2_A3 A2_B1 A2_B2 A2_B3 A3_B1 A3_B2 A3_B3 B1_B2 B1_B3 B2_B3
ID
Ku8QhfS0n_hIOABXuE 0.039 -0.067 0.056 -0.060 0.064 -0.106 0.017 -0.099 0.025 0.123 0.007 0.131 -0.116 0.008 0.124
fqPEquJRRlSVSfL.8A 0.071 0.072 0.075 0.227 0.013 0.001 0.004 0.156 -0.058 0.003 0.155 -0.059 0.152 -0.062 -0.214
ckiehnugOno9d7vf1Q 0.049 -0.227 -0.085 -0.019 -0.156 -0.276 -0.134 -0.068 -0.205 0.142 0.208 0.071 0.066 -0.071 -0.137
x57Vw5B5Fbt5JUnQkI -0.183 -0.111 -0.103 -0.190 -0.065 0.072 0.080 -0.007 0.118 0.008 -0.079 0.046 -0.087 0.038 0.125
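If only the group1 x group2 pairs from the question are needed, one possible alternative is to compute all differences at once with NumPy broadcasting instead of assigning one column per loop iteration. The following is a minimal sketch, not the answer's method: the group1 and group2 lists are assumed from the sample data, and whether this is actually faster on a 48K-row dataset would need to be measured.

```python
import itertools
from io import StringIO

import pandas as pd

matrix = pd.read_csv(StringIO("""ID,A1,A2,A3,B1,B2,B3
Ku8QhfS0n_hIOABXuE,6.343,6.304,6.410,6.287,6.403,6.279
fqPEquJRRlSVSfL.8A,6.752,6.681,6.680,6.677,6.525,6.739
ckiehnugOno9d7vf1Q,6.297,6.248,6.524,6.382,6.316,6.453
x57Vw5B5Fbt5JUnQkI,6.268,6.451,6.379,6.371,6.458,6.333""")).set_index("ID")

# Assumed grouping (hypothetical; the question does not show these lists)
group1 = ["A1", "A2", "A3"]
group2 = ["B1", "B2", "B3"]

obs = matrix[group1].to_numpy()  # shape (n_rows, len(group1))
ref = matrix[group2].to_numpy()  # shape (n_rows, len(group2))

# Broadcasting: (n, g1, 1) - (n, 1, g2) -> (n, g1, g2)
diffs = obs[:, :, None] - ref[:, None, :]

# Flattening the last two axes matches the order of itertools.product
names = [f"{a}_{b}" for a, b in itertools.product(group1, group2)]
comparison_df = pd.DataFrame(
    diffs.reshape(len(matrix), -1), index=matrix.index, columns=names
)
print(comparison_df)
```

Building the result in a single DataFrame constructor call also avoids inserting columns one at a time into an existing frame.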
Tags: python, pandas, data-analysis Source: https://codeday.me/bug/20190704/1373369.html