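python – Pandas: splitting a MultiIndex DataFrame
Author: Internet
I have a pandas DataFrame with MultiIndex columns. I want to split each value string on "||" and add one more column level containing three new columns: 'Connection', 'Val1', 'Val2'.
Any hints on how to do this would be helpful.
Current sample DataFrame: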
        Experiment1                                 Experiment2
Target  Analyze1_ab           Analyze2_zz           Analyze1_yy
XXX_1   Edge2||3.1E-07||-0.5  Edge2||2.1E-06||-0.9  Edge2||6.4E-02||-0.3
XXX_4   Edge1||6.4E-12||1.1   Edge1||2.4E-11||9.4   Edge1||1.4E-11||1.4
ABC_1   Edge1||3.9E-07||0.7   Edge1||2.9E-07||5.6   Edge1||6.8E-02||0.4
ABC_2   Edge2||1.1E-09||-0.5  Edge2||1.2E-09||1.2   Edge2||1.0E-03||-0.5
ABC_3   Edge2||4.6E-25||-0.8  Edge2||2.6E-10||1.9   Edge2||5.0E-17||-0.9
XXX_2   Edge2||1.7E-07||-0.5  Edge2||5.7E-08||-0.3  Edge2||4.1E-02||-0.3
ABC_4   Edge1||8.1E-02||0.5   Edge1||9.1E-02||1.5   Edge1||5.4E-02||0.6
ABC_5   Edge1||6.7E-02||0.3   Edge1||4.2E-02||1.9   Edge1||5.6E-03||0.4
XXX_3   Edge2||3.1E-03||-0.4  Edge1||2.4E-11||1.1   Edge2||2.4E-02||-0.3
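For anyone who wants to reproduce the sample, here is a minimal sketch that builds its first three rows. It assumes 'Target' is the row index and that the header is a two-level column MultiIndex; the post does not say how the frame was actually created, and the name `sample` is only illustrative.

```python
import pandas as pd

# Two-level column header: (experiment, analysis), as in the sample above.
columns = pd.MultiIndex.from_tuples([
    ('Experiment1', 'Analyze1_ab'),
    ('Experiment1', 'Analyze2_zz'),
    ('Experiment2', 'Analyze1_yy'),
])

# First three rows of the sample. Using 'Target' as the row index is an
# assumption about how the original frame was built.
sample = pd.DataFrame(
    [
        ['Edge2||3.1E-07||-0.5', 'Edge2||2.1E-06||-0.9', 'Edge2||6.4E-02||-0.3'],
        ['Edge1||6.4E-12||1.1', 'Edge1||2.4E-11||9.4', 'Edge1||1.4E-11||1.4'],
        ['Edge1||3.9E-07||0.7', 'Edge1||2.9E-07||5.6', 'Edge1||6.8E-02||0.4'],
    ],
    index=pd.Index(['XXX_1', 'XXX_4', 'ABC_1'], name='Target'),
    columns=columns,
)

print(sample)
```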
Desired DataFrame:
        Experiment1                                             Experiment2
Target  Analyze1_ab                 Analyze2_zz                 Analyze1_yy
        Connection  Val1      Val2  Connection  Val1      Val2  Connection  Val1      Val2
XXX_1   Edge2       3.10E-07  -0.5  Edge2       2.10E-06  -0.9  Edge2       6.40E-02  -0.3
XXX_4   Edge1       6.40E-12  1.1   Edge1       2.40E-11  9.4   Edge1       1.40E-11  1.4
ABC_1   Edge1       3.90E-07  0.7   Edge1       2.90E-07  5.6   Edge1       6.80E-02  0.4
ABC_2   Edge2       1.10E-09  -0.5  Edge2       1.20E-09  1.2   Edge2       1.00E-03  -0.5
ABC_3   Edge2       4.60E-25  -0.8  Edge2       2.60E-10  1.9   Edge2       5.00E-17  -0.9
XXX_2   Edge2       1.70E-07  -0.5  Edge2       5.70E-08  -0.3  Edge2       4.10E-02  -0.3
ABC_4   Edge1       8.10E-02  0.5   Edge1       9.10E-02  1.5   Edge1       5.40E-02  0.6
ABC_5   Edge1       6.70E-02  0.3   Edge1       4.20E-02  1.9   Edge1       5.60E-03  0.4
XXX_3   Edge2       3.10E-03  -0.4  Edge1       2.40E-11  1.1   Edge2       2.40E-02  -0.3
Solution:
import pandas as pd

# Initialize DataFrame
# -----------------------------------------------------------------------------
# Keys are listed in the same order as the MultiIndex tuples assigned below,
# so each column keeps its own data (dicts preserve insertion order).
df = pd.DataFrame({
    'Analyze1_ab': ['Edge2||3.1E-07||-0.5', 'Edge1||6.4E-12||1.1'],
    'Analyze1_yy': ['Edge2||6.4E-02||-0.3', 'Edge1||1.4E-11||1.4'],
    'Analyze2_zz': ['Edge2||2.1E-06||-0.9', 'Edge1||2.4E-11||9.4'],
    'Target': ['XXX_1', 'XXX_4'],
})
df.columns = pd.MultiIndex.from_tuples(
    [('Experiment1', 'Analyze1_ab'),
     ('Experiment2', 'Analyze1_yy'),
     ('Experiment1', 'Analyze2_zz'),
     ('Target', '')])

# Split 'Analyze' columns by double pipes ||
# -----------------------------------------------------------------------------
# Collect the split pieces here and concatenate them once at the end
pieces = []

for col_name in df.columns:
    # regex=False: look for the literal '||' (as a regex, '||' matches anything)
    if (col_name[1].startswith('Analyze') and
            df[col_name].str.contains('||', regex=False).all()):
        # Split the column by || into new columns
        # (raw string, so the escaped pipes reach the regex engine unchanged)
        splitted_analysis = df[col_name].str.split(r'\|\|', expand=True)
        # The new column names are 0, 1, 2. Let's rename them.
        splitted_analysis.columns = ['Connection', 'Val1', 'Val2']
        # Recreate MultiIndex
        splitted_analysis.columns = pd.MultiIndex.from_tuples(
            [(col_name[0], col_name[1], c) for c in splitted_analysis.columns])
        pieces.append(splitted_analysis)

# Add 'Target' column in the final_df.
# First, extract it.
target_col = pd.DataFrame(df[('Target', '')])
# Then, increase MultiIndex level of 'Target' from 2 to 3,
# to allow smooth concatenation with the split columns.
target_col.columns = pd.MultiIndex.from_tuples([('Target', '', '')])
final_df = pd.concat(pieces + [target_col], axis=1)
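The desired frame in the question shows Target as row labels rather than as a column. Under that assumption every column is a "||"-delimited string, and the loop above can be wrapped in a small helper. This is only a sketch: `split_columns` and `demo` are hypothetical names, and the helper assumes each value splits into exactly three parts.

```python
import re

import pandas as pd


def split_columns(frame, sep='||', names=('Connection', 'Val1', 'Val2')):
    """Split every column of `frame` on `sep` and add `names` as a new innermost column level."""
    pieces = []
    for col in frame.columns:
        # re.escape makes the separator a literal pattern ('||' alone is a regex alternation).
        part = frame[col].str.split(re.escape(sep), expand=True)
        # col is a tuple such as ('Experiment1', 'Analyze1_ab'); extend it by one level.
        part.columns = pd.MultiIndex.from_tuples([col + (name,) for name in names])
        pieces.append(part)
    return pd.concat(pieces, axis=1)


# Two rows and two columns from the question, with Target as the index (assumed).
demo = pd.DataFrame(
    [['Edge2||3.1E-07||-0.5', 'Edge2||6.4E-02||-0.3'],
     ['Edge1||6.4E-12||1.1', 'Edge1||1.4E-11||1.4']],
    index=pd.Index(['XXX_1', 'XXX_4'], name='Target'),
    columns=pd.MultiIndex.from_tuples(
        [('Experiment1', 'Analyze1_ab'), ('Experiment2', 'Analyze1_yy')]),
)

result = split_columns(demo)
print(result)
```

With Target kept in the index, no extra step is needed to lift it to three levels before concatenating.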
Check with print(final_df):
   Experiment1                 Experiment2                 Experiment1                 Target
   Analyze1_ab                 Analyze1_yy                 Analyze2_zz
   Connection  Val1      Val2  Connection  Val1      Val2  Connection  Val1      Val2
0  Edge2       3.1E-07   -0.5  Edge2       6.4E-02   -0.3  Edge2       2.1E-06   -0.9  XXX_1
1  Edge1       6.4E-12   1.1   Edge1       1.4E-11   1.4   Edge1       2.4E-11   9.4   XXX_4
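Note that splitting strings yields strings: Val1 and Val2 print as '3.1E-07' and '-0.5', not as floats. If numeric values are wanted, as the desired frame suggests, one possible follow-up is to convert those columns with pd.to_numeric. The sketch below uses a small stand-in frame named `split_df` (a hypothetical name) instead of the full result.

```python
import pandas as pd

# Stand-in for the split result: three-level columns, all values still strings.
columns = pd.MultiIndex.from_tuples([
    ('Experiment1', 'Analyze1_ab', 'Connection'),
    ('Experiment1', 'Analyze1_ab', 'Val1'),
    ('Experiment1', 'Analyze1_ab', 'Val2'),
])
split_df = pd.DataFrame(
    [['Edge2', '3.1E-07', '-0.5'], ['Edge1', '6.4E-12', '1.1']],
    columns=columns,
)

# Pick the columns whose innermost level (level 2) is 'Val1' or 'Val2'.
is_value = split_df.columns.get_level_values(2).isin(['Val1', 'Val2'])
value_cols = split_df.columns[is_value]

# Convert them one by one; pd.to_numeric operates on a single Series.
for col in value_cols:
    split_df[col] = pd.to_numeric(split_df[col])

print(split_df.dtypes)
```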
Check with pprint.pprint([c for c in final_df.columns]) (after import pprint):
[('Experiment1', 'Analyze1_ab', 'Connection'),
('Experiment1', 'Analyze1_ab', 'Val1'),
('Experiment1', 'Analyze1_ab', 'Val2'),
('Experiment2', 'Analyze1_yy', 'Connection'),
('Experiment2', 'Analyze1_yy', 'Val1'),
('Experiment2', 'Analyze1_yy', 'Val2'),
('Experiment1', 'Analyze2_zz', 'Connection'),
('Experiment1', 'Analyze2_zz', 'Val1'),
('Experiment1', 'Analyze2_zz', 'Val2'),
('Target', '', '')]
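Once the third level exists, it can be used for selection. As a brief illustration on a reduced stand-in frame (the name `frame` is hypothetical), DataFrame.xs picks one innermost label across all experiments, and plain indexing with an outermost label returns everything beneath it.

```python
import pandas as pd

# Reduced stand-in with the same column layout as the list above.
columns = pd.MultiIndex.from_tuples([
    ('Experiment1', 'Analyze1_ab', 'Connection'),
    ('Experiment1', 'Analyze1_ab', 'Val1'),
    ('Experiment2', 'Analyze1_yy', 'Connection'),
    ('Experiment2', 'Analyze1_yy', 'Val1'),
    ('Target', '', ''),
])
frame = pd.DataFrame(
    [['Edge2', '3.1E-07', 'Edge2', '6.4E-02', 'XXX_1'],
     ['Edge1', '6.4E-12', 'Edge1', '1.4E-11', 'XXX_4']],
    columns=columns,
)

# All 'Val1' columns across experiments; the matched level is dropped.
val1 = frame.xs('Val1', axis=1, level=2)

# Everything under one experiment, selected by the outermost level.
exp1 = frame['Experiment1']

print(val1)
print(exp1)
```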
Tags: python, pandas, multiple-columns Source: https://codeday.me/bug/20190611/1217502.html