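【pandas】Grouping the elements of a data column by their order of first appearance (the last method is my own)
Author: from the internet
Partly based on: "Using Pandas to process the data column: grouping elements by their order of appearance" (qq.com)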
import pandas as pd
df = pd.DataFrame({
'data': ['A1', 'D3', 'B2', 'C4', 'A1', 'A2', 'B2', 'B3', 'C3', 'C4', 'D5', 'D3'],
'new': ['A1', 'A1', 'D3', 'D3', 'B2', 'B2', 'C4', 'C4', 'A2', 'B3', 'C3', 'D5']})
df
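# The 'new' column is the expected result: the values of 'data' regrouped so equal values are adjacent, in order of first appearance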
# Method 1
df = pd.DataFrame({'data': ['A1', 'D3', 'B2', 'C4', 'A1', 'A2', 'B2', 'B3', 'C3', 'C4', 'D5', 'D3']})
temp = df.drop_duplicates().reset_index(drop=True).values  # unique values, first-appearance order
list1 = df['data'].values.tolist()  # whole column as a list, built once outside the loop
new_data = []
length = temp.shape[0]
for i in range(length):
    item = temp[i][0]
    count = list1.count(item)  # number of occurrences of this value
    new_data += [item] * count
df['new1'] = new_data
df
A bit verbose; not recommended.
# Method 2
from collections import Counter
from itertools import chain
df['new2'] = sum([[k] * v for k, v in Counter(df['data']).items()], [])
df['new3'] = [*chain(*([k] * v for k, v in Counter(df['data']).items()))]
df
Requires importing extra modules (Counter and chain both come from the Python standard library).
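Method 2 works because Counter is a dict subclass and dicts keep keys in first-insertion order from Python 3.7 on, so iterating over Counter(...).items() already yields the values in order of first appearance. A minimal sketch of the same idea with a plain dict and no imports; the function name group_by_first_appearance is hypothetical, not from the original article:

```python
def group_by_first_appearance(values):
    """Return values regrouped so equal items are adjacent, in first-appearance order."""
    counts = {}  # plain dicts preserve insertion order (Python 3.7+)
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # Expand each key by its count, in the order the keys were first seen
    return [k for k, n in counts.items() for _ in range(n)]


data = ['A1', 'D3', 'B2', 'C4', 'A1', 'A2', 'B2', 'B3', 'C3', 'C4', 'D5', 'D3']
print(group_by_first_appearance(data))
```

Since iterating a Series yields its values, it can be applied directly as df['new'] = group_by_first_appearance(df['data']).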
# Method 3
df['new4'] = df['data'].unique().repeat(df['data'].value_counts(sort=True))  # changed to True
df
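Uses only built-in methods. The original article passed sort=False, which this post considered wrong and changed to sort=True. Note, however, that repeat pairs the unique values with the counts purely by position, so the result is correct only when the counts come out in the same order as unique(). With sort=True the counts are ordered by descending frequency, which lines up here only because every value that occurs twice appears before every value that occurs once; for a column like ['A', 'B', 'B'] it would repeat 'A' twice. Newer pandas documentation describes sort=False as preserving the order of the data, which is the pairing this method actually needs.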
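One way to make Method 3 independent of how value_counts orders its result is to look each count up by label rather than by position. A sketch, assuming pandas and NumPy are installed and the column has no missing values; regroup is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def regroup(s: pd.Series) -> np.ndarray:
    """Repeat each unique value by its own count, in order of first appearance.

    Assumes no NaN: value_counts() drops NaN by default, so .loc would fail on it.
    """
    uniques = s.unique()                       # first-appearance order
    counts = s.value_counts().loc[uniques]     # counts looked up by label, not position
    return np.repeat(uniques, counts.to_numpy())


s = pd.Series(['A', 'B', 'B'])
# Positional pairing with sort=True: counts are [2, 1] (B first), so 'A' is repeated twice
print(s.unique().repeat(s.value_counts(sort=True).to_numpy()).tolist())
# Label-aligned version
print(regroup(s).tolist())
```

On the article's DataFrame this would be used as df['new4'] = regroup(df['data']).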
# Method 4
df['new5'] = df['data'].astype('category').cat.reorder_categories(df['data'].unique()).sort_values().values
df['new6'] = sorted(df['data'].tolist(), key=df['data'].tolist().index)
df
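In Method 4, sorted(..., key=list.index) calls list.index once per element, and each call scans the list from the start, so the cost grows quadratically with column length. A vectorized alternative, not taken from the original article, is to number the values by first appearance with pd.factorize and then stable-sort by those numbers. A sketch, assuming no missing values (factorize codes NaN as -1, which would sort it first); regroup_factorize is a hypothetical name:

```python
import numpy as np
import pandas as pd


def regroup_factorize(s: pd.Series) -> pd.Series:
    """Stable-sort a Series by the first-appearance rank of each value."""
    codes, _ = pd.factorize(s)                 # 0 for the first distinct value, 1 for the next, ...
    order = np.argsort(codes, kind='stable')   # stable sort keeps equal codes in original row order
    return s.iloc[order].reset_index(drop=True)


data = ['A1', 'D3', 'B2', 'C4', 'A1', 'A2', 'B2', 'B3', 'C3', 'C4', 'D5', 'D3']
print(regroup_factorize(pd.Series(data)).tolist())
```

Like the categorical approach in new5, this sorts by an explicit first-appearance rank, but without converting the column's dtype.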
# My own method
df['new7'] = sum([[i] * df.data.value_counts()[i] for i in df.data.drop_duplicates()], [])
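It combines Method 1 (iterate over the unique values and count each one) with Method 2 (flatten the repeated lists with sum(..., [])).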
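As written, the one-liner recomputes df.data.value_counts() for every distinct value, and sum(..., []) copies the accumulated list on each addition. A variant that computes the counts once and flattens with a comprehension should give the same column; this is a sketch of that refinement, not the author's original code:

```python
import pandas as pd

df = pd.DataFrame({'data': ['A1', 'D3', 'B2', 'C4', 'A1', 'A2', 'B2', 'B3', 'C3', 'C4', 'D5', 'D3']})

counts = df['data'].value_counts()  # computed once; looked up by label below
# Unique values in first-appearance order, each repeated by its count
df['new7'] = [v for v in df['data'].drop_duplicates() for _ in range(counts[v])]
print(df['new7'].tolist())
```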
Source: https://www.cnblogs.com/hightech/p/16334118.html