31. Using explode in Pandas to Split One Row into Multiple Rows for Statistics

Author: Internet

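Practical problem: a single field holds several values. How can we split those values into separate rows and then compute statistics on them?

For example, a movie can belong to several genres and a person can have several hobbies, and we want statistics per genre or per hobby.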

1. Read the data

import pandas as pd
df = pd.read_csv(
    "./datas/movielens-1m/movies.dat",
    header=None,
    names="MovieID::Title::Genres".split("::"),
    sep="::",
    engine="python"
)
df.head()
   MovieID  Title                               Genres
0  1        Toy Story (1995)                    Animation|Children's|Comedy
1  2        Jumanji (1995)                      Adventure|Children's|Fantasy
2  3        Grumpier Old Men (1995)             Comedy|Romance
3  4        Waiting to Exhale (1995)            Comedy|Drama
4  5        Father of the Bride Part II (1995)  Comedy
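The read_csv call above can be tried without downloading the dataset. The following is a minimal sketch that assumes a hypothetical three-line in-memory sample in the same "::"-delimited layout; the names sample and sample_df are made up for illustration. It also shows why engine="python" is passed: a separator longer than one character is interpreted as a regular expression, which the default C parser does not handle.

```python
import io

import pandas as pd

# Hypothetical sample in the same "::"-delimited layout as movies.dat
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
    "3::Grumpier Old Men (1995)::Comedy|Romance\n"
)

# A multi-character separator is treated as a regular expression,
# so the Python parsing engine is requested explicitly.
sample_df = pd.read_csv(
    sample,
    header=None,                          # the file has no header row
    names=["MovieID", "Title", "Genres"],  # column names supplied by hand
    sep="::",
    engine="python",
)
print(sample_df.shape)
```

Leaving out engine="python" usually still works, but pandas then falls back to the Python engine by itself and emits a warning.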

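Question: how do we count how many movies each genre has?

Approach: split the Genres field into a list, explode that list so that each genre gets its own row, then count the rows per genre.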

2. Split the Genres field into a list

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3883 entries, 0 to 3882
Data columns (total 3 columns):
MovieID    3883 non-null int64
Title      3883 non-null object
Genres     3883 non-null object
dtypes: int64(1), object(2)
memory usage: 91.1+ KB
# The Genres field is currently a string
type(df.iloc[0]["Genres"])
str
# Add a new column
df["Genre"] = df["Genres"].map(lambda x:x.split("|"))
df.head()
   MovieID  Title                               Genres                        Genre
0  1        Toy Story (1995)                    Animation|Children's|Comedy   [Animation, Children's, Comedy]
1  2        Jumanji (1995)                      Adventure|Children's|Fantasy  [Adventure, Children's, Fantasy]
2  3        Grumpier Old Men (1995)             Comedy|Romance                [Comedy, Romance]
3  4        Waiting to Exhale (1995)            Comedy|Drama                  [Comedy, Drama]
4  5        Father of the Bride Part II (1995)  Comedy                        [Comedy]
# The values in Genre are lists
print(df["Genre"][0])
print(type(df["Genre"][0]))
['Animation', "Children's", 'Comedy']
<class 'list'>
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3883 entries, 0 to 3882
Data columns (total 4 columns):
MovieID    3883 non-null int64
Title      3883 non-null object
Genres     3883 non-null object
Genre      3883 non-null object
dtypes: int64(1), object(3)
memory usage: 121.5+ KB
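As a side note, the split can also be written with the vectorized string accessor instead of map with a lambda. This is a small sketch on a hypothetical three-element Series (the names genres, via_map and via_str are made up for illustration); it is an alternative spelling, not what the original notebook uses.

```python
import pandas as pd

# Hypothetical sample values in the same "|"-delimited format as Genres
genres = pd.Series(["Animation|Children's|Comedy", "Comedy|Romance", "Comedy"])

# The approach used above: call str.split on each element via map
via_map = genres.map(lambda x: x.split("|"))

# Alternative: the .str accessor; a one-character pattern is split literally
via_str = genres.str.split("|")

print(via_str.tolist())
```

On data without missing values the two give the same lists. If the column contained NaN, the lambda would raise an error on the float value, while .str.split would leave the NaN in place.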

3. Use explode to split one row into multiple rows

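Syntax: pandas.DataFrame.explode(column)
Each element of a list-like value in the given column becomes its own row; the other columns and the index value are replicated for every new row.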
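To make the row replication concrete, here is a minimal sketch on a hypothetical two-row DataFrame (toy and exploded are illustrative names, not part of the movie data). The first row holds a two-element list and therefore becomes two rows that share index label 0.

```python
import pandas as pd

# Hypothetical toy data: the Genre column already holds Python lists
toy = pd.DataFrame({
    "Title": ["A", "B"],
    "Genre": [["Comedy", "Drama"], ["Comedy"]],
})

# One output row per list element; Title and the index label are repeated
exploded = toy.explode("Genre")

print(exploded)
print(exploded.index.tolist())
```

Only the exploded column changes shape; every other column is simply copied along with the original index label.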

df_new = df.explode("Genre")
df_new.head(10)
   MovieID  Title                               Genres                        Genre
0  1        Toy Story (1995)                    Animation|Children's|Comedy   Animation
0  1        Toy Story (1995)                    Animation|Children's|Comedy   Children's
0  1        Toy Story (1995)                    Animation|Children's|Comedy   Comedy
1  2        Jumanji (1995)                      Adventure|Children's|Fantasy  Adventure
1  2        Jumanji (1995)                      Adventure|Children's|Fantasy  Children's
1  2        Jumanji (1995)                      Adventure|Children's|Fantasy  Fantasy
2  3        Grumpier Old Men (1995)             Comedy|Romance                Comedy
2  3        Grumpier Old Men (1995)             Comedy|Romance                Romance
3  4        Waiting to Exhale (1995)            Comedy|Drama                  Comedy
3  4        Waiting to Exhale (1995)            Comedy|Drama                  Drama
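Because the index labels are copied, the exploded frame no longer has a unique index, which can surprise later label-based lookups. The sketch below uses a hypothetical two-row frame (toy, exploded and flat are illustrative names) to show the duplicate labels and one way to renumber the rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the movie table
toy = pd.DataFrame({
    "MovieID": [1, 2],
    "Genre": [["Animation", "Comedy"], ["Adventure"]],
})

exploded = toy.explode("Genre")

# Label 0 now matches two rows, so the index is no longer unique
rows_for_label_0 = exploded.loc[[0]]
print(len(rows_for_label_0), exploded.index.is_unique)

# Renumber the rows 0..n-1 if a unique index is needed afterwards
flat = exploded.reset_index(drop=True)
print(flat.index.tolist())
```

Newer pandas releases (1.1.0 and later, as far as I know) also accept explode("Genre", ignore_index=True), which should give the same renumbered result in one call.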

4. Count movies per genre after the split

%matplotlib inline
df_new["Genre"].value_counts().plot.bar()
<matplotlib.axes._subplots.AxesSubplot at 0x23d73917cc8>

[Figure: bar chart of the number of movies per genre (output_18_1.png)]
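If only the numbers are needed rather than a plot, the whole pipeline can be chained on the column itself. This is a minimal sketch on a hypothetical three-movie sample (toy and counts are illustrative names); it uses Series.explode instead of DataFrame.explode, which is a small variation on the steps above.

```python
import pandas as pd

# Hypothetical sample with the same Genres format as the movie data
toy = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "Genres": [
        "Animation|Children's|Comedy",
        "Adventure|Children's|Fantasy",
        "Comedy|Romance",
    ],
})

counts = (
    toy["Genres"]
    .str.split("|")   # string -> list of genres
    .explode()        # one row per genre
    .value_counts()   # number of movies per genre, largest first
)
print(counts)
```

The result is a Series indexed by genre name, so a single count can be looked up with counts["Comedy"], and counts.plot.bar() would draw the same kind of chart as above.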


Source: https://blog.csdn.net/lvlinjier/article/details/112853118