Big Data Test 4
Author: Internet
5.6 how
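5.6.1 Original VS Adaptation Share (Pie Chart)
Looking through the columns, there is no "adaptation" column, so at first it was unclear how to tell whether a film is an adaptation.
A Baidu search showed that a keywords entry containing "based on" marks a film as an adaptation, which solves the problem: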
import pandas as pd
import matplotlib.pyplot as plt

clean_tmdb_5000_movies = "static/data/clean_df_tmdb_5000_movies.csv"
# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Do not limit the display width
pd.set_option('display.width', None)
clean_df_tmdb_5000_movies = pd.read_csv(clean_tmdb_5000_movies)
# Create the DataFrame
original_df = pd.DataFrame()
# str.contains checks for the substring and returns booleans;
# na=False treats missing keywords as "not an adaptation",
# and astype(int) converts True/False to 1/0
original_df['keywords'] = clean_df_tmdb_5000_movies['keywords'].str.contains('based on', na=False).astype(int)
original_df['profit'] = clean_df_tmdb_5000_movies['revenue'] - clean_df_tmdb_5000_movies['budget']  # profit = revenue - budget
original_df['budget'] = clean_df_tmdb_5000_movies['budget']  # budget
# Counts
novel_cnt = original_df['keywords'].sum()  # number of adaptations
original_cnt = original_df['keywords'].count() - original_df['keywords'].sum()  # number of original works
# Group by original (0) vs. adaptation (1)
original_df = original_df.groupby('keywords', as_index=False).mean()  # note: this takes the mean profit and the mean budget per group
# Add the count column
original_df['count'] = [original_cnt, novel_cnt]
#print(original_df)
# Compute the profit margin
original_df['profit_rate'] = (original_df['profit'] / original_df['budget']) * 100
# Rename the index
original_df.index = ['original', 'based_on_novel']
# Compute each group's share of the total
original_pie = original_df['count'] / original_df['count'].sum()
# Draw the pie chart
original_pie.plot(kind='pie', label='', startangle=90, shadow=False, autopct='%2.1f%%', figsize=(8, 8))
plt.title('Original VS Adaptation', fontsize=20)
plt.legend(loc=2, fontsize=10)
plt.savefig('改编VS原创.png', dpi=300)
plt.show()
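One detail in the keyword check above is worth a quick sketch. In pandas versions where str.contains passes missing keywords through as NaN, a bare "1 if x else 0" mapping counts those rows as adaptations, because NaN is truthy in Python. The snippet below is plain Python with made-up keyword strings. adaptation_flag and samples are hypothetical names used only for illustration and are not part of the original script.

```python
import math


def adaptation_flag(keywords):
    """Return 1 if the keyword string mentions 'based on', else 0.

    Missing values (None or NaN) count as 0, i.e. not an adaptation.
    """
    if keywords is None or (isinstance(keywords, float) and math.isnan(keywords)):
        return 0
    return 1 if 'based on' in keywords else 0


# Hypothetical keyword strings, not taken from the real dataset
samples = ['based on novel|dystopia', 'space|alien', float('nan')]
flags = [adaptation_flag(k) for k in samples]
print(flags)  # [1, 0, 0]

# NaN is truthy, so a bare "1 if x else 0" would flag it as an adaptation
naive_flag = 1 if float('nan') else 0
print(naive_flag)  # 1
```

Whether the cleaned CSV still contains rows with missing keywords is not stated here, so this only matters if it does.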
5.6.2 Original VS Adaptation Budget / Profit Margin (Combination Chart)
import matplotlib.ticker as mtick

x = original_df.index
y1 = original_df.budget
y2 = original_df.profit_rate
fig = plt.figure(figsize=(8, 6))
# Left axis: average budget as bars
ax1 = fig.add_subplot(1, 1, 1)
ax1.bar(x, y1, color='b', label='Average budget', width=0.25)
plt.xticks(rotation=0, fontsize=12)  # x tick label rotation and font size
ax1.set_xlabel('Original VS Adaptation')  # x-axis label
ax1.set_ylabel('Average budget', fontsize=16)  # left y-axis label
ax1.legend(loc=2, fontsize=10)
# Right axis
# Create a secondary y-axis that shares the x-axis
ax2 = ax1.twinx()
ax2.plot(x, y2, 'ro-.', linewidth=5, label='Average profit margin')
ax2.set_ylabel('Average profit margin', fontsize=16)
ax2.legend(loc=1, fontsize=10)  # loc=1, 2, 3, 4 are the four corners, in the same order as the quadrants
# Show the profit margin axis as percentages
fmt = '%.1f%%'
yticks = mtick.FormatStrFormatter(fmt)
ax2.yaxis.set_major_formatter(yticks)
plt.savefig('改编VS原创的预算以及利润率.png', dpi=300)
plt.show()
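The profit margin plotted on the right axis is computed after grouping, so it is the ratio of the group's average profit to its average budget, not the average of each film's own margin. The two can differ a lot. The sketch below shows this with plain Python and made-up numbers. The films list is hypothetical, and profit is assumed to mean revenue minus budget.

```python
from statistics import mean

# Hypothetical (budget, revenue) pairs for one group of films
films = [(100.0, 150.0), (10.0, 40.0)]

budgets = [b for b, _ in films]
profits = [r - b for b, r in films]  # assumption: profit = revenue - budget

# Ratio of group averages: what a groupby mean followed by a division yields
ratio_of_means = mean(profits) / mean(budgets) * 100

# Average of per-film margins: a different quantity
mean_of_ratios = mean(p / b for p, b in zip(profits, budgets)) * 100

print(f'{ratio_of_means:.1f}%')  # 72.7%
print(f'{mean_of_ratios:.1f}%')  # 175.0%
```

The ratio of averages gives big-budget films more weight, while the per-film average treats every film equally, so which one to plot depends on the question being asked.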
Tags: data testing, plt, df, 5000, keywords, tmdb, original Source: https://www.cnblogs.com/fengchuiguobanxia/p/15675278.html