How to plot PySpark SQL results with matplotlib
Author: Internet
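I'm new to PySpark. I want to plot the results with matplotlib, but I'm not sure which function to use. I searched for a way to convert the SQL results to pandas and then call plot.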
Solution:
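Hi team, I found the solution. I converted the Spark SQL DataFrame to a pandas DataFrame with toPandas(), and then I could plot it with pandas' DataFrame.plot, which draws with matplotlib. Here is the sample code:

```python
from pyspark.sql import Row
# HiveContext is the Spark 1.x entry point (deprecated since Spark 2.0 in favor of SparkSession)
from pyspark.sql import HiveContext
import pyspark
import matplotlib.pyplot as plt

# Jupyter/IPython magic: render figures inline in the notebook (remove in a plain .py script)
%matplotlib inline

sc = pyspark.SparkContext()
sqlContext = HiveContext(sc)

test_list = [(1, 'hasan'), (2, 'nana'), (3, 'dad'), (4, 'mon')]
rdd = sc.parallelize(test_list)
# Turn each tuple into a Row with named fields so Spark can infer a schema
people = rdd.map(lambda x: Row(id=int(x[0]), name=x[1]))
schemaPeople = sqlContext.createDataFrame(people)

# Register it as a temp table so it can be queried with SQL
sqlContext.registerDataFrameAsTable(schemaPeople, "test_table")
df1 = sqlContext.sql("SELECT * FROM test_table")

# toPandas() collects the whole result to the driver as a pandas DataFrame
pdf1 = df1.toPandas()
# DataFrame.plot draws with matplotlib; kind='barh' gives a horizontal bar chart
pdf1.plot(kind='barh', x='name', y='id', colormap='winter_r')
plt.show()
```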
Tags: pyspark-sql, python, pandas, matplotlib
Source: https://codeday.me/bug/20191006/1858160.html