Creating a DataFrame from Hive with HQL Statements: Common Approaches

Author: Internet

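  1. By default, a SparkSession cannot read data stored in Hive and does not support HQL syntax.
    To read Hive data, Hive support has to be enabled
    by calling enableHiveSupport() when the SparkSession is built:
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("demo")
    val session = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate()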
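
A quick way to confirm that Hive support is actually switched on is to read the session's catalog setting. The sketch below assumes the `spark.sql.catalogImplementation` key is readable through `session.conf`, which is how it is commonly inspected in Spark 2.x; the object name `HiveSupportCheck` and its helper method are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object HiveSupportCheck {
  // Returns the catalog backing the session.
  // A session built with enableHiveSupport() is expected to report "hive".
  def catalogImplementation(session: SparkSession): String =
    session.conf.get("spark.sql.catalogImplementation")

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .master("local[2]")
      .appName("demo")
      .enableHiveSupport() // fails fast if the Hive classes are not on the classpath
      .getOrCreate()

    println(s"catalog implementation: ${catalogImplementation(session)}")
    session.stop()
  }
}
```

If enableHiveSupport() itself throws because the Hive classes cannot be found, the spark-hive dependency is probably missing from the project.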

  2. Add the spark-hive dependency, which lets Spark connect to and work with Hive, together with the MySQL driver dependency
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.18</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.1</version>
</dependency>
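
For projects built with sbt rather than Maven, the same two dependencies could be declared roughly as follows. This is only a sketch: the versions are copied from the Maven snippet above, and the `scalaVersion` value is an assumption chosen to match the `_2.11` suffix of the spark-hive artifact.

```scala
// build.sbt (sketch; adjust versions to your cluster)
scalaVersion := "2.11.12" // assumed: any 2.11.x release matches spark-hive_2.11

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, giving spark-hive_2.11
  "org.apache.spark" %% "spark-hive" % "2.3.1",
  // plain Java artifact, so a single % is used
  "mysql" % "mysql-connector-java" % "8.0.18"
)
```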
  3. Spark SQL works with Hive through the Hive metastore, so copy hive-site.xml, the Hive configuration file that defines the metastore database, into the project's resources directory
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
	  <name>javax.jdo.option.ConnectionURL</name>
	  <value>jdbc:mysql://node1:3306/hive_metastore?serverTimezone=UTC&amp;createDatabaseIfNotExist=true</value>
	  <description>JDBC connect string for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionDriverName</name>
	  <value>com.mysql.cj.jdbc.Driver</value>
	  <description>Driver class name for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionUserName</name>
	  <value>root</value>
	  <description>username to use against metastore database</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionPassword</name>
	  <value>Jsq123456...</value>
	  <description>password to use against metastore database</description>
	</property>

	<property>
	  <name>hive.metastore.warehouse.dir</name>
	  <value>/user/hive/warehouse</value>
	  <description>location of default database for the warehouse</description>
	</property>

	<property>
	  <name>hive.cli.print.header</name>
	  <value>true</value>
	</property>

	<property>
	  <name>hive.cli.print.current.db</name>
	  <value>true</value>
	</property>

	<property>
	  <name>hive.server2.authentication</name>
	  <value>NONE</value>
	</property>

	<property>
	  <name>hive.server2.thrift.bind.host</name>
	  <value>node1</value>
	</property>

	<property>
	  <name>hive.server2.thrift.port</name>
	  <value>10000</value>
	  <description>TCP port number to listen on, default 10000</description>
	</property>

	<property>
	  <name>hive.server2.thrift.http.port</name>
	  <value>10001</value>
	</property>

	<property>
	  <name>hive.server2.thrift.client.user</name>
	  <value>root</value>
	  <description>Username to use against thrift client</description>
	</property>

	<property>
	  <name>hive.server2.thrift.client.password</name>
	  <value>root</value>
	  <description>Password to use against thrift client</description>
	</property>
</configuration>
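
With Hive support enabled, the dependencies in place, and hive-site.xml on the classpath, an HQL statement passed to `session.sql` comes back as a DataFrame. The following is a minimal sketch, not the original author's code: the database and table names (`demo_db.demo_table`) and the object name `HiveQueryDemo` are hypothetical placeholders, so substitute names that exist in your own Hive.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveQueryDemo {
  // Hypothetical database and table; replace with ones present in your metastore.
  val hql: String = "select * from demo_db.demo_table limit 10"

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("demo")
    val session = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate()

    // Listing databases is a cheap check that the metastore connection works.
    session.sql("show databases").show()

    // The result of an HQL query is an ordinary DataFrame.
    val df: DataFrame = session.sql(hql)
    df.printSchema()
    df.show()

    session.stop()
  }
}
```

If `show databases` lists only `default` even though Hive contains more databases, Spark may not be picking up hive-site.xml, in which case it is worth checking that the file really ends up on the runtime classpath.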

Source: https://www.cnblogs.com/jsqup/p/16630341.html