A Practical Case: Converting bz2 Data to Parquet Files for Precision Ad Targeting
Author: Internet
Introduction: What Parquet Is For
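(1) Parquet implements the data model and algorithms of Google's Dremel system. It can skip data that does not match the query conditions and read only what is needed, which lowers the volume of IO.
(2) Compression and encoding reduce disk usage. Because all values in a column share one data type, more efficient encodings (for example Run Length Encoding and Delta Encoding) can be applied to save further space.
(3) Because the format follows Dremel's columnar design, only the required columns are read and vectorized operations are supported, which gives better scan performance.
(4) If HDFS is the de facto standard distributed file system of the big data era, then Parquet is the de facto preferred file storage format of that era.
(5) It greatly reduces disk I/O. Storage space can typically be reduced by about 75%, which in turn greatly reduces the amount of input data Spark SQL has to process. In Spark 1.6.x in particular, predicate pushdown (the pushdown filter) can in some cases greatly reduce both disk IO and memory usage.
(6) Reading Parquet in Spark 1.6.x greatly improves scan throughput and lookup speed. Compared with Spark 1.5.x, Spark 1.6 is roughly twice as fast, and Parquet handling in Spark 1.6.x was also heavily optimized for CPU, so CPU usage drops noticeably.
(7) Using Parquet can greatly improve Spark's scheduling and execution. In our tests, using Parquet with Spark effectively reduced the execution cost of stages and also optimized the execution path.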
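To make point (2) concrete, here is a minimal sketch of the two encodings it names, written in plain Scala. It only illustrates the idea on in-memory sequences: the object name `ColumnEncodingSketch` and the sample values are made up for this example, and Parquet's real encoders are more elaborate than this.

```scala
object ColumnEncodingSketch {
  /** Run-length encoding: collapse consecutive repeats into (value, count) pairs. */
  def runLengthEncode[A](column: Seq[A]): List[(A, Int)] =
    column.foldLeft(List.empty[(A, Int)]) {
      case ((value, count) :: rest, next) if value == next => (value, count + 1) :: rest
      case (acc, next) => (next, 1) :: acc
    }.reverse

  /** Delta encoding: keep the first value, then only the difference to the previous value. */
  def deltaEncode(column: Seq[Long]): List[Long] =
    if (column.isEmpty) Nil
    else column.head :: column.zip(column.tail).map { case (prev, cur) => cur - prev }.toList

  def main(args: Array[String]): Unit = {
    // Made-up values for a low-cardinality column such as requestmode.
    val requestMode = Seq(1, 1, 1, 1, 2, 2, 3, 3, 3)
    println(runLengthEncode(requestMode)) // nine values shrink to (1,4), (2,2), (3,3)

    // Made-up, slowly increasing timestamps: the deltas are far smaller than the raw values.
    val timestamps = Seq(1616247600L, 1616247601L, 1616247601L, 1616247605L)
    println(deltaEncode(timestamps)) // 1616247600, then 1, 0, 4
  }
}
```

Both tricks work well only when neighbouring values are similar, which is much more likely inside a single column than across the mixed fields of a row.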
Log Field Description
1 | sessionid: String, session identifier |
2 | advertisersid: Int, advertiser ID |
3 | adorderid: Int, ad ID |
4 | adcreativeid: Int, ad creative ID (>= 200000: dsp) |
5 | adplatformproviderid: Int, ad platform provider ID (>= 100000: rtb) |
6 | sdkversion: String, SDK version |
7 | adplatformkey: String, platform provider key |
8 | putinmodeltype: Int, delivery mode for the advertiser, 1: by impressions 2: click |
9 | requestmode: Int, data request type (1: request, 2: impression, 3: click) |
10 | adprice: Double, ad price |
11 | requestdate: String, request time, format: yyyy-m-dd hh:mm:ss |
12 | ip: String, real IP address of the device user |
13 | appid: String, app ID |
14 | appname: String, app name |
15 | uuid: String, unique device identifier |
16 | device: String, device model, e.g. htc, iphone |
17 | client: Int, operating system (1: android 2: ios 3: wp) |
18 | osversion: String, device OS version |
19 | density: String, device screen density |
20 | pw: Int, device screen width |
21 | ph: Int, device screen height |
22 | long: String, longitude of the device |
23 | lat: String, latitude of the device |
24 | provincename: String, name of the province where the device is located |
25 | cityname: String, name of the city where the device is located |
26 | ispid: Int, carrier ID |
27 | ispname: String, carrier name |
28 | networkmannerid: Int, network connection type ID |
29 | networkmannername: String, network connection type name |
30 | iseffective: Int, validity flag (valid means it can be billed normally) (0: invalid 1: |
31 | isbilling: Int, whether it was charged (0: not charged 1: charged) |
32 | adspacetype: Int, ad slot type (1: banner 2: interstitial 3: full screen) |
33 | adspacetypename: String, ad slot type name (banner, interstitial, full screen) |
34 | devicetype: Int, device type (1: phone 2: tablet) |
35 | processnode: Int, process node (1: request-volume kpi 2: valid request 3: ad request |
36 | apptype: Int, app type ID |
37 | district: String, name of the county where the device is located |
38 | paymode: Int, payment mode for the platform provider, 1: by impressions (CPM) 2: click |
39 | isbid: Int, whether it is rtb |
40 | bidprice: Double, rtb bid price |
41 | winprice: Double, rtb winning bid price |
42 | iswin: Int, whether the bid won |
43 | cur: String, values: usd|rmb, etc. |
44 | rate: Double, exchange rate |
45 | cnywinprice: Double, rtb winning bid price converted to RMB |
46 | imei: String, imei |
47 | mac: String, mac |
48 | idfa: String, idfa |
49 | openudid: String, openudid |
50 | androidid: String, androidid |
51 | rtbprovince: String, rtb province |
52 | rtbcity: String, rtb city |
53 | rtbdistrict: String, rtb district |
54 | rtbstreet: String, rtb street |
55 | storeurl: String, app store download URL |
56 | realip: String, real IP |
57 | isqualityapp: Int, preferred-app flag |
58 | bidfloor: Double, floor price |
59 | aw: Int, ad slot width |
60 | ah: Int, ad slot height |
61 | imeimd5: String, imei_md5 |
62 | macmd5: String, mac_md5 |
63 | idfamd5: String, idfa_md5 |
64 | openudidmd5: String, openudid_md5 |
65 | androididmd5: String, androidid_md5 |
66 | imeisha1: String, imei_sha1 |
67 | macsha1: String, mac_sha1 |
68 | idfasha1: String, idfa_sha1 |
69 | openudidsha1: String, openudid_sha1 |
70 | androididsha1: String, androidid_sha1 |
71 | uuidunknow: String, uuid_unknow tanx ciphertext |
72 | userid: String, platform user ID |
73 | iptype: Int, IP type |
74 | initbidprice: Double, initial bid price |
75 | adpayment: Double, converted ad spend |
76 | agentrate: Double, agency profit margin |
77 | lrate: Double, agent profit margin |
78 | adxrate: Double, media profit margin |
79 | title: String, title |
80 | keywords: String, keywords |
81 | tagid: String, ad slot identifier (for video traffic the value is the video ID) |
82 | callbackdate: String, callback time, format: YYYY/mm/dd hh:mm:ss |
83 | channelid: String, channel ID |
84 | mediatype: Int, media type: 1 long-tail media 2 video media 3 independent media, default: 1 |
Note: the schema in the code below has 85 fields. It contains an additional adppprice: Double column between adprice and requestdate that this table does not list, which is why the job expects at least 85 fields per record.
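Log Format Conversion
The supplied log files are in bz2 format, which is a compressed format. To make the later statistics easier, we convert the bz2 files into Parquet files.
Why convert bz2 files to Parquet?
Because Parquet is a columnar storage file format, which has these advantages:
① A compression algorithm suited to each column can be applied, further reducing disk usage;
② Columns that do not need to be read can be skipped, which cuts disk IO scanning and improves IO performance;
③ It is compatible with many big data processing frameworks, such as Hive and Spark.
The Code
Create the dolphin-doit01 project. The code structure is shown in the figure:
POM file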
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>cn.sheep</groupId>
    <artifactId>dolphin-doit01</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>dolphin-doit01</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <!-- scala library -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.10.6</version>
        </dependency>
        <!-- spark core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.3</version>
        </dependency>
        <!-- spark sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.3</version>
        </dependency>
        <!-- mysql -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.42</version>
        </dependency>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
            <version>4.1.17.Final</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- Scala compiler plugin -->
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <!-- <version>2.13</version>-->
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <!-- If you have classpath issue like NoDefClassError,... -->
                    <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
dolphin-doit01\src\main\scala\cn\sheep\dolphin\etl\Bz2Parquet.scala
package cn.sheep.dolphin.etl

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Converts bz2 log files to Parquet files.
  * author: old sheep
  * Created 2021/03/20 21:40
  */
object Bz2Parquet {
  def main(args: Array[String]): Unit = {
    // Validate the arguments
    if (args.length != 2) {
      println(
        """
          |Usage: cn.sheep.dolphin.etl.Bz2Parquet
          |Param:
          | bz2InputPath   input path of the bz2 log files
          | parquetOutPath output path for the Parquet files
        """.stripMargin)
      sys.exit(-1) // a non-zero exit code signals abnormal termination
    }
    // Destructure the two arguments via pattern matching
    val Array(bz2InputPath, parquetOutPath) = args
    val conf = new SparkConf()
      .setAppName("Convert bz2 log files to Parquet files")
      .setMaster("local[*]")
    // Create the SparkContext used to read the offline data files
    val sc = new SparkContext(conf)
    // Read the offline bz2 log files
    val data = sc.textFile(bz2InputPath)
    // Drop malformed records: a valid line has at least 85 comma-separated fields
    val filteredRDD: RDD[Array[String]] = data.map(_.split(",", -1)).filter(_.size >= 85)
    // parquet <- DataFrame (there are several ways to create one) <- SQLContext <- RDD
    val sqlc = new SQLContext(sc)
    //val sc = DolphinAppComm.createSparkContext("Convert bz2 log files to Parquet files")
    // Import the implicit conversion that adds toIntPlus / toDoublePlus to String
    import cn.sheep.dolphin.bean.RichString._
    // RDD[Row] <- RDD[Array[String]]
    val rowRDD = filteredRDD.map(arr => Row(
      arr(0),
      arr(1).toIntPlus,
      arr(2).toIntPlus,
      arr(3).toIntPlus,
      arr(4).toIntPlus,
      arr(5),
      arr(6),
      arr(7).toIntPlus,
      arr(8).toIntPlus,
      arr(9).toDoublePlus,
      arr(10).toDoublePlus,
      arr(11),
      arr(12),
      arr(13),
      arr(14),
      arr(15),
      arr(16),
      arr(17).toIntPlus,
      arr(18),
      arr(19),
      arr(20).toIntPlus,
      arr(21).toIntPlus,
      arr(22),
      arr(23),
      arr(24),
      arr(25),
      arr(26).toIntPlus,
      arr(27),
      arr(28).toIntPlus,
      arr(29),
      arr(30).toIntPlus,
      arr(31).toIntPlus,
      arr(32).toIntPlus,
      arr(33),
      arr(34).toIntPlus,
      arr(35).toIntPlus,
      arr(36).toIntPlus,
      arr(37),
      arr(38).toIntPlus,
      arr(39).toIntPlus,
      arr(40).toDoublePlus,
      arr(41).toDoublePlus,
      arr(42).toIntPlus,
      arr(43),
      arr(44).toDoublePlus,
      arr(45).toDoublePlus,
      arr(46),
      arr(47),
      arr(48),
      arr(49),
      arr(50),
      arr(51),
      arr(52),
      arr(53),
      arr(54),
      arr(55),
      arr(56),
      arr(57).toIntPlus,
      arr(58).toDoublePlus,
      arr(59).toIntPlus,
      arr(60).toIntPlus,
      arr(61),
      arr(62),
      arr(63),
      arr(64),
      arr(65),
      arr(66),
      arr(67),
      arr(68),
      arr(69),
      arr(70),
      arr(71),
      arr(72),
      arr(73).toIntPlus,
      arr(74).toDoublePlus,
      arr(75).toDoublePlus,
      arr(76).toDoublePlus,
      arr(77).toDoublePlus,
      arr(78).toDoublePlus,
      arr(79),
      arr(80),
      arr(81),
      arr(82),
      arr(83),
      arr(84).toIntPlus
    ))
    // schema: StructType <- demo
    val schema = StructType(Seq(
      StructField("sessionid", StringType),
      StructField("advertisersid", IntegerType),
      StructField("adorderid", IntegerType),
      StructField("adcreativeid", IntegerType),
      StructField("adplatformproviderid", IntegerType),
      StructField("sdkversion", StringType),
      StructField("adplatformkey", StringType),
      StructField("putinmodeltype", IntegerType),
      StructField("requestmode", IntegerType),
      StructField("adprice", DoubleType),
      StructField("adppprice", DoubleType),
      StructField("requestdate", StringType),
      StructField("ip", StringType),
      StructField("appid", StringType),
      StructField("appname", StringType),
      StructField("uuid", StringType),
      StructField("device", StringType),
      StructField("client", IntegerType),
      StructField("osversion", StringType),
      StructField("density", StringType),
      StructField("pw", IntegerType),
      StructField("ph", IntegerType),
      StructField("long", StringType),
      StructField("lat", StringType),
      StructField("provincename", StringType),
      StructField("cityname", StringType),
      StructField("ispid", IntegerType),
      StructField("ispname", StringType),
      StructField("networkmannerid", IntegerType),
      StructField("networkmannername", StringType),
      StructField("iseffective", IntegerType),
      StructField("isbilling", IntegerType),
      StructField("adspacetype", IntegerType),
      StructField("adspacetypename", StringType),
      StructField("devicetype", IntegerType),
      StructField("processnode", IntegerType),
      StructField("apptype", IntegerType),
      StructField("district", StringType),
      StructField("paymode", IntegerType),
      StructField("isbid", IntegerType),
      StructField("bidprice", DoubleType),
      StructField("winprice", DoubleType),
      StructField("iswin", IntegerType),
      StructField("cur", StringType),
      StructField("rate", DoubleType),
      StructField("cnywinprice", DoubleType),
      StructField("imei", StringType),
      StructField("mac", StringType),
      StructField("idfa", StringType),
      StructField("openudid", StringType),
      StructField("androidid", StringType),
      StructField("rtbprovince", StringType),
      StructField("rtbcity", StringType),
      StructField("rtbdistrict", StringType),
      StructField("rtbstreet", StringType),
      StructField("storeurl", StringType),
      StructField("realip", StringType),
      StructField("isqualityapp", IntegerType),
      StructField("bidfloor", DoubleType),
      StructField("aw", IntegerType),
      StructField("ah", IntegerType),
      StructField("imeimd5", StringType),
      StructField("macmd5", StringType),
      StructField("idfamd5", StringType),
      StructField("openudidmd5", StringType),
      StructField("androididmd5", StringType),
      StructField("imeisha1", StringType),
      StructField("macsha1", StringType),
      StructField("idfasha1", StringType),
      StructField("openudidsha1", StringType),
      StructField("androididsha1", StringType),
      StructField("uuidunknow", StringType),
      StructField("userid", StringType),
      StructField("iptype", IntegerType),
      StructField("initbidprice", DoubleType),
      StructField("adpayment", DoubleType),
      StructField("agentrate", DoubleType),
      StructField("lrate", DoubleType),
      StructField("adxrate", DoubleType),
      StructField("title", StringType),
      StructField("keywords", StringType),
      StructField("tagid", StringType),
      StructField("callbackdate", StringType),
      StructField("channelid", StringType),
      StructField("mediatype", IntegerType)
    ))
    /**
      * RDD[Row] <- RDD[Array[String]]
      * schema: StructType <- demo
      */
    val dataFrame = sqlc.createDataFrame(rowRDD, schema)
    // dataFrame -> parquet
    // In Spark 1.6, Parquet output is gzip-compressed by default
    dataFrame.write.parquet(parquetOutPath)
    sc.stop()
  }
}
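The `split(",", -1)` call in the job matters for the `size >= 85` filter. With the default limit, Java's `String.split` drops trailing empty strings, so a record whose last fields are empty would look too short and be filtered out. The sketch below shows the difference on a made-up 5-field line; the object name `SplitLimitSketch` and the sample record are assumptions for illustration only.

```scala
object SplitLimitSketch {
  /** Split a comma-separated log line, keeping trailing empty fields. */
  def fields(line: String): Array[String] = line.split(",", -1)

  def main(args: Array[String]): Unit = {
    // Hypothetical record with five fields, the last two of them empty.
    val line = "s001,12,banner,,"
    println(line.split(",").length) // 3: trailing empty strings are dropped
    println(fields(line).length)    // 5: the negative limit keeps every field
  }
}
```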
dolphin-doit01\src\main\scala\cn\sheep\dolphin\utils\NBFormat.scala
package cn.sheep.dolphin.utils

import org.apache.commons.lang.StringUtils

/**
  * Formatting helper for numeric strings: parses a string to Int, falling back to 0.
  * author: old sheep
  * Created 2021/3/21 11:46
  */
object NBFormat {
  def apply(str: String): Int = {
    try {
      if (StringUtils.isNotEmpty(str)) {
        str.trim.toInt
      } else 0
    } catch {
      case _: Exception => 0
    }
  }
}
dolphin-doit01\src\main\scala\cn\sheep\dolphin\bean\RichString.scala
package cn.sheep.dolphin.bean

/**
  * author: old sheep
  * Created 2021/03/21
  */
class RichString(val str: String) {
  // Parse as Int; any parse failure yields 0
  def toIntPlus: Int = try {
    str.toInt
  } catch {
    case _: Exception => 0
  }

  // Parse as Double; any parse failure yields 0.0
  def toDoublePlus: Double = try {
    str.toDouble
  } catch {
    case _: Exception => 0d
  }
}

object RichString {
  /**
    * Implicitly converts a String to a RichString.
    * @param str the string to wrap
    * @return a RichString wrapping str
    */
  implicit def str2RichString(str: String): RichString = new RichString(str)
}
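The following sketch shows how such a conversion behaves at the call site. To stay self-contained it does not import the class above; `SafeParseSketch` and `SafeString` are hypothetical stand-ins, written as an implicit class, that mirror `toIntPlus` and `toDoublePlus`. The sample inputs are made up.

```scala
object SafeParseSketch {
  // Hypothetical stand-in for RichString: same two methods, same fallback values.
  implicit class SafeString(val str: String) {
    def toIntPlus: Int = try str.toInt catch { case _: Exception => 0 }
    def toDoublePlus: Double = try str.toDouble catch { case _: Exception => 0d }
  }

  def main(args: Array[String]): Unit = {
    println("42".toIntPlus)      // parses normally
    println("".toIntPlus)        // empty field falls back to 0
    println(" 42 ".toIntPlus)    // also 0: toInt does not trim surrounding spaces
    println("n/a".toDoublePlus)  // unparsable text falls back to 0.0
  }
}
```

Note the padded case: unlike `NBFormat`, which calls `trim` before parsing, `toIntPlus` turns `" 42 "` into 0. Because the fallback is silent, a 0 in the output can mean either a real zero or an unparsable field.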
Configure the Input and Output Arguments
Run the Bz2Parquet program. The console prints the following output:
{
"type" : "struct",
"fields" : [ {
"name" : "sessionid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "advertisersid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adorderid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adcreativeid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adplatformproviderid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "sdkversion",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adplatformkey",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "putinmodeltype",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "requestmode",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adprice",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adppprice",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "requestdate",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "ip",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "appid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "appname",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "uuid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "device",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "client",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "osversion",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "density",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pw",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "ph",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "long",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "lat",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "provincename",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "cityname",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "ispid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "ispname",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "networkmannerid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "networkmannername",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "iseffective",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "isbilling",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adspacetype",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adspacetypename",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "devicetype",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "processnode",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "apptype",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "district",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "paymode",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "isbid",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "bidprice",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "winprice",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "iswin",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "cur",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "rate",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "cnywinprice",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "imei",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "mac",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "idfa",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "openudid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "androidid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "rtbprovince",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "rtbcity",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "rtbdistrict",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "rtbstreet",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "storeurl",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "realip",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "isqualityapp",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "bidfloor",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "aw",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "ah",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "imeimd5",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "macmd5",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "idfamd5",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "openudidmd5",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "androididmd5",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "imeisha1",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "macsha1",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "idfasha1",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "openudidsha1",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "androididsha1",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "uuidunknow",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "userid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "iptype",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "initbidprice",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adpayment",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "agentrate",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "lrate",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "adxrate",
"type" : "double",
"nullable" : true,
"metadata" : { }
}, {
"name" : "title",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "keywords",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "tagid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "callbackdate",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "channelid",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "mediatype",
"type" : "integer",
"nullable" : true,
"metadata" : { }
} ]
}
Check the result in the output path
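One quick way to inspect the output directory without starting Spark is to list it from plain Scala. The sketch below assumes the usual layout left by Hadoop-style output committers: an empty `_SUCCESS` marker plus part files whose names end in `.parquet`. The object name `OutputCheckSketch` and its helper names are hypothetical.

```scala
import java.io.File

object OutputCheckSketch {
  /** Names of the Parquet part files directly under `dir` (empty if `dir` is not a directory). */
  def parquetParts(dir: File): List[String] =
    Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
      .filter(f => f.isFile && f.getName.endsWith(".parquet"))
      .map(_.getName)
      .sorted

  /** True when a `_SUCCESS` marker file exists in `dir`. */
  def hasSuccessMarker(dir: File): Boolean = new File(dir, "_SUCCESS").isFile

  def main(args: Array[String]): Unit = {
    val dir = new File(args(0)) // the same path that was passed as parquetOutPath
    println("success marker present: " + hasSuccessMarker(dir))
    parquetParts(dir).foreach(println)
  }
}
```

A missing marker or an empty file list suggests the job did not finish writing, in which case the console log is the place to look next.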
Source: https://blog.csdn.net/weixin_39868387/article/details/118270997