
A practical case: converting bz2 data to Parquet files for precision ad targeting

Author: Internet

Introduction: what Parquet is used for

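(1) Parquet implements the data model and algorithms of Google's Dremel system. It can skip data that does not match the query conditions and read only the data that is needed, which reduces the volume of I/O.

(2) Compression and encoding reduce disk usage. Because all values in a column share the same data type, more efficient encodings (for example Run Length Encoding and Delta Encoding) can be applied to save further space.

(3) Because Parquet follows the Dremel data model and algorithms, it reads only the columns a query needs and supports vectorized operations, which gives better scan performance.

(4) If HDFS is the preferred standard for distributed file systems in the big data era, then Parquet is the de facto preferred standard for file storage formats.

(5) Parquet greatly reduces disk I/O and in typical cases can cut storage space by 75%, so Spark SQL has far less input data to process. In Spark 1.6.x in particular, filter pushdown can in some cases greatly reduce disk I/O and memory usage.

(6) In Spark 1.6.x, Parquet scan throughput improved greatly, which speeds up data lookups: compared with Spark 1.5.x, Spark 1.6 is roughly twice as fast. Spark 1.6.x also heavily optimized CPU usage for Parquet operations, which lowers CPU consumption.

(7) Using Parquet can greatly optimize Spark's scheduling and execution. In our tests, Parquet effectively reduced the execution cost of stages and also optimized the execution path.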
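
To make the two encodings named in point (2) concrete, here is a minimal sketch in plain Scala. It only illustrates the idea on an in-memory column; it is not Parquet's implementation (Parquet's actual run-length encoding is a hybrid with bit-packing), and the object and method names are made up for this example.

```scala
object ColumnEncodingSketch {
  // Run-length encoding: collapse consecutive equal values into (value, runLength) pairs.
  def runLengthEncode[A](values: Seq[A]): List[(A, Int)] =
    values.foldLeft(List.empty[(A, Int)]) {
      case ((v, n) :: rest, x) if v == x => (v, n + 1) :: rest // extend the current run
      case (acc, x)                      => (x, 1) :: acc      // start a new run
    }.reverse

  // Delta encoding: keep the first value, then store each difference from the previous value.
  def deltaEncode(values: Seq[Long]): List[Long] =
    if (values.isEmpty) Nil
    else values.head :: values.zip(values.tail).map { case (a, b) => b - a }.toList

  def main(args: Array[String]): Unit = {
    // A low-cardinality column such as requestmode compresses well with RLE.
    println(runLengthEncode(Seq(1, 1, 1, 2, 2, 3)))     // List((1,3), (2,2), (3,1))
    // Slowly increasing values (hypothetical timestamps) become small deltas.
    println(deltaEncode(Seq(1000L, 1001L, 1003L, 1006L))) // List(1000, 1, 2, 3)
  }
}
```

Both encodings pay off only because a column holds values of one type that tend to repeat or change slowly, which is the point item (2) makes.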

Log field description

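1. sessionid: String, session identifier
2. advertisersid: Int, advertiser ID
3. adorderid: Int, ad ID
4. adcreativeid: Int, ad creative ID (>= 200000: dsp)
5. adplatformproviderid: Int, ad platform provider ID (>= 100000: rtb)
6. sdkversion: String, SDK version
7. adplatformkey: String, platform provider key
8. putinmodeltype: Int, delivery mode for the advertiser, 1: impression-based delivery 2: click…
9. requestmode: Int, data request type (1: request, 2: impression, 3: click)
10. adprice: Double, ad price
11. requestdate: String, request time, format: yyyy-m-dd hh:mm:ss
12. ip: String, real IP address of the device user
13. appid: String, app ID
14. appname: String, app name
15. uuid: String, unique device identifier
16. device: String, device model, e.g. htc, iphone
17. client: Int, operating system (1: android, 2: ios, 3: wp)
18. osversion: String, device operating system version
19. density: String, device screen density
20. pw: Int, device screen width
21. ph: Int, device screen height
22. long: String, device longitude
23. lat: String, device latitude
24. provincename: String, province where the device is located
25. cityname: String, city where the device is located
26. ispid: Int, carrier ID
27. ispname: String, carrier name
28. networkmannerid: Int, network connection type ID
29. networkmannername: String, network connection type name
30. iseffective: Int, validity flag (valid means it can be billed normally) (0: invalid 1:…
31. isbilling: Int, whether charged (0: not charged, 1: charged)
32. adspacetype: Int, ad slot type (1: banner, 2: interstitial, 3: full screen)
33. adspacetypename: String, ad slot type name (banner, interstitial, full screen)
34. devicetype: Int, device type (1: phone, 2: tablet)
35. processnode: Int, process node (1: request-volume KPI 2: valid request 3: ad req…
36. apptype: Int, app type ID
37. district: String, county where the device is located
38. paymode: Int, payment mode for the platform provider, 1: impression-based delivery (CPM) 2: click…
39. isbid: Int, whether RTB
40. bidprice: Double, RTB bid price
41. winprice: Double, RTB winning price
42. iswin: Int, whether the bid won
43. cur: String, values: usd|rmb etc.
44. rate: Double, exchange rate
45. cnywinprice: Double, RTB winning price converted to RMB
46. imei: String, imei
47. mac: String, mac
48. idfa: String, idfa
49. openudid: String, openudid
50. androidid: String, androidid
51. rtbprovince: String, RTB province
52. rtbcity: String, RTB city
53. rtbdistrict: String, RTB district
54. rtbstreet: String, RTB street
55. storeurl: String, app store download URL
56. realip: String, real IP
57. isqualityapp: Int, preferred-app flag
58. bidfloor: Double, floor price
59. aw: Int, ad slot width
60. ah: Int, ad slot height
61. imeimd5: String, imei_md5
62. macmd5: String, mac_md5
63. idfamd5: String, idfa_md5
64. openudidmd5: String, openudid_md5
65. androididmd5: String, androidid_md5
66. imeisha1: String, imei_sha1
67. macsha1: String, mac_sha1
68. idfasha1: String, idfa_sha1
69. openudidsha1: String, openudid_sha1
70. androididsha1: String, androidid_sha1
71. uuidunknow: String, uuid_unknow, tanx ciphertext
72. userid: String, platform user ID
73. iptype: Int, IP type
74. initbidprice: Double, initial bid price
75. adpayment: Double, converted ad spend
76. agentrate: Double, agency profit margin
77. lrate: Double, agent profit margin
78. adxrate: Double, media profit margin
79. title: String, title
80. keywords: String, keywords
81. tagid: String, ad slot identifier (the video ID for video traffic)
82. callbackdate: String, callback time, format: YYYY/mm/dd hh:mm:ss
83. channelid: String, channel ID
84. mediatype: Int, media type: 1 long-tail media, 2 video media, 3 independent media; default: 1

Note: the schema in Bz2Parquet.scala declares 85 fields. It additionally contains adppprice (Double) between adprice and requestdate, which this list does not describe.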

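Log format conversion

The log files are provided in bz2 format, which is a compressed file format. To make later statistics easier, we convert the bz2 files to Parquet files.

Why convert bz2 files to Parquet files?

Because Parquet is a columnar storage file format. Its advantages:

① A suitable compression algorithm can be chosen for each column, which further reduces disk space;

② Columns that do not need to be read can be skipped, which reduces disk I/O scanning and improves I/O performance;

③ It is compatible with many big data processing frameworks, such as Hive and Spark.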
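
As a rough illustration of advantage ②, the sketch below contrasts a row-oriented layout with a column-oriented one in plain Scala. It is a toy in-memory model, not how Parquet stores data on disk, and the `Record` and `Columns` types are hypothetical three-field stand-ins for the full log schema.

```scala
object ColumnarLayoutSketch {
  // Hypothetical three-field subset of the log schema, for illustration only.
  final case class Record(sessionid: String, provincename: String, adprice: Double)

  // Column-oriented layout: one array per field instead of one object per record.
  final case class Columns(
    sessionid: Array[String],
    provincename: Array[String],
    adprice: Array[Double])

  def toColumns(rows: Seq[Record]): Columns =
    Columns(
      rows.map(_.sessionid).toArray,
      rows.map(_.provincename).toArray,
      rows.map(_.adprice).toArray)

  // Row-wise: every Record is visited even though only adprice is used.
  def totalPriceRowWise(rows: Seq[Record]): Double = rows.map(_.adprice).sum

  // Column-wise: only the adprice array is touched; the other columns are never read.
  def totalPriceColumnWise(cols: Columns): Double = cols.adprice.sum

  def main(args: Array[String]): Unit = {
    val rows = Seq(Record("s1", "Beijing", 1.5), Record("s2", "Shanghai", 2.5))
    println(totalPriceRowWise(rows))                // 4.0
    println(totalPriceColumnWise(toColumns(rows))) // 4.0
  }
}
```

With 85 fields per record, a query that needs only a handful of them benefits from the second layout, because the unused columns never have to be read from disk.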

The code

Create the dolphin-doit01 project. It consists of the POM file and the three Scala source files listed below.

POM file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>cn.sheep</groupId>
    <artifactId>dolphin-doit01</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>dolphin-doit01</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <!--scala library-->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.10.6</version>
        </dependency>

        <!--spark cores-->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.3</version>
        </dependency>

        <!--spark sql-->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.3</version>
        </dependency>

        <!--mysql-->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.42</version>
        </dependency>

        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
            <version>4.1.17.Final</version>
        </dependency>


    </dependencies>

    <build>
        <plugins>
            <!-- Scala compiler plugin -->
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
<!--                <version>2.13</version>-->
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <!-- If you have classpath issue like NoDefClassError,... -->
                    <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

dolphin-doit01\src\main\scala\cn\sheep\dolphin\etl\Bz2Parquet.scala


package cn.sheep.dolphin.etl

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

/** Converts bz2 log files to Parquet files
 * author: old sheep
 * Created 2021/03/20  21:40
 */
object Bz2Parquet {
  def main(args: Array[String]): Unit = {
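    // Validate the arguments
    if (args.length != 2) {
      println(
        """
          |Usage: cn.sheep.dolphin.etl.Bz2Parquet
          |Param:
          |  bz2InputPath    input path of the bz2 log files
          |  parquetOutPath  output path of the Parquet files
        """.stripMargin)
      sys.exit(-1) // -1: abnormal exit
    }

    // Read the arguments (destructured via pattern matching)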
    val Array(bz2InputPath, parquetOutPath) = args

    val conf = new SparkConf()
      .setAppName("Convert bz2 log files to Parquet files")
      .setMaster("local[*]")

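    // Create the SparkContext used to read the offline data files
    val sc = new SparkContext(conf)

    // Read the offline bz2 log files (textFile decompresses .bz2 input transparently)
    val data = sc.textFile(bz2InputPath)

    // Drop malformed records; split with limit -1 keeps trailing empty fields
    val filteredRDD: RDD[Array[String]] = data.map(_.split(",", -1)).filter(_.size >= 85)

    // parquet <- DataFrame (several ways to create one) <- SQLContext <- RDD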
    val sqlc = new SQLContext(sc)


    //val sc = DolphinAppComm.createSparkContext("Convert bz2 log files to Parquet files")


    // Import the implicit conversion (String => RichString)
    import cn.sheep.dolphin.bean.RichString._

    // RDD[Row] <- RDD[Array[String]]
    val rowRDD = filteredRDD.map(arr => Row(
      arr(0),
      arr(1).toIntPlus,
      arr(2).toIntPlus,
      arr(3).toIntPlus,
      arr(4).toIntPlus,
      arr(5),
      arr(6),
      arr(7).toIntPlus,
      arr(8).toIntPlus,
      arr(9).toDoublePlus,
      arr(10).toDoublePlus,
      arr(11),
      arr(12),
      arr(13),
      arr(14),
      arr(15),
      arr(16),
      arr(17).toIntPlus,
      arr(18),
      arr(19),
      arr(20).toIntPlus,
      arr(21).toIntPlus,
      arr(22),
      arr(23),
      arr(24),
      arr(25),
      arr(26).toIntPlus,
      arr(27),
      arr(28).toIntPlus,
      arr(29),
      arr(30).toIntPlus,
      arr(31).toIntPlus,
      arr(32).toIntPlus,
      arr(33),
      arr(34).toIntPlus,
      arr(35).toIntPlus,
      arr(36).toIntPlus,
      arr(37),
      arr(38).toIntPlus,
      arr(39).toIntPlus,
      arr(40).toDoublePlus,
      arr(41).toDoublePlus,
      arr(42).toIntPlus,
      arr(43),
      arr(44).toDoublePlus,
      arr(45).toDoublePlus,
      arr(46),
      arr(47),
      arr(48),
      arr(49),
      arr(50),
      arr(51),
      arr(52),
      arr(53),
      arr(54),
      arr(55),
      arr(56),
      arr(57).toIntPlus,
      arr(58).toDoublePlus,
      arr(59).toIntPlus,
      arr(60).toIntPlus,
      arr(61),
      arr(62),
      arr(63),
      arr(64),
      arr(65),
      arr(66),
      arr(67),
      arr(68),
      arr(69),
      arr(70),
      arr(71),
      arr(72),
      arr(73).toIntPlus,
      arr(74).toDoublePlus,
      arr(75).toDoublePlus,
      arr(76).toDoublePlus,
      arr(77).toDoublePlus,
      arr(78).toDoublePlus,
      arr(79),
      arr(80),
      arr(81),
      arr(82),
      arr(83),
      arr(84).toIntPlus
    ))

    // schema: StructType <- demo
    val schema = StructType(Seq(
      StructField("sessionid", StringType),
      StructField("advertisersid", IntegerType),
      StructField("adorderid", IntegerType),
      StructField("adcreativeid", IntegerType),
      StructField("adplatformproviderid", IntegerType),
      StructField("sdkversion", StringType),
      StructField("adplatformkey", StringType),
      StructField("putinmodeltype", IntegerType),
      StructField("requestmode", IntegerType),
      StructField("adprice", DoubleType),
      StructField("adppprice", DoubleType),
      StructField("requestdate", StringType),
      StructField("ip", StringType),
      StructField("appid", StringType),
      StructField("appname", StringType),
      StructField("uuid", StringType),
      StructField("device", StringType),
      StructField("client", IntegerType),
      StructField("osversion", StringType),
      StructField("density", StringType),
      StructField("pw", IntegerType),
      StructField("ph", IntegerType),
      StructField("long", StringType),
      StructField("lat", StringType),
      StructField("provincename", StringType),
      StructField("cityname", StringType),
      StructField("ispid", IntegerType),
      StructField("ispname", StringType),
      StructField("networkmannerid", IntegerType),
      StructField("networkmannername",StringType),
      StructField("iseffective", IntegerType),
      StructField("isbilling", IntegerType),
      StructField("adspacetype", IntegerType),
      StructField("adspacetypename", StringType),
      StructField("devicetype", IntegerType),
      StructField("processnode", IntegerType),
      StructField("apptype", IntegerType),
      StructField("district", StringType),
      StructField("paymode", IntegerType),
      StructField("isbid", IntegerType),
      StructField("bidprice", DoubleType),
      StructField("winprice", DoubleType),
      StructField("iswin", IntegerType),
      StructField("cur", StringType),
      StructField("rate", DoubleType),
      StructField("cnywinprice", DoubleType),
      StructField("imei", StringType),
      StructField("mac", StringType),
      StructField("idfa", StringType),
      StructField("openudid", StringType),
      StructField("androidid", StringType),
      StructField("rtbprovince", StringType),
      StructField("rtbcity", StringType),
      StructField("rtbdistrict", StringType),
      StructField("rtbstreet", StringType),
      StructField("storeurl", StringType),
      StructField("realip", StringType),
      StructField("isqualityapp", IntegerType),
      StructField("bidfloor", DoubleType),
      StructField("aw", IntegerType),
      StructField("ah", IntegerType),
      StructField("imeimd5", StringType),
      StructField("macmd5", StringType),
      StructField("idfamd5", StringType),
      StructField("openudidmd5", StringType),
      StructField("androididmd5", StringType),
      StructField("imeisha1", StringType),
      StructField("macsha1", StringType),
      StructField("idfasha1", StringType),
      StructField("openudidsha1", StringType),
      StructField("androididsha1", StringType),
      StructField("uuidunknow", StringType),
      StructField("userid", StringType),
      StructField("iptype", IntegerType),
      StructField("initbidprice", DoubleType),
      StructField("adpayment", DoubleType),
      StructField("agentrate", DoubleType),
      StructField("lrate", DoubleType),
      StructField("adxrate", DoubleType),
      StructField("title", StringType),
      StructField("keywords", StringType),
      StructField("tagid", StringType),
      StructField("callbackdate", StringType),
      StructField("channelid", StringType),
      StructField("mediatype", IntegerType)
    ))


    /**
     * RDD[Row] <- RDD[Array[String]]
     * schema: StructType <- demo
     */
    val dataFrame = sqlc.createDataFrame(rowRDD, schema)

    // dataFrame -> parquet
    // In Spark 1.6, Parquet output is gzip-compressed by default
    dataFrame.write.parquet(parquetOutPath)

    sc.stop()
  }
}
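
The `split(",", -1)` call in Bz2Parquet matters because of the `filter(_.size >= 85)` that follows it. A small plain-Scala sketch of the difference, using a made-up five-field record rather than a real 85-field log line:

```scala
object SplitLimitSketch {
  // Split a record the same way Bz2Parquet does: limit -1 keeps trailing empty fields.
  def fields(line: String): Array[String] = line.split(",", -1)

  def main(args: Array[String]): Unit = {
    // Hypothetical record with five fields, the last three of them empty.
    val line = "a,b,,,"

    // Default split drops trailing empty strings, so the record looks too short.
    println(line.split(",").length) // 2

    // With limit -1 all five fields survive, empty or not.
    println(fields(line).length)    // 5
  }
}
```

Without the `-1`, a valid log line whose last fields are empty would be counted as having fewer than 85 fields and silently filtered out. Note that this simple split still assumes no field value itself contains a comma.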

dolphin-doit01\src\main\scala\cn\sheep\dolphin\utils\NBFormat.scala

 

package cn.sheep.dolphin.utils

import org.apache.commons.lang.StringUtils
/** Formatting helper for (numeric) strings
 * author: old sheep
 * Created 2021/3/21  11:46
 */
object NBFormat {


  def apply(str: String) = {
    try {
      if (StringUtils.isNotEmpty(str)) {
        str.trim.toInt
      } else 0
    } catch {
      case _: Exception => 0
    }
  }

}

dolphin-doit01\src\main\scala\cn\sheep\dolphin\bean\RichString.scala

package cn.sheep.dolphin.bean

/**
 * author: old sheep
 * Created 2021/03/21
 */
class RichString(val str: String) {

  def toIntPlus = try {
    str.toInt
  } catch {
    case _: Exception => 0
  }

  def toDoublePlus = try {
    str.toDouble
  } catch {
    case _: Exception => 0d
  }
}

object RichString {
  /**
   * Implicitly converts a String to a RichString
   * @param str the string to wrap
   * @return a RichString wrapping str
   */
  implicit def str2RichString(str: String): RichString = new RichString(str)
}
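
For comparison, the same "parse or fall back to zero" behaviour can be sketched with an implicit value class and `scala.util.Try`, both available since Scala 2.10. This is an alternative illustration, not the author's code: `SafeParseSketch`, `SafeString`, `toIntOrZero` and `toDoubleOrZero` are hypothetical names, and unlike `RichString` this variant trims whitespace before parsing.

```scala
import scala.util.Try

object SafeParseSketch {
  // Hypothetical variant of RichString: an implicit value class, so no separate
  // implicit def is needed and no wrapper object is allocated per call.
  implicit class SafeString(val str: String) extends AnyVal {
    // Parse as Int; any parse failure (empty, non-numeric, null) yields 0.
    def toIntOrZero: Int = Try(str.trim.toInt).getOrElse(0)

    // Parse as Double; any parse failure yields 0.0.
    def toDoubleOrZero: Double = Try(str.trim.toDouble).getOrElse(0d)
  }

  def main(args: Array[String]): Unit = {
    println("42".toIntOrZero)       // 42
    println("".toIntOrZero)         // 0
    println("abc".toDoubleOrZero)   // 0.0
    println(" 3.5 ".toDoubleOrZero) // 3.5
  }
}
```

Either way, falling back to 0 makes a missing or malformed value indistinguishable from a genuine zero in the Parquet output, which is worth keeping in mind for price fields such as adprice or winprice.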

Configure the input and output parameters

Run the Bz2Parquet program. The console prints:

{
  "type" : "struct",
  "fields" : [ {
    "name" : "sessionid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "advertisersid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adorderid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adcreativeid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adplatformproviderid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "sdkversion",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adplatformkey",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "putinmodeltype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "requestmode",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adppprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "requestdate",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ip",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "appid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "appname",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "uuid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "device",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "client",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "osversion",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "density",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "pw",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ph",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "long",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "lat",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "provincename",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cityname",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ispid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ispname",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "networkmannerid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "networkmannername",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "iseffective",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "isbilling",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adspacetype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adspacetypename",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "devicetype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "processnode",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "apptype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "district",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "paymode",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "isbid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "bidprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "winprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "iswin",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cur",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cnywinprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "imei",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "mac",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "idfa",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "openudid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "androidid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbprovince",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbcity",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbdistrict",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbstreet",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "storeurl",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "realip",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "isqualityapp",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "bidfloor",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "aw",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ah",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "imeimd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "macmd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "idfamd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "openudidmd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "androididmd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "imeisha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "macsha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "idfasha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "openudidsha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "androididsha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "uuidunknow",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "userid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "iptype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "initbidprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adpayment",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "agentrate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "lrate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adxrate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "title",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "keywords",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "tagid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "callbackdate",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "channelid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "mediatype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  } ]
}

Check the result in the output path

 

Source: https://blog.csdn.net/weixin_39868387/article/details/118270997