Action Operators on a DataFrame, Part 1

Author: Internet

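import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Run locally, using all available cores
val conf = new SparkConf().setAppName("action").setMaster("local[*]")
val session = SparkSession.builder().config(conf).getOrCreate()

// 40 sample rows; the name is 23 characters long, so show() truncates it by default
val seq: Seq[(String, Int)] = Seq(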
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 21),
  ("zs123456789123456789123", 22),
  ("zs123456789123456789123", 23),
  ("zs123456789123456789123", 24),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 21),
  ("zs123456789123456789123", 22),
  ("zs123456789123456789123", 23),
  ("zs123456789123456789123", 24),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 29),
  ("zs123456789123456789123", 30),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 29),
  ("zs123456789123456789123", 30)
)
import session.implicits._
val frame: DataFrame = seq.toDF("namea", "ageb")

1. printSchema

def printSchemaOpt(frame: DataFrame): Unit = {
  println("-----------printSchema start-----------")
  // Prints each column's name, type, and nullability as a tree
  frame.printSchema()
  println("-----------printSchema end-----------")
}
Result:
-----------printSchema start-----------
root
 |-- namea: string (nullable = true)
 |-- ageb: integer (nullable = false)

-----------printSchema end-----------

2. show

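show(): displays the first 20 rows; string values longer than 20 characters are truncated (truncate defaults to true)
show(n): displays the first n rows, with the same 20-character truncation
show(true): displays the first 20 rows and truncates values longer than 20 characters; same as show()
show(false): displays the first 20 rows without the 20-character limit
show(n, true): displays the first n rows and truncates values longer than 20 characters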

def showOpt(frame: DataFrame): Unit = {
  println("-----------show1 start-----------")
  // Default: first 20 rows, long strings truncated
  frame.show()
  println("-----------show1 end-----------")
  println("-----------show2 start-----------")
  // First 3 rows only
  frame.show(3)
  println("-----------show2 end-----------")
  println("-----------show3 start-----------")
  // First 30 rows, long strings truncated
  frame.show(30, true)
  println("-----------show3 end-----------")
}
-----------show1 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
+--------------------+----+
only showing top 20 rows
-----------show1 end-----------
-----------show2 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
+--------------------+----+
only showing top 3 rows

-----------show2 end-----------
-----------show3 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  29|
|zs123456789123456...|  30|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
+--------------------+----+
only showing top 30 rows

-----------show3 end-----------
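
The truncation seen in the output above can be switched off or adjusted per call. What follows is a minimal sketch, assuming a local Spark session; the object name ShowSketch and the helper sampleFrame are illustrative and do not come from the original post. It relies on the show(numRows, truncate) overloads of the Dataset API, where truncate can be either a Boolean or an Int giving the maximum cell width.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ShowSketch {
  // Hypothetical helper: a two-row frame shaped like the article's data.
  def sampleFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("zs123456789123456789123", 20), ("zs123456789123456789123", 21))
      .toDF("namea", "ageb")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("show-sketch")
      .master("local[*]")
      .getOrCreate()
    val df = sampleFrame(spark)

    df.show()          // default: strings over 20 characters are cut and end in "..."
    df.show(false)     // up to 20 rows, full cell contents
    df.show(1, false)  // first row only, full cell contents
    df.show(2, 10)     // Int overload: cut strings longer than 10 characters

    spark.stop()
  }
}
```

Passing false is usually the quickest way to inspect long values while debugging; the Int overload helps when full values would make the table too wide to read.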

3. first/head/take/takeAsList

def getDataOpt(frame: DataFrame): Unit = {
  println("-----------first start-----------")
  // first() returns the first row; read column 1 (ageb) as an Int
  val row: Row = frame.first()
  println(row.getAs[Int](1))
  println("-----------first end-----------")
  println("-----------head start-----------")
  // head(n) returns the first n rows as an Array[Row]
  val array: Array[Row] = frame.head(3)
  println(array.mkString("="))
  println("-----------head end-----------")
  println("-----------take start-----------")
  // take(n) is an alias for head(n)
  val arr: Array[Row] = frame.take(3)
  println(arr.mkString("="))
  println("-----------take end-----------")
  println("-----------takeAsList start-----------")
  // takeAsList(n) returns the same rows as a java.util.List
  val list: java.util.List[Row] = frame.takeAsList(3)
  println(list)
  println("-----------takeAsList end-----------")
}
-----------first start-----------
20
-----------first end-----------
-----------head start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]
-----------head end-----------
-----------take start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]
-----------take end-----------
-----------takeAsList start-----------
[[zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22]]
-----------takeAsList end-----------
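
The rows returned by these operators can be read by column name as well as by position, and first() needs care on an empty DataFrame. Below is a minimal sketch under those assumptions; the object name TakeSketch and the helpers firstAge and firstOption are illustrative, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object TakeSketch {
  // Reads the age from the first row by column name rather than by index.
  // first() fails on an empty frame, so this assumes at least one row.
  def firstAge(df: DataFrame): Int = df.first().getAs[Int]("ageb")

  // Safer variant: head(1) returns an array with zero or one element.
  def firstOption(df: DataFrame): Option[Row] = df.head(1).headOption

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("take-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("zs123456789123456789123", 20),
      ("zs123456789123456789123", 21),
      ("zs123456789123456789123", 22)
    ).toDF("namea", "ageb")

    println(firstAge(df))                                // 20
    println(df.take(2).map(_.getInt(1)).mkString(","))   // 20,21
    println(firstOption(df.filter($"ageb" > 100)))       // None, no row matches

    spark.stop()
  }
}
```

Reading by name keeps the code valid if the column order changes, and the Option-returning variant avoids an exception when a filter leaves no rows.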

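4. collect/collectAsList: use with caution. Both fetch all the data in the DataFrame, pulling the rows from every partition onto a single node (the driver), which can easily cause an out-of-memory error.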

def collectOpt(frame: DataFrame): Unit = {
  println("-----------collect start-----------")
  // collect() returns every row to the driver as an Array[Row]
  val array: Array[Row] = frame.collect()
  println(array.mkString("="))
  println("-----------collect end-----------")
  println("-----------collectAsList start-----------")
  // collectAsList() returns the same rows as a java.util.List[Row]
  val list: java.util.List[Row] = frame.collectAsList()
  println(list)
  println("-----------collectAsList end-----------")
}
-----------collect start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]=[zs123456789123456789123,23]=[zs123456789123456789123,24]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]=[zs123456789123456789123,23]=[zs123456789123456789123,24]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,29]=[zs123456789123456789123,30]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,29]=[zs123456789123456789123,30]
-----------collect end-----------
-----------collectAsList start-----------
[[zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22], [zs123456789123456789123,23], [zs123456789123456789123,24], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22], [zs123456789123456789123,23], [zs123456789123456789123,24], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,29], [zs123456789123456789123,30], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,29], [zs123456789123456789123,30]]
-----------collectAsList end-----------
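
Since collect pulls every row onto one node, it often helps to bound how much data reaches the driver. The following is a minimal sketch of two common alternatives, assuming a local Spark session; the object name CollectSketch and the helpers preview and sumAges are illustrative and not from the original post. limit(n).collect() caps the number of rows returned, and toLocalIterator() hands rows to the driver one partition at a time, so the driver needs room for the largest partition rather than for the whole DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object CollectSketch {
  // Bounded collect: at most n rows are brought back to the driver.
  def preview(df: DataFrame, n: Int): Array[Row] = df.limit(n).collect()

  // Iterates over all rows without materialising them in one array.
  // toLocalIterator() returns a java.util.Iterator[Row].
  def sumAges(df: DataFrame): Long = {
    val it = df.toLocalIterator()
    var total = 0L
    while (it.hasNext) {
      total += it.next().getAs[Int]("ageb")
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("collect-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("zs123456789123456789123", 20),
      ("zs123456789123456789123", 21),
      ("zs123456789123456789123", 22)
    ).toDF("namea", "ageb")

    println(preview(df, 2).mkString("="))
    println(sumAges(df))  // 63

    spark.stop()
  }
}
```

The sum is only there to show the iteration pattern; in a real job an aggregate like this would normally be computed on the executors with an aggregation on the DataFrame rather than on the driver.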

Source: https://www.cnblogs.com/jsqup/p/16638826.html