Action operators on a DataFrame (Part 1)
Author: Internet
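import java.util
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
// Run locally, using all available cores.
val conf = new SparkConf().setAppName("action").setMaster("local[*]")
val session = SparkSession.builder().config(conf).getOrCreate()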
val seq: Seq[(String, Int)] = Seq(
("zs123456789123456789123", 20),
("zs123456789123456789123", 21),
("zs123456789123456789123", 22),
("zs123456789123456789123", 23),
("zs123456789123456789123", 24),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 21),
("zs123456789123456789123", 22),
("zs123456789123456789123", 23),
("zs123456789123456789123", 24),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 29),
("zs123456789123456789123", 30),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 20),
("zs123456789123456789123", 29),
("zs123456789123456789123", 30)
)
import session.implicits._
val frame: DataFrame = seq.toDF("namea", "ageb")
1. printSchema
def printSchemaOpt(frame: DataFrame): Unit = {
  println("-----------printSchema start-----------")
  // Print the column names, types, and nullability as a tree.
  frame.printSchema()
  println("-----------printSchema end-----------")
}
Result:
-----------printSchema start-----------
root
 |-- namea: string (nullable = true)
 |-- ageb: integer (nullable = false)
-----------printSchema end-----------
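printSchema only prints the tree. When the same information is needed in code, the schema can be read from the DataFrame directly. The following is a minimal sketch, not part of the original post; the object name SchemaInspection and the one-row sample data are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SchemaInspection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-inspection")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical one-row sample using the same column names as the frame above.
    val df = Seq(("zs", 20)).toDF("namea", "ageb")

    // schema is a StructType; each StructField carries a name, a data type, and a nullable flag.
    df.schema.fields.foreach { f =>
      println(s"${f.name}: ${f.dataType.simpleString}, nullable=${f.nullable}")
    }

    // dtypes returns (column name, type name) pairs.
    df.dtypes.foreach { case (name, tpe) => println(s"$name -> $tpe") }

    spark.stop()
  }
}
```

The Int column is reported as non-nullable because a Scala Int cannot hold null, which matches the printSchema output above.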
2. show
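show(): displays the first 20 rows; strings longer than 20 characters are truncated (truncate defaults to true)
show(n): displays the first n rows, truncating strings longer than 20 characters
show(true): displays the first 20 rows, truncating strings longer than 20 characters
show(false): displays the first 20 rows without the 20-character truncation
show(n, true): displays the first n rows, truncating strings longer than 20 characters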
def showOpt(frame: DataFrame): Unit = {
  println("-----------show1 start-----------")
  // Default: first 20 rows, long strings truncated.
  frame.show()
  println("-----------show1 end-----------")
  println("-----------show2 start-----------")
  // First 3 rows only.
  frame.show(3)
  println("-----------show2 end-----------")
  println("-----------show3 start-----------")
  // First 30 rows, long strings truncated.
  frame.show(30, true)
  println("-----------show3 end-----------")
}
Result:
-----------show1 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
+--------------------+----+
only showing top 20 rows
-----------show1 end-----------
-----------show2 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
+--------------------+----+
only showing top 3 rows
-----------show2 end-----------
-----------show3 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  29|
|zs123456789123456...|  30|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
+--------------------+----+
only showing top 30 rows
-----------show3 end-----------
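Besides the Boolean flag used above, show also has overloads that take an integer truncation width and a vertical-layout flag. The sketch below is an illustration only; the object name ShowVariants and the two sample rows are hypothetical and do not come from the original post.

```scala
import org.apache.spark.sql.SparkSession

object ShowVariants {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("show-variants")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows with a name longer than 20 characters.
    val df = Seq(
      ("zs123456789123456789123", 20),
      ("zs123456789123456789123", 21)
    ).toDF("namea", "ageb")

    // Up to 20 rows, with the full string values (no truncation).
    df.show(false)

    // First 2 rows, strings cut to at most 10 characters.
    df.show(2, 10)

    // First 2 rows, no truncation (0), printed one field per line (vertical = true).
    df.show(2, 0, true)

    spark.stop()
  }
}
```

The vertical layout tends to be easier to read when a row has many columns or very wide values.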
3. first/head/take/takeAsList
def getDataOpt(frame: DataFrame): Unit = {
  println("-----------first start-----------")
  // first() returns the first Row; index 1 is the ageb column.
  val row: Row = frame.first()
  println(row.getAs[Int](1))
  println("-----------first end-----------")
  println("-----------head start-----------")
  // head(n) returns the first n rows as an Array[Row].
  val array: Array[Row] = frame.head(3)
  println(array.mkString("="))
  println("-----------head end-----------")
  println("-----------take start-----------")
  // take(n) returns the same rows as head(n).
  val arr: Array[Row] = frame.take(3)
  println(arr.mkString("="))
  println("-----------take end-----------")
  println("-----------takeAsList start-----------")
  // takeAsList(n) returns the first n rows as a java.util.List[Row].
  val list: util.List[Row] = frame.takeAsList(3)
  println(list)
  println("-----------takeAsList end-----------")
}
Result:
-----------first start-----------
20
-----------first end-----------
-----------head start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]
-----------head end-----------
-----------take start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]
-----------take end-----------
-----------takeAsList start-----------
[[zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22]]
-----------takeAsList end-----------
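The rows returned by first, head, and take can be read by position, as above, or by column name. A minimal sketch of both styles follows; the object name RowAccess and the three sample rows are assumptions for illustration only.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowAccess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("row-access")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows with the same column names as the frame above.
    val df = Seq(("zs", 20), ("ls", 21), ("ww", 22)).toDF("namea", "ageb")

    val row: Row = df.first()

    // Access by position with typed getters.
    val nameByIndex: String = row.getString(0)
    val ageByIndex: Int = row.getInt(1)

    // Access by column name; the type parameter must match the column type.
    val ageByName: Int = row.getAs[Int]("ageb")

    println(s"$nameByIndex $ageByIndex $ageByName")

    // take(n) and head(n) both return an Array[Row], so normal collection methods apply.
    val ages: Array[Int] = df.take(2).map(_.getInt(1))
    println(ages.mkString(","))

    spark.stop()
  }
}
```

Access by name is usually more robust than access by index when the column order may change.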
4. collect/collectAsList: use with caution. Both fetch every row of the DataFrame, pulling the data from all partitions onto a single node (the driver), which can easily cause an out-of-memory error.
def collectOpt(frame: DataFrame): Unit = {
  println("-----------collect start-----------")
  // collect() returns all rows as an Array[Row].
  val array: Array[Row] = frame.collect()
  println(array.mkString("="))
  println("-----------collect end-----------")
  println("-----------collectAsList start-----------")
  // collectAsList() returns all rows as a java.util.List[Row].
  val array1 = frame.collectAsList()
  println(array1)
  println("-----------collectAsList end-----------")
}
Result:
-----------collect start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]=[zs123456789123456789123,23]=[zs123456789123456789123,24]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]=[zs123456789123456789123,23]=[zs123456789123456789123,24]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,29]=[zs123456789123456789123,30]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,29]=[zs123456789123456789123,30]
-----------collect end-----------
-----------collectAsList start-----------
[[zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22], [zs123456789123456789123,23], [zs123456789123456789123,24], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22], [zs123456789123456789123,23], [zs123456789123456789123,24], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,29], [zs123456789123456789123,30], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,29], [zs123456789123456789123,30]]
-----------collectAsList end-----------
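When the DataFrame may be too large to collect safely, two common ways to reduce memory pressure on the driver are to bound the row count before collecting, or to stream rows with toLocalIterator. The sketch below illustrates both under assumed conditions; the object name SaferCollect and the generated id column are hypothetical and not part of the original post.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object SaferCollect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("safer-collect")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical data set: ids 0 to 999 in a single column named "id".
    val df = spark.range(0, 1000).toDF("id")

    // Option 1: cap the number of rows before bringing them to the driver.
    val preview: Array[Row] = df.limit(5).collect()
    println(preview.mkString("="))

    // Option 2: iterate over the rows without materializing them all at once.
    // toLocalIterator returns a java.util.Iterator[Row] and fetches one partition at a time.
    val it = df.toLocalIterator()
    var sum = 0L
    while (it.hasNext) {
      sum += it.next().getLong(0)
    }
    println(s"sum of ids = $sum")

    spark.stop()
  }
}
```

With toLocalIterator the driver still needs enough memory for the largest single partition, so it lowers the risk rather than removing it.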
Source: https://www.cnblogs.com/jsqup/p/16638826.html