Spark-UDF
Author: Internet (reposted)
User-Defined Functions - Python
This document contains examples of creating a UDF in Python and registering it for use in Spark SQL.
What is a PySpark UDF?
- A PySpark UDF is a user-defined function executed in the Python runtime.
- Two types:
- Row UDF
- lambda x: x+1
- lambda data1, data2: (data1 - data2).year
- Group UDF
- lambda values: np.mean(np.array(values))
A row UDF is similar to using Spark's map operator and passing it a function: it maps one input row to one output value.
A group UDF is like a row UDF, except it operates on a list of values at once; it is similar to a groupBy followed by a map operator in Scala.
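The two shapes can be sketched in plain Python (no Spark required); this mirrors the lambdas above and is only an illustration of the semantics, not Spark's actual execution model:

```python
# Row UDF shape: one input value -> one output value (like map)
row_udf = lambda x: x + 1
data = [1, 2, 3]
print([row_udf(x) for x in data])  # one output per row

# Group UDF shape: all values of a group -> one aggregate value
# (like groupBy followed by an aggregation)
group_udf = lambda values: sum(values) / len(values)
groups = {"a": [1, 2, 3], "b": [10, 20]}
print({k: group_udf(v) for k, v in groups.items()})  # one output per group
```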
Define a function, register it as a UDF, and set the UDF's return type.
```python
from pyspark.sql.types import LongType

# Example UDF (the name "squared" follows the page's tags; the original
# function body was lost in extraction)
def squared(s):
    return s * s

# Register the function as a UDF and set its return type
sqlContext.udf.register("squaredWithPython", squared, LongType())
```
Call the UDF in Spark SQL
```python
sqlContext.range(1, 20).registerTempTable("test")
```
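Assuming a UDF registered under the name `squaredWithPython` (the name is an assumption, inferred from the page's tags), calling it from Spark SQL in a notebook cell might look like:

```sql
-- %sql cell: apply the registered UDF to the id column of the temp table
select id, squaredWithPython(id) as id_squared from test
```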
Use UDF with DataFrames
```sql
%sql select * FROM test
```
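With the DataFrame API, a UDF is wrapped with `pyspark.sql.functions.udf` rather than called by its registered SQL name. A minimal sketch, assuming a live `sqlContext`, a `test` table with an `id` column, and a `squared` function (all assumptions here):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import LongType

# Hypothetical example function; swap in your own logic
def squared(s):
    return s * s

# Wrap the Python function as a column-expression UDF with an explicit return type
squared_udf = udf(squared, LongType())

df = sqlContext.table("test")
df.select("id", squared_udf("id").alias("id_squared")).show()
```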
User-Defined Aggregate Functions - Scala
Tags: function, squared, UDF, sql, Spark, id. Source: https://www.cnblogs.com/liuzhongrong/p/12249251.html