Common PySpark RDD function examples
Author: Internet
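## mapPartitions

```python
def model_pred(partitionData):
    # `model` is a broadcast variable; `.value` gives the underlying model on the executor.
    updatedData = []
    for row in partitionData:
        # Columns 0 and 1 are carried through; the remaining columns are the features.
        pred_value = model.value.predict([row[2:]])[0]
        pred_value = float(round(pred_value, 4))
        updatedData.append([row[0], row[1], pred_value])
    # mapPartitions expects an iterator back, not a list.
    return iter(updatedData)

pred = df.rdd.mapPartitions(model_pred).toDF(['p_number', 'name', 'score'])
```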
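`model` must be a broadcast variable: broadcast the fitted model from the driver (for example with `SparkContext.broadcast`) so that each executor can read it through `model.value`.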
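The broadcast step itself is not shown in the original snippet, so here is a minimal sketch of how it might look. The names `MeanModel`, `make_model_pred`, and `score_dataframe` are hypothetical and introduced only for illustration; `MeanModel` is a stand-in for whatever fitted model is actually used, and the sketch assumes `spark` is an active SparkSession and that the first two columns of `df` are identifiers followed by numeric features.

```python
class MeanModel:
    """Hypothetical stand-in for a fitted model with a scikit-learn-style predict()."""

    def predict(self, rows):
        # One prediction per input row: here, simply the mean of its features.
        return [sum(r) / len(r) for r in rows]


def make_model_pred(broadcast_model):
    """Build a partition function that closes over a broadcast variable."""

    def model_pred(partition_data):
        model = broadcast_model.value  # dereference once per partition
        for row in partition_data:
            pred_value = float(round(model.predict([row[2:]])[0], 4))
            # Yielding makes this a generator, which mapPartitions accepts as an iterator.
            yield [row[0], row[1], pred_value]

    return model_pred


def score_dataframe(spark, df, model):
    """Broadcast `model` from the driver and score `df` partition by partition.

    Assumption: `spark` is an active SparkSession; `df` has two identifier
    columns followed by numeric feature columns.
    """
    broadcast_model = spark.sparkContext.broadcast(model)
    scored = df.rdd.mapPartitions(make_model_pred(broadcast_model))
    return scored.toDF(["p_number", "name", "score"])
```

With a running session, `score_dataframe(spark, df, fitted_model)` would return the scored DataFrame. Passing the broadcast handle into a factory, rather than relying on a global named `model`, keeps the partition function easy to test without a cluster.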
Source: https://www.cnblogs.com/cupleo/p/16255447.html