A Summary of Common Spark Operators: map
Author: Internet (aggregated)
Turning a list into key-value pairs
```scala
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", " eagle"), 2)
val b = a.map(x => (x, 1))
b.collect.foreach(println(_))
/*
(dog,1)
(tiger,1)
(lion,1)
(cat,1)
(panther,1)
( eagle,1)
*/
```
```scala
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", " eagle"), 2)
val b = a.map(x => (x.length, x))
b.mapValues("x" + _ + "x").collect
// Result (note " eagle" keeps its leading space, so its length is 6):
// Array(
//   (3,xdogx),
//   (5,xtigerx),
//   (4,xlionx),
//   (3,xcatx),
//   (7,xpantherx),
//   (6,x eaglex)
// )
```
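Since the example above needs a running `SparkContext`, the same pipeline can be sketched with plain Scala collections (a Spark-free analogue for illustration; `RDD.map` and `RDD.mapValues` behave like their collection counterparts on each partition):

```scala
// Spark-free sketch of the same pipeline using a plain Scala List.
val words = List("dog", "tiger", "lion", "cat", "panther", " eagle")

// a.map(x => (x.length, x)) on the RDD corresponds to:
val pairs = words.map(x => (x.length, x))

// RDD.mapValues transforms only the value of each (key, value) pair:
val wrapped = pairs.map { case (k, v) => (k, "x" + v + "x") }

println(wrapped)
// List((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (6,x eaglex))
```

Note that `" eagle"` keeps its leading space, so its length is 6, not 5.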
Applying a custom function to produce a new RDD
```scala
val a = sc.parallelize(1 to 9, 3)
val b = a.map(x => x * 2)
b.collect
// Result: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)
```
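`RDD.map` applies the function to every element, exactly like `map` on an ordinary Scala collection; a Spark-free sketch of the same doubling:

```scala
// Plain Scala analogue of rdd.map(x => x * 2):
// map applies the function element by element and returns a new collection.
val doubled = (1 to 9).map(_ * 2)
println(doubled.toList)
// List(2, 4, 6, 8, 10, 12, 14, 16, 18)
```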
Transforming one key-value pair into another
```scala
val l = sc.parallelize(List((1, 'a'), (2, 'b')))
val ll = l.map(x => (x._1, "PV:" + x._2)).collect()
ll.foreach(println)
// (1,PV:a)
// (2,PV:b)
```
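The same rewrite can be expressed with pattern matching instead of the `_1`/`_2` tuple accessors, which is often more readable (shown here on a plain Scala `List`, no `SparkContext` required):

```scala
// Pattern-matching variant: destructure each pair into (k, v)
// instead of accessing fields via x._1 and x._2.
val l = List((1, 'a'), (2, 'b'))
val ll = l.map { case (k, v) => (k, "PV:" + v) }
ll.foreach(println)
// (1,PV:a)
// (2,PV:b)
```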
Tags: Map, val, parallelize, map, collect, lion, operator, sc, Spark
Source: https://www.cnblogs.com/pocahontas/p/11334497.html