Spark Sorting with sortBy
Author: Internet
1. Example 1: sort by value in descending order
The signature of `RDD.sortBy`:

    def sortBy[K](
        f: (T) => K,
        ascending: Boolean = true,               // true (default): ascending, small to large; false: descending
        numPartitions: Int = this.partitions.length)
        (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]

The return type is RDD[T]: sorting only reorders the elements, it does not change them.
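The `ascending` flag behaves like reversing the implicit `Ordering[K]` that `sortBy` takes. A minimal local sketch of that equivalence, using plain Scala collections rather than an RDD (no SparkContext needed):

```scala
object SortByFlagSketch {
  def main(args: Array[String]): Unit = {
    val data = Seq(3, 1, 2)
    // ascending = true corresponds to the natural Ordering[Int]
    println(data.sorted)                        // prints List(1, 2, 3)
    // ascending = false corresponds to Ordering[Int].reverse
    println(data.sorted(Ordering[Int].reverse)) // prints List(3, 2, 1)
  }
}
```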
    package com.test.spark

    import org.apache.spark.{SparkConf, SparkContext}

    /**
     * @author admin
     * sortBy is an enhanced version of sortByKey:
     * it can sort by value.
     */
    object SparkSortByApplication {

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SortSecond").setMaster("local[1]")
        val sc = new SparkContext(conf)
        val datas = sc.parallelize(Array(("cc", 12), ("bb", 32), ("cc", 22), ("aa", 18), ("bb", 16),
          ("dd", 16), ("ee", 54), ("cc", 1), ("ff", 13), ("gg", 32), ("bb", 4)))
        // sum the values for each key
        val counts = datas.reduceByKey(_ + _)
        // sort by value in descending order
        val sorts = counts.sortBy(_._2, false)
        sorts.collect().foreach(println)
        sc.stop()
      }
    }
Output:

    (ee,54)
    (bb,52)
    (cc,35)
    (gg,32)
    (aa,18)
    (dd,16)
    (ff,13)
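The reduce-and-sort pipeline above can be reproduced with plain Scala collections, which makes the ordering logic easy to check without a SparkContext. In this sketch, `groupBy` plus a per-key sum plays the role of `reduceByKey(_ + _)`, and a reversed `Ordering` plays the role of `ascending = false`:

```scala
object LocalValueSort {
  def main(args: Array[String]): Unit = {
    val datas = Seq(("cc", 12), ("bb", 32), ("cc", 22), ("aa", 18), ("bb", 16),
                    ("dd", 16), ("ee", 54), ("cc", 1), ("ff", 13), ("gg", 32), ("bb", 4))
    // groupBy + sum mirrors reduceByKey(_ + _): one (key, summed value) pair per key
    val counts = datas.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }
    // sortBy on the value with a reversed Ordering mirrors sortBy(_._2, false)
    val sorts = counts.toSeq.sortBy(_._2)(Ordering[Int].reverse)
    sorts.foreach(println) // prints (ee,54), (bb,52), (cc,35), (gg,32), (aa,18), (dd,16), (ff,13)
  }
}
```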
2. Example 2: sort in ascending order by the first element; when the first elements are equal, sort in ascending order by the second element
    package com.sudiyi.spark

    import org.apache.spark.{SparkConf, SparkContext}

    /**
     * @author xubiao
     * sortBy is an enhanced version of sortByKey:
     * sort by the first element, then by the second, both ascending.
     */
    object SparkSortByApplication {

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SortSecond").setMaster("local[1]")
        val sc = new SparkContext(conf)
        val arr = Array((1, 6, 3), (2, 3, 3), (1, 1, 2), (1, 3, 5), (2, 1, 2))
        val datas2 = sc.parallelize(arr)
        // build a (first, second) pair as the sort key; pairs compare lexicographically
        val sorts2 = datas2.sortBy(e => (e._1, e._2))
        sorts2.collect().foreach(println)
        sc.stop()
      }
    }
Output:

    (1,1,2)
    (1,3,5)
    (1,6,3)
    (2,1,2)
    (2,3,3)
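The multi-field sort works because Scala provides an implicit `Ordering` for tuples that compares element by element, so `sortBy(e => (e._1, e._2))` needs no custom comparator. The same mechanism can be checked locally on plain collections, and switching to a different secondary field is just a change of key (the first-then-third variant here is an illustrative addition, not part of the original example):

```scala
object TupleSortSketch {
  def main(args: Array[String]): Unit = {
    val arr = Array((1, 6, 3), (2, 3, 3), (1, 1, 2), (1, 3, 5), (2, 1, 2))
    // (e._1, e._2): sort by first element, ties broken by the second
    val byFirstSecond = arr.sortBy(e => (e._1, e._2))
    byFirstSecond.foreach(println) // prints (1,1,2), (1,3,5), (1,6,3), (2,1,2), (2,3,3)
    // to break ties by the third element instead, just change the key
    val byFirstThird = arr.sortBy(e => (e._1, e._3))
    byFirstThird.foreach(println)  // prints (1,1,2), (1,6,3), (1,3,5), (2,1,2), (2,3,3)
  }
}
```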
Source: https://www.cnblogs.com/JYB2021/p/16207466.html