Computing PV and UV with Spark
Author: 互联网 (Internet)
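PV: page view
Page views, or clicks, measure how much traffic a website or page receives. Specifically, PV is the number of pages of a site, or the number of times a given page, that all visitors viewed within one 24-hour day (00:00 to 24:00). PV counts page refreshes: every refresh counts as one PV.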
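To make the definition concrete, here is a minimal plain-Java sketch (no Spark) that counts PV per URL over a handful of in-memory log lines. The class name `PvSketch`, the sample lines, and the `example.com` URLs are hypothetical; the assumed layout (tab-separated, IP in field 0, URL in field 5) simply follows the indices used by the Spark code in this article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class PvSketch {
    // PV per URL: every log line counts once, repeat visits included.
    static Map<String, Long> countPv(List<String> lines) {
        return lines.stream()
                .map(line -> line.split("\t")[5]) // assumption: field 5 holds the URL
                .collect(Collectors.groupingBy(url -> url, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample data: ip, four filler fields, url.
        List<String> lines = Arrays.asList(
                "10.0.0.1\tf1\tf2\tf3\tf4\twww.example.com/a",
                "10.0.0.1\tf1\tf2\tf3\tf4\twww.example.com/a",
                "10.0.0.2\tf1\tf2\tf3\tf4\twww.example.com/a",
                "10.0.0.2\tf1\tf2\tf3\tf4\twww.example.com/b");
        // prints {www.example.com/a=3, www.example.com/b=1}
        System.out.println(countPv(lines));
    }
}
```

The second visit from 10.0.0.1 to `/a` is counted again, which is exactly what separates PV from a visitor count.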
Code logic
- Extract the URL from each line and use mapToPair to produce (URL, 1) tuples
- reduceByKey to sum the counts per URL
- Print the results with foreach
import java.util.Iterator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

SparkConf conf = new SparkConf();
conf.setMaster("local").setAppName("PV");
JavaSparkContext context = new JavaSparkContext(conf);
JavaRDD<String> lineRDD = context.textFile("./data/pvuvdata");

// Approach 1: map each line to (url, 1), then sum the 1s per URL with reduceByKey.
lineRDD.mapToPair(new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String line) throws Exception {
        String url = line.split("\t")[5]; // field 5 of the tab-separated line is the URL
        return new Tuple2<>(url, 1);
    }
}).reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer v1, Integer v2) throws Exception {
        return v1 + v2;
    }
}).foreach(new VoidFunction<Tuple2<String, Integer>>() {
    @Override
    public void call(Tuple2<String, Integer> urlAndCount) throws Exception {
        System.out.println("url: " + urlAndCount);
    }
});

// Approach 2: group the 1s per URL with groupByKey and count them by hand.
lineRDD.mapToPair(new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String line) throws Exception {
        String url = line.split("\t")[5];
        return new Tuple2<>(url, 1);
    }
}).groupByKey().foreach(new VoidFunction<Tuple2<String, Iterable<Integer>>>() {
    @Override
    public void call(Tuple2<String, Iterable<Integer>> s) throws Exception {
        int count = 0;
        Iterator<Integer> iterator = s._2.iterator();
        while (iterator.hasNext()) {
            iterator.next(); // advance the iterator; without this call the loop never terminates
            count++;
        }
        System.out.println("url : " + s._1 + " value " + count);
    }
});
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

SparkConf conf = new SparkConf();
conf.setMaster("local").setAppName("PV");
JavaSparkContext context = new JavaSparkContext(conf);
JavaRDD<String> lineRDD = context.textFile("./data/pvuvdata");

// Approach 3: countByKey is an action that returns the per-URL counts to the driver as a Map.
// Note: Map<String, Object> matches the Spark 1.x Java API; on Spark 2.x and later
// countByKey() returns Map<String, Long>, so the declared type must change accordingly.
Map<String, Object> map = lineRDD.mapToPair(new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String line) throws Exception {
        String url = line.split("\t")[5];
        return new Tuple2<>(url, 1);
    }
}).countByKey();

for (String key : map.keySet()) {
    System.out.println("key : " + key + " value :" + map.get(key));
}
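UV (unique visitor): the number of unique visitors
The number of people, identified by distinct IP addresses, who visit a site or click a page. Within a single day, UV counts a visitor with a given IP only on the first visit; later visits to the site on the same day are not counted again. UV measures how many distinct visitors arrived within a period, but it does not reflect the site's overall activity.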
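As a small illustration of the definition, the plain-Java sketch below (no Spark) collects the distinct IPs per URL and reports the size of each set. The class name `UvSketch`, the sample lines, and the `example.com` URLs are hypothetical; the field layout (IP in field 0, URL in field 5, tab-separated) is assumed from the indices used by the Spark code in this article.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class UvSketch {
    // UV per URL: each distinct IP is counted once, however often it visits.
    static Map<String, Integer> countUv(List<String> lines) {
        Map<String, Set<String>> ipsByUrl = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            // assumption: field 0 is the IP, field 5 is the URL
            ipsByUrl.computeIfAbsent(fields[5], url -> new HashSet<>()).add(fields[0]);
        }
        Map<String, Integer> uv = new TreeMap<>();
        ipsByUrl.forEach((url, ips) -> uv.put(url, ips.size()));
        return uv;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: ip, four filler fields, url.
        List<String> lines = Arrays.asList(
                "10.0.0.1\tf1\tf2\tf3\tf4\twww.example.com/a",
                "10.0.0.1\tf1\tf2\tf3\tf4\twww.example.com/a",
                "10.0.0.2\tf1\tf2\tf3\tf4\twww.example.com/a",
                "10.0.0.2\tf1\tf2\tf3\tf4\twww.example.com/b");
        // prints {www.example.com/a=2, www.example.com/b=1}
        System.out.println(countUv(lines));
    }
}
```

On this sample `/a` has three page views but only two unique visitors, because the repeated visit from 10.0.0.1 collapses into one set entry.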
import java.util.HashSet;
import java.util.Set;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

SparkConf conf = new SparkConf();
conf.setMaster("local").setAppName("TestUV");
JavaSparkContext context = new JavaSparkContext(conf);
JavaRDD<String> lineRDD = context.textFile("./data/pvuvdata");

// Approach 1: group the IPs per URL, then deduplicate them with a HashSet.
JavaPairRDD<String, Iterable<String>> rdd1 = lineRDD.mapToPair(new PairFunction<String, String, String>() {
    @Override
    public Tuple2<String, String> call(String s) throws Exception {
        String[] fields = s.split("\t"); // split once: field 0 is the IP, field 5 is the URL
        return new Tuple2<>(fields[5], fields[0]);
    }
}).groupByKey();

rdd1.foreach(new VoidFunction<Tuple2<String, Iterable<String>>>() {
    @Override
    public void call(Tuple2<String, Iterable<String>> t) throws Exception {
        Set<String> ips = new HashSet<>();
        for (String ip : t._2) {
            ips.add(ip);
        }
        System.out.println(" key : " + t._1 + " value :" + ips.size());
    }
});
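import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

SparkConf conf = new SparkConf();
conf.setMaster("local").setAppName("TestUV");
JavaSparkContext context = new JavaSparkContext(conf);
JavaRDD<String> lineRDD = context.textFile("./data/pvuvdata");

// Approach 2: drop duplicate (url, ip) pairs with distinct(), then count the pairs left per URL.
// Note: Map<String, Object> matches the Spark 1.x Java API; on Spark 2.x and later
// countByKey() returns Map<String, Long>, so the declared type must change accordingly.
Map<String, Object> map = lineRDD.mapToPair(new PairFunction<String, String, String>() {
    @Override
    public Tuple2<String, String> call(String s) throws Exception {
        String[] fields = s.split("\t"); // field 0 is the IP, field 5 is the URL
        return new Tuple2<>(fields[5], fields[0]);
    }
}).distinct().countByKey();

for (String key : map.keySet()) {
    System.out.println(" key : " + key + " value: " + map.get(key));
}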
Source: https://blog.51cto.com/u_13985831/2836535