1. Spark ML Study Notes: Spark MLlib vs. Spark ML, and the Main Concepts of Pipelines

Author: Internet

Contents of this article:

Chapter 1: Introduction to Spark Machine Learning

1.1 Spark MLlib 与 Spark ML

1.1.1 Spark MLlib

1.1.2 Spark ML (key focus)


1.2 Main Concepts of Pipelines

1.2.1 Transformer
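
As a minimal sketch of the Transformer concept: in Spark ML a Transformer exposes `transform()`, which takes a DataFrame and returns a new DataFrame, usually with one or more columns appended. The snippet below uses the built-in `Tokenizer` purely as an illustration. The sample sentences, the column names `text` and `words`, and the local SparkSession are assumptions made for this example, not part of the original notes.

```scala
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() reuses the existing one
val spark = SparkSession.builder()
  .appName("TransformerSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample data
val sentences = spark.createDataFrame(Seq(
  (0, "Spark ML Pipelines"),
  (1, "Hello Spark")
)).toDF("id", "text")

// Tokenizer is a Transformer: it needs no training and is applied directly
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")

// transform() returns a new DataFrame with the "words" column appended
val tokenized = tokenizer.transform(sentences)
tokenized.show(truncate = false)
```

The input DataFrame is left unchanged. `transform()` only describes a new DataFrame that carries the extra column.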

1.2.2 Estimator (model learner)

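Note: Estimator is sometimes also rendered in Chinese as 评估器 ("evaluator"). It is a different concept from Spark ML's Evaluator classes in org.apache.spark.ml.evaluation, which score the predictions of a fitted model.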
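
A minimal sketch of the Estimator concept: an Estimator exposes `fit()`, which learns from a DataFrame and returns a Model, and that Model is itself a Transformer. The built-in `StringIndexer` is used here only as an illustration. The sample data and the column names `category` and `categoryIndex` are assumptions for this example.

```scala
import org.apache.spark.ml.feature.{StringIndexer, StringIndexerModel}
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() reuses the existing one
val spark = SparkSession.builder()
  .appName("EstimatorSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample data: "a" appears 3 times, "b" twice, "c" once
val df = spark.createDataFrame(Seq(
  (0, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "a"), (5, "b")
)).toDF("id", "category")

// StringIndexer is an Estimator: it must see the data before it can map labels
val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

// fit() learns the label-to-index mapping and returns a Model (a Transformer)
val indexerModel: StringIndexerModel = indexer.fit(df)

// With the default ordering, the most frequent label receives index 0.0
val indexed = indexerModel.transform(df)
indexed.show()
```

The same two-step shape, `fit()` followed by `transform()`, applies to learning algorithms such as LogisticRegression.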

1.3 Example: Estimator, Transformer, Param

import org.apache.spark.ml.feature._
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

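// Assumption: `spark` is the SparkSession predefined in spark-shell.
// getOrCreate() reuses it there and creates a local session in a standalone script.
val spark = SparkSession.builder()
  .appName("EstimatorTransformerParam")
  .master("local[*]")
  .getOrCreate()

// 1. Training samples as (label, features) rows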
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5)))).toDF("label", "features")

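// 2. Create a LogisticRegression instance; this instance is an Estimator
val lr = new LogisticRegression()
// Print every parameter with its documentation and default value
println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")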

// 3. Set model parameters via setter methods
lr.setMaxIter(10)
  .setRegParam(0.01)

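// 4. Train the model; fit() uses the parameters currently stored in lr.
// model1 is a Model (that is, a Transformer) produced by the Estimator.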
val model1 = lr.fit(training)
println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)

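// 5. Alternatively, specify parameters with a ParamMap
val paramMap = ParamMap(lr.maxIter -> 20)
  .put(lr.maxIter, 30) // Overwrites the maxIter value of 20 given above
  .put(lr.regParam -> 0.1, lr.threshold -> 0.55) // Several params can be set at once

// 5 (continued). Merge two ParamMaps
val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability") // Rename the probability output column
val paramMapCombined = paramMap ++ paramMap2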

// 6. Train a second model using the combined ParamMap.
// Values in paramMapCombined override any set earlier through lr.set* methods.
val model2 = lr.fit(training, paramMapCombined)
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

// 7. Test samples
val test = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
  (0.0, Vectors.dense(3.0, 2.0, -0.1)),
  (1.0, Vectors.dense(0.0, 2.2, -1.5)))).toDF("label", "features")

// 8. Run the model on the test data.
// The probability column is named "myProbability" because probabilityCol was renamed above.
model2.transform(test)
  .select("features", "label", "myProbability", "prediction")
  .collect()
  .foreach {
    case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
      println(s"($features, $label) -> prob=$prob, prediction=$prediction")
  }
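
The example above imports `Pipeline` and `PipelineModel` but fits a single Estimator on its own. A minimal sketch of how those two classes chain Transformers and an Estimator into one workflow might look like the following. The text data, the column names, the `numFeatures` value, and the choice of `Tokenizer` and `HashingTF` stages are assumptions for illustration and do not come from the original notes.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() reuses the existing one
val spark = SparkSession.builder()
  .appName("PipelineSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical labeled documents
val trainingDocs = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Stages 1 and 2 are Transformers; stage 3 is an Estimator
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")
val classifier = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

// A Pipeline is itself an Estimator: fit() runs the stages in order
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, classifier))
val pipelineModel: PipelineModel = pipeline.fit(trainingDocs)

// The fitted PipelineModel is a Transformer that applies every stage to new data
val testDocs = spark.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n")
)).toDF("id", "text")

val predictions = pipelineModel.transform(testDocs)
predictions.select("id", "text", "probability", "prediction").show(truncate = false)
```

Because the fitted PipelineModel bundles all stages, the test data only needs the raw `text` column. Tokenizing and hashing are reapplied automatically.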

Source: https://blog.csdn.net/affluent6/article/details/120390881