Spark 源码解析:彻底理解TaskScheduler的任务提交和task最佳位置算法
作者:互联网
本文继续分析Stage被封装成TaskSet,并将TaskSet提交到集群的Executor执行的过程
在DAGScheduler的submitStage方法中,将Stage划分完成,生成拓扑结构,当一个stage没有父stage时候,会调用DAGScheduler的submitMissingTasks方法来提交该stage包含tasks。
首先来分析一下DAGScheduler的submitMissingTasks方法
1.获取Task的最佳计算位置:
核心是其中的getPreferredLocs方法,根据RDD的数据信息得到task的最佳计算位置,从而获取较好的数据本地性。其中的细节这里先跳过,在以后的文章在做分析
2.序列化Task的Binary,并进行广播。Executor端在执行task时会向反序列化Task。
3. 根据stage的不同类型创建,为stage的每个分区创建创建task,并封装成TaskSet。
Stage分两种类型ShuffleMapStage生成ShuffleMapTask,ResultStage生成ResultTask。
val tasks: Seq[Task[_]] = try {
val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
stage match {
// ShuffleMapStage生成ShuffleMapTask
case stage: ShuffleMapStage =>
stage.pendingPartitions.clear()
partitionsToCompute.map { id =>
val locs = taskIdToLocations(id)
val part = partitions(id)
stage.pendingPartitions += id
new ShuffleMapTask(stage.id, stage.latestInfo.attemptNumber,
taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
Option(sc.applicationId), sc.applicationAttemptId)
}
// ResultStage生成ResultTask
case stage: ResultStage =>
partitionsToCompute.map { id =>
val p: Int = stage.partitions(id)
val part = partitions(p)
val locs = taskIdToLocations(id)
new ResultTask(stage.id, stage.latestInfo.attemptNumber,
taskBinary, part, locs, id, properties, serializedTaskMetrics,
Option(jobId), Option(sc.applicationId), sc.applicationAttemptId)
}
}
}
4.调用TaskScheduler的submitTasks,提交TaskSet
logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
taskScheduler.submitTasks(new TaskSet(
tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
submitTasks方法的实现在TaskScheduler的实现类TaskSchedulerImpl中。
4.1 TaskSchedulerImpl的submitTasks方法首先创建TaskSetManager。
TaskSetManager负责管理TaskSchedulerImpl中一个单独TaskSet,跟踪每一个task,如果task失败,负责重试task直到达到task重试次数的最多次数。并且通过延迟调度来执行task的位置感知调度。
private[spark] class TaskSetManager(
sched: TaskSchedulerImpl, //绑定的TaskSchedulerImpl
val taskSet: TaskSet,
val maxTaskFailures: Int, //失败最大重试次数
clock: Clock = new SystemClock())
extends Schedulable with Logging
4.2 将TaskSetManger加入schedulableBuilder
schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
schedulableBuilder的类型是 SchedulerBuilder,SchedulerBuilder是一个trait,有两个实现FIFOSchedulerBuilder和 FairSchedulerBuilder,并且默认采用的是FIFO方式
而schedulableBuilder的创建是在SparkContext创建SchedulerBackend和TaskScheduler后调用TaskSchedulerImpl的初始化方法进行创建的。
def initialize(backend: SchedulerBackend) {
this.backend = backend
schedulableBuilder = {
schedulingMode match {
case SchedulingMode.FIFO =>
new FIFOSchedulableBuilder(rootPool)
case SchedulingMode.FAIR =>
new FairSchedulableBuilder(rootPool, conf)
case _ =>
throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
s"$schedulingMode")
}
}
schedulableBuilder.buildPools()
}
schedulableBuilder是TaskScheduler中一个重要成员,他根据调度策略决定了TaskSetManager的调度顺序。
4.3 接下来调用SchedulerBackend的riviveOffers方法对Task进行调度,决定task具体运行在哪个Executor中。
调用CoarseGrainedSchedulerBackend的riviveOffers方法,该方法给driverEndpoint发送ReviveOffer消息
override def reviveOffers() {
driverEndpoint.send(ReviveOffers)
}
driverEndpoint收到ReviveOffer消息后调用makeOffers方法:
// Make fake resource offers on all executors
private def makeOffers() {
//过滤出活跃状态的Executor
val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
//将Executor封装成 WorkerOffer 对象
val workOffers = activeExecutors.map { case (id, executorData) =>
new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
}.toSeq
launchTasks(scheduler.resourceOffers(workOffers))
}
注意:上面代码中的executorDataMap,在客户端向Master注册Application的时候,Master已经为Application分配并启动好Executor,然后注册给CoarseGrainedSchedulerBackend,注册信息就是存储在executorDataMap数据结构中。
准备好计算资源后,接下来TaskSchedulerImpl基于这些计算资源为task分配Executor。
我们看一下TaskSchedulerImpl的resourceOffers方法:
// 随机打乱offers
val shuffledOffers = Random.shuffle(offers)
// 构建一个二维数组,保存每个Executor上将要分配的那些task
val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))
val availableCpus = shuffledOffers.map(o => o.cores).toArray
// 根据SchedulerBuilder的调度算法,给TaskManager排好序
val sortedTaskSets = rootPool.getSortedTaskSetQueue
for (taskSet <- sortedTaskSets) {
logDebug("parentName: %s, name: %s, runningTasks: %s".format(
taskSet.parent.name, taskSet.name, taskSet.runningTasks))
if (newExecAvail) {
taskSet.executorAdded()
}
}
// 使用双重循环,对每一个taskset 依照调度的顺序,依次按照本地性级别顺序尝试启动task
// 数据本地性级别顺序: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
var launchedTask = false
for (taskSet <- sortedTaskSets; maxLocality <- taskSet.myLocalityLevels) {
do {
launchedTask = resourceOfferSingleTaskSet(
taskSet, maxLocality, shuffledOffers, availableCpus, tasks)
} while (launchedTask)
}
if (tasks.size > 0) {
hasLaunchedTask = true
}
return tasks
下面看看 resourceOfferSingleTaskSet方法:
用当前的数据本地性,调用TaskSetManager的resourceOffer方法,在当前executor上分配task
private def resourceOfferSingleTaskSet(
taskSet: TaskSetManager,
maxLocality: TaskLocality,
shuffledOffers: Seq[WorkerOffer],
availableCpus: Array[Int],
tasks: Seq[ArrayBuffer[TaskDescription]]) : Boolean = {
var launchedTask = false
for (i <- 0 until shuffledOffers.size) {
val execId = shuffledOffers(i).executorId
val host = shuffledOffers(i).host
//如果executor 的cup数大于 每个task的cup数目(值为1)
if (availableCpus(i) >= CPUS_PER_TASK) {
try {
//
for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
tasks(i) += task
val tid = task.taskId
taskIdToTaskSetManager(tid) = taskSet
taskIdToExecutorId(tid) = execId
executorIdToTaskCount(execId) += 1
executorsByHost(host) += execId
availableCpus(i) -= CPUS_PER_TASK
assert(availableCpus(i) >= 0)
launchedTask = true
}
}
为Task分配好资源之后,DriverEndpint调用launchTask方法将task在Executor上启动运行。task在Executor上的启动运行过程,在后面的文章中会继续分析,敬请关注。
总结一下调用过程:
TaskSchedulerImpl#submitTasks
CoarseGrainedSchedulerBackend#riviveOffers
CoarseGrainedSchedulerBackendDriverEndpoint#makeOffers |-TaskSchedulerImpl#resourceOffers(offers) 为offers分配task |- TaskSchedulerImpl#resourceOfferSingleTaskSet CoarseGrainedSchedulerBackendDriverEndpoint#launchTask
标签:task,val,TaskScheduler,TaskSchedulerImpl,源码,Executor,id,stage 来源: https://blog.csdn.net/weixin_33979363/article/details/87430120