Impact of the NodeManager Recovery Feature on Tez Tasks
Impact of NodeManager Exit on the ResourceManager
- When the nodemanager process is killed directly or stopped with /opt/hadoop/bin/yarn --daemon stop nodemanager, and recovery is not enabled, the node's state on the resourcemanager changes to SHUTDOWN.
NodeManager side
NodeManager.nodeManagerShutdownHook
->NodeStatusUpdaterImpl.serviceStop() (checks isNMUnderSupervisionWithRecoveryEnabled == false,
i.e. whether recovery is configured; if it is, the method returns immediately without unregistering)
->NodeStatusUpdaterImpl.unRegisterNM()
->ResourceTrackerPBServiceImpl.unRegisterNodeManager()
ResourceManager side
ResourceTrackerPBServiceImpl.unRegisterNodeManager()
->ResourceTrackerService.unRegisterNodeManager()
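A minimal Java sketch of the guard described above (an approximation, not the actual NodeStatusUpdaterImpl code; it assumes the YarnConfiguration.NM_RECOVERY_ENABLED and NM_RECOVERY_SUPERVISED constants, which map to yarn.nodemanager.recovery.enabled and yarn.nodemanager.recovery.supervised): when both are true, serviceStop() skips unRegisterNM(), so the ResourceManager never receives the explicit unregister and the node is not moved to SHUTDOWN.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RecoveryGuardSketch {
  // Mirrors the check made in NodeStatusUpdaterImpl.serviceStop()
  static boolean isNMUnderSupervisionWithRecoveryEnabled(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.NM_RECOVERY_ENABLED,
               YarnConfiguration.DEFAULT_NM_RECOVERY_ENABLED)
        && conf.getBoolean(YarnConfiguration.NM_RECOVERY_SUPERVISED,
               YarnConfiguration.DEFAULT_NM_RECOVERY_SUPERVISED);
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    if (isNMUnderSupervisionWithRecoveryEnabled(conf)) {
      System.out.println("recovery + supervision: skip unRegisterNM(), node stays registered");
    } else {
      System.out.println("no recovery: unRegisterNM() is sent, RM marks the node SHUTDOWN");
    }
  }
}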
- On docker stop, the nodemanager does not get to run its shutdown hook; only after yarn.nm.liveness-monitor.expiry-interval-ms elapses does the node's state on the resourcemanager change to LOST (i.e. the change waits for the full timeout).
ResourceManager side
NMLivelinessMonitor.expire()
->RMNodeImpl.handle()
->stateMachine.doTransition()
->StatusUpdateWhenHealthyTransition.transition()->RMNodeImpl.reportNodeUnusable()
->CapacityScheduler.handle(NODE_REMOVED)
->CapacityScheduler.removeNode()
->triggers container shutdown
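For reference, a small sketch of the timeout involved (assuming the YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS constant, which corresponds to yarn.nm.liveness-monitor.expiry-interval-ms, 10 minutes in the stock defaults): a node whose last heartbeat is older than this interval is expired by NMLivelinessMonitor and transitions to LOST.

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NmExpirySketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // yarn.nm.liveness-monitor.expiry-interval-ms: how long the RM waits for a NM heartbeat
    long expiryMs = conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
    long lastHeartbeatMs = System.currentTimeMillis() - 15 * 60 * 1000L; // e.g. 15 minutes ago
    boolean expired = System.currentTimeMillis() - lastHeartbeatMs > expiryMs;
    System.out.println("expiry interval = " + expiryMs + " ms, node expired = " + expired);
  }
}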
Impact of NodeManager Exit on Tez
- With recovery enabled, the intermediate data that container tasks have written to disk is not cleaned up after the nodemanager exits.
- After the nodemanager process exits, the container processes keep running; once a container receives no new tasks, the applicationMaster schedules it to exit, and any intermediate data it produced is written to the intermediate data files at that point.
- Reducers fetch those intermediate files through the HTTP service of the Tez ShuffleHandler hosted in the nodemanager process. If the nodemanager still has not come back up after the container finishes, the reducer's fetches fail (a minimal probe of this dependency is sketched below), and from there two places drive the next state change:
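Purely illustrative sketch of that dependency (the host and port below are placeholder assumptions; the real port is whatever the ShuffleHandler auxiliary service is configured to listen on): while the nodemanager process is down, the connection attempt fails and the reducer records a fetch failure.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ShuffleProbeSketch {
  // Returns true if the NM-hosted shuffle HTTP endpoint is reachable.
  static boolean shuffleReachable(String nmHost, int shufflePort) {
    try {
      HttpURLConnection conn = (HttpURLConnection)
          new URL("http", nmHost, shufflePort, "/").openConnection();
      conn.setConnectTimeout(3000);
      conn.connect();            // throws ConnectException while the nodemanager is down
      conn.disconnect();
      return true;
    } catch (IOException e) {
      return false;              // the reducer counts this as a failed fetch attempt
    }
  }

  public static void main(String[] args) {
    System.out.println(shuffleReachable("nm-host.example.com", 13562)); // placeholder values
  }
}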
After the NodeManager fails to come back up
- When the ResourceManager's NMLivelinessMonitor timeout threshold is reached, the applicationMaster is triggered to mark this NodeManager as Unhealthy. After that, all tasks on the unhealthy nodemanager are marked as failed and are rescheduled to run on new nodes.
Producer
ApplicationMaster side
AMRMClientAsyncImpl.HeartbeatThread.run()
->AMRMClientImpl.allocate()
->ApplicationMasterProtocolPBClientImpl.allocate()
->ApplicationMasterProtocolPBServiceImpl.allocate()
ResourceManager side
ApplicationMasterProtocolPBServiceImpl.allocate()
->ApplicationMasterService.allocate()
Consumer
ApplicationMaster side
AMRMClientAsyncImpl.CallbackHandlerThread.run()
->YarnTaskSchedulerService.onNodesUpdated()
->TaskSchedulerContextImpl.nodesUpdated()
->TaskSchedulerManager.nodesUpdated()
->EventHandler.handle(new AMNodeEventStateChanged())
->AMContainerImpl.handle()
->AMContainerImpl.stateMachine
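A minimal sketch of the consumer hook, using the public AMRMClientAsync.CallbackHandler interface rather than Tez's internal classes: updated NodeReports delivered with the RM heartbeat response arrive in onNodesUpdated(), which is where the scheduler learns that a node has gone UNHEALTHY or LOST and can start rescheduling its tasks.

import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class NodesUpdatedSketch implements AMRMClientAsync.CallbackHandler {
  @Override
  public void onNodesUpdated(List<NodeReport> updatedNodes) {
    // Node-state changes (e.g. UNHEALTHY, LOST) arrive here with the RM heartbeat response;
    // in Tez this role is played by YarnTaskSchedulerService.onNodesUpdated().
    for (NodeReport nr : updatedNodes) {
      System.out.println(nr.getNodeId() + " -> " + nr.getNodeState());
    }
  }
  @Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
  @Override public void onContainersAllocated(List<Container> containers) { }
  @Override public void onShutdownRequest() { }
  @Override public void onError(Throwable e) { }
  @Override public float getProgress() { return 0f; }
}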
- Before the ResourceManager's NMLivelinessMonitor timeout threshold is reached, the ApplicationMaster and the Map tasks are unaffected. On the Reducer side, the VertexImpl class performs fetch failure detection, bounding the number of fetch failures, the failure fraction, and the maximum time spent on read errors; once a threshold is crossed, that Reducer fails and the ApplicationMaster starts a new one. If the new Reducer fails again it keeps being rescheduled, until the tez.am.task.max.failed.attempts threshold is reached and the whole Hive job fails.
VertexImpl, around line 4705:
// Maximum attempts per task before the task is declared failed
// (which, per the paragraph above, fails the whole job)
this.maxFailedTaskAttempts = conf.getInt(TezConfiguration.TEZ_AM_TASK_MAX_FAILED_ATTEMPTS,
    TezConfiguration.TEZ_AM_TASK_MAX_FAILED_ATTEMPTS_DEFAULT);
this.taskRescheduleHigherPriority =
    conf.getBoolean(TezConfiguration.TEZ_AM_TASK_RESCHEDULE_HIGHER_PRIORITY,
        TezConfiguration.TEZ_AM_TASK_RESCHEDULE_HIGHER_PRIORITY_DEFAULT);
this.taskRescheduleRelaxedLocality =
    conf.getBoolean(TezConfiguration.TEZ_AM_TASK_RESCHEDULE_RELAXED_LOCALITY,
        TezConfiguration.TEZ_AM_TASK_RESCHEDULE_RELAXED_LOCALITY_DEFAULT);
// Limits on downstream fetch failures reported against a task's output:
// absolute count, fraction of consumers, and maximum time spent on read errors
this.maxAllowedOutputFailures = conf.getInt(
    TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES,
    TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES_DEFAULT);
this.maxAllowedOutputFailuresFraction = conf.getDouble(
    TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES_FRACTION,
    TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES_FRACTION_DEFAULT);
this.maxAllowedTimeForTaskReadErrorSec = conf.getInt(
    TezConfiguration.TEZ_AM_MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC,
    TezConfiguration.TEZ_AM_MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC_DEFAULT);
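A small companion sketch (not part of VertexImpl) that prints the effective values of these thresholds by reading them through the same TezConfiguration constants; anything not overridden in tez-site.xml falls back to the shipped defaults.

import org.apache.hadoop.conf.Configuration;
import org.apache.tez.dag.api.TezConfiguration;

public class FetchFailureThresholds {
  public static void main(String[] args) {
    Configuration conf = new TezConfiguration();
    System.out.println("max failed task attempts = " + conf.getInt(
        TezConfiguration.TEZ_AM_TASK_MAX_FAILED_ATTEMPTS,
        TezConfiguration.TEZ_AM_TASK_MAX_FAILED_ATTEMPTS_DEFAULT));
    System.out.println("max allowed output failures = " + conf.getInt(
        TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES,
        TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES_DEFAULT));
    System.out.println("max allowed output failure fraction = " + conf.getDouble(
        TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES_FRACTION,
        TezConfiguration.TEZ_TASK_MAX_ALLOWED_OUTPUT_FAILURES_FRACTION_DEFAULT));
    System.out.println("max allowed time for read errors (sec) = " + conf.getInt(
        TezConfiguration.TEZ_AM_MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC,
        TezConfiguration.TEZ_AM_MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC_DEFAULT));
  }
}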