Job Controller
Author: Internet
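The Job controller schedules pod objects to run one-off tasks. When the process in a container finishes normally, the pod is not restarted; it is placed in the "Completed" state instead. If the process terminates because of an error, the configured restart policy determines whether it is restarted. A pod that has not finished and is terminated unexpectedly because its node failed is rescheduled.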
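In practice, some tasks need to run more than once, and users can configure them to run serially or in parallel. Job controller objects that run a task multiple times fall into two types:
1) Serial Job with a single work queue: the task is executed several times, one run after another, until the desired number of completions is reached. This type of Job can also be seen as a Job with a parallelism of 1: at any moment only one pod object exists.
2) Parallel Job with multiple work queues: the number of work queues can be set equal to the number of tasks, so that each queue runs only one task; alternatively, a limited number of work queues can run a larger number of tasks (fewer queues than total tasks), which amounts to running several serial queues side by side.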
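The relationship between the two modes above can be sketched with a little shell arithmetic. This is only a rough estimate, assuming every pod succeeds on its first attempt and all pods take about the same time; the helper name job_rounds is hypothetical and not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: minimum number of pod "rounds" a Job needs,
# assuming no pod fails. rounds = ceil(completions / parallelism)
job_rounds() {
    completions=$1
    parallelism=$2
    echo $(( (completions + parallelism - 1) / parallelism ))
}

job_rounds 5 1   # serial Job: 5 completions, one pod at a time
job_rounds 3 2   # parallel Job: 3 completions, at most 2 pods at a time
```

With a parallelism of 1 the number of rounds equals the number of completions, which is the serial case; a higher parallelism reduces the rounds and, under the same assumptions, the total running time.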
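The Job controller is commonly used to manage tasks that run for a while and then "complete", such as computations or backup operations.
1. Job resource manifest fields
List the fields needed to define a Job resource: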
[root@k8s-master1 ~]# kubectl explain job
KIND:     Job
VERSION:  batch/v1

DESCRIPTION:
     Job represents the configuration of a single job.

FIELDS:
   apiVersion   <string>   # API version used by this resource
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind <string>   # resource type
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata     <Object>   # metadata, such as the Job's name
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec <Object>   # desired behavior, including the pod template that defines the containers
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status       <Object>   # status information; read-only
     Current status of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
Check how the Job's spec field is defined:
[root@k8s-master1 ~]# kubectl explain job.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     JobSpec describes how the job execution will look like.

FIELDS:
   activeDeadlineSeconds        <integer>
     # Maximum duration of the Job. No matter how many pods have been created,
     # once the Job has been active for this long, all running pods are
     # terminated and the Job status becomes type: Failed with
     # reason: DeadlineExceeded.
     Specifies the duration in seconds relative to the startTime that the job
     may be active before the system tries to terminate it; value must be
     positive integer

   backoffLimit <integer>
     # Number of retries before the Job is marked as failed. The default is 6;
     # 0 means no pod failure is tolerated. With restartPolicy Never, a new pod
     # is created after a failure; with OnFailure, the pod is restarted. In
     # both cases every single failure is counted, rather than counting only
     # once the whole pod has failed. When the number of failures reaches this
     # limit, the Job ends and all of its running pods are deleted.
     Specifies the number of retries before marking this job failed. Defaults
     to 6

   completions  <integer>
     # Number of pods that must finish successfully for the Job to complete;
     # defaults to 1 when neither completions nor parallelism is set.
     Specifies the desired number of successfully finished pods the job should
     be run with. Setting to nil means that the success of any pod signals the
     success of all pods, and allows parallelism to have any positive value.
     Setting to 1 means that parallelism is limited to 1 and the success of
     that pod signals the success of the job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   manualSelector       <boolean>
     manualSelector controls generation of pod labels and pod selectors. Leave
     `manualSelector` unset unless you are certain what you are doing. When
     false or unset, the system pick labels unique to this job and appends
     those labels to the pod template. When true, the user is responsible for
     picking unique labels and specifying the selector. Failure to pick a
     unique label may cause this and other jobs to not function correctly.
     However, You may see `manualSelector=true` in jobs that were created with
     the old `extensions/v1beta1` API. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector

   parallelism  <integer>
     # Number of pods that run in parallel; defaults to 1.
     Specifies the maximum desired number of pods the job should run at any
     given time. The actual number of pods running in steady state will be
     less than this number when ((.spec.completions - .status.successful) <
     .spec.parallelism), i.e. when the work left to do is less than max
     parallelism. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   selector     <Object>
     A label query over pods that should match the pod count. Normally, the
     system sets this field for you. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template     <Object> -required-
     Describes the pod that will be created when executing a job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   ttlSecondsAfterFinished      <integer>
     ttlSecondsAfterFinished limits the lifetime of a Job that has finished
     execution (either Complete or Failed). If this field is set,
     ttlSecondsAfterFinished after the Job finishes, it is eligible to be
     automatically deleted. When the Job is being deleted, its lifecycle
     guarantees (e.g. finalizers) will be honored. If this field is unset, the
     Job won't be automatically deleted. If this field is set to zero, the Job
     becomes eligible to be deleted immediately after it finishes. This field
     is alpha-level and is only honored by servers that enable the
     TTLAfterFinished feature.
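2. Creating a Job object
The only required field nested in a Job's spec is template, and it is used the same way as in Deployment and other controllers. The Job automatically adds the labels "job-name=JOB_NAME" and "controller-uid=UID" to its pod objects and uses a label selector on the controller-uid label to associate them with the Job. The following manifest defines a Job controller.
Use a Job resource object to create a task: define a Job that runs a countdown and write its YAML file. Note that a Job's restartPolicy supports only Never and OnFailure, not Always. A Job runs a batch task that ends once it has finished; with Always, the container would be restarted after every exit and the Job would loop forever.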
[root@k8s-master1 ~]# mkdir job
[root@k8s-master1 ~]# cd job/
[root@k8s-master1 job]# ll
total 0
[root@k8s-master1 job]# vim job-demo.yaml
[root@k8s-master1 job]# cat job-demo.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-demo
spec:
  template:
    metadata:
      name: job-demo
    spec:
      restartPolicy: Never
      containers:
      - name: counter
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
After creating the Job with "kubectl create" or "kubectl apply", check the status of the task.
[root@k8s-master1 job]# kubectl apply -f job-demo.yaml
job.batch/job-demo created
[root@k8s-master1 job]# kubectl get jobs
NAME       COMPLETIONS   DURATION   AGE
job-demo   1/1           2s         11s
[root@k8s-master1 job]# kubectl get pods
NAME             READY   STATUS      RESTARTS   AGE
job-demo-44xgd   0/1     Completed   0          39s
The detailed Job information shows the label selector in use and the matching pod labels:
[root@k8s-master1 job]# kubectl describe job job-demo
Name:           job-demo
Namespace:      default
Selector:       controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
Labels:         controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
                job-name=job-demo
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Sun, 11 Sep 2022 12:48:42 +0800
Completed At:   Sun, 11 Sep 2022 12:48:44 +0800
Duration:       2s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
           job-name=job-demo
  Containers:
   counter:
    Image:      busybox:1.28
    Port:       <none>
    Host Port:  <none>
    Command:
      bin/sh
      -c
      for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  3m59s  job-controller  Created pod: job-demo-44xgd
  Normal  Completed         3m57s  job-controller  Job completed
View the pod's details:
[root@k8s-master1 job]# kubectl describe pods job-demo-44xgd
Name:         job-demo-44xgd
Namespace:    default
Priority:     0
Node:         k8s-node1/10.0.0.132
Start Time:   Sun, 11 Sep 2022 12:48:42 +0800
Labels:       controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
              job-name=job-demo
Annotations:  cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
Status:       Succeeded
IP:           10.244.36.80
IPs:
  IP:           10.244.36.80
Controlled By:  Job/job-demo
Containers:
  counter:
    Container ID:  docker://599cd7a64ebe87b06e99bae3a123ffcee1247f7739bfaeb4ed191bec3943711d
    Image:         busybox:1.28
    Image ID:      docker://sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a
    Port:          <none>
    Host Port:     <none>
    Command:
      bin/sh
      -c
      for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 11 Sep 2022 12:48:44 +0800
      Finished:     Sun, 11 Sep 2022 12:48:44 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5n29f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-5n29f:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5n29f
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  63s   default-scheduler  Successfully assigned default/job-demo-44xgd to k8s-node1
  Normal  Pulled     62s   kubelet            Container image "busybox:1.28" already present on machine
  Normal  Created    62s   kubelet            Created container counter
  Normal  Started    61s   kubelet            Started container counter
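To preview what the countdown container prints, the same command can be run in a local shell. This is a small sketch that assumes a POSIX sh is available locally; it does not touch the cluster, and the variable name output is only illustrative.

```shell
# Run the container's command locally and capture what it prints.
output=$(sh -c 'for i in 9 8 7 6 5 4 3 2 1; do echo $i; done')
echo "$output"

# On the cluster, the same lines should be retrievable from the Job's pod with:
#   kubectl logs job/job-demo
```

The local run prints the numbers 9 down to 1, one per line, which is what the pod's log is expected to contain once the Job has completed.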
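3. Parallel Jobs
Setting the parallelism attribute job.spec.parallelism to 1 and setting the total number of tasks through job.spec.completions makes the Job controller run multiple tasks serially. The following Job controller runs a task 5 times in sequence (parallelism is omitted from the manifest and defaults to 1):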
[root@k8s-master1 job]# vim job-multi.yaml
[root@k8s-master1 job]# cat job-multi.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  completions: 5
  template:
    metadata:
      name: job-multi
    spec:
      restartPolicy: OnFailure
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "sleep 120"
After creating the Job with "kubectl create" or "kubectl apply", check the status of the task:
[root@k8s-master1 job]# kubectl create -f job-multi.yaml
job.batch/job-multi created
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   0/5           34s        34s
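Watch the pod changes in another terminal to follow the execution: each pod sleeps for 120 seconds and completes, then the next pod is created, one after another, until 5 pod objects have been created.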
[root@k8s-master1 job]# kubectl get pods -w
NAME              READY   STATUS              RESTARTS   AGE
job-multi-nzlkp   0/1     Pending             0          0s
job-multi-nzlkp   0/1     Pending             0          0s
job-multi-nzlkp   0/1     ContainerCreating   0          0s
job-multi-nzlkp   0/1     ContainerCreating   0          1s
job-multi-nzlkp   1/1     Running             0          7s
job-multi-nzlkp   0/1     Completed           0          2m7s
job-multi-ss8rq   0/1     Pending             0          0s
job-multi-ss8rq   0/1     Pending             0          0s
job-multi-ss8rq   0/1     ContainerCreating   0          0s
job-multi-nzlkp   0/1     Completed           0          2m7s
job-multi-ss8rq   0/1     ContainerCreating   0          1s
job-multi-ss8rq   1/1     Running             0          2s
job-multi-ss8rq   0/1     Completed           0          2m2s
job-multi-5hhgx   0/1     Pending             0          0s
job-multi-5hhgx   0/1     Pending             0          0s
job-multi-5hhgx   0/1     ContainerCreating   0          0s
job-multi-ss8rq   0/1     Completed           0          2m2s
job-multi-5hhgx   0/1     ContainerCreating   0          1s
job-multi-5hhgx   1/1     Running             0          2s
job-multi-5hhgx   0/1     Completed           0          2m3s
job-multi-c4gc8   0/1     Pending             0          0s
job-multi-c4gc8   0/1     Pending             0          0s
job-multi-c4gc8   0/1     ContainerCreating   0          0s
job-multi-5hhgx   0/1     Completed           0          2m3s
job-multi-c4gc8   0/1     ContainerCreating   0          1s
job-multi-c4gc8   1/1     Running             0          2s
job-multi-c4gc8   0/1     Completed           0          2m2s
job-multi-dkm54   0/1     Pending             0          0s
job-multi-dkm54   0/1     Pending             0          0s
job-multi-dkm54   0/1     ContainerCreating   0          0s
job-multi-c4gc8   0/1     Completed           0          2m2s
job-multi-dkm54   0/1     ContainerCreating   0          1s
job-multi-dkm54   1/1     Running             0          2s
job-multi-dkm54   0/1     Completed           0          2m2s
job-multi-dkm54   0/1     Completed           0          2m2s
In the end, all 5 pod tasks complete.
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   5/5           10m        14m
[root@k8s-master1 job]# kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
job-multi-5hhgx   0/1     Completed   0          10m     10.244.36.77   k8s-node1   <none>           <none>
job-multi-c4gc8   0/1     Completed   0          8m14s   10.244.36.82   k8s-node1   <none>           <none>
job-multi-dkm54   0/1     Completed   0          6m12s   10.244.36.74   k8s-node1   <none>           <none>
job-multi-nzlkp   0/1     Completed   0          14m     10.244.36.81   k8s-node1   <none>           <none>
job-multi-ss8rq   0/1     Completed   0          12m     10.244.36.78   k8s-node1   <none>           <none>
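The job.spec.parallelism attribute defines how many pods the Job runs in parallel; setting it to 2 or more yields parallel, multi-queue execution. If job.spec.completions is left unset, the parallelism effectively determines how many pods are run; if job.spec.completions is set to a value greater than job.spec.parallelism, the Job runs as several serial queues side by side. For example, a controller whose spec field contains the following attributes runs the task 3 times in total using 2 parallel queues.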
[root@k8s-master1 job]# vim job-multi.yaml
[root@k8s-master1 job]# cat job-multi.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  parallelism: 2
  completions: 3
  template:
    metadata:
      name: job-multi
    spec:
      restartPolicy: OnFailure
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "sleep 20"
Create the Job controller resource and watch the pod changes:
[root@k8s-master1 job]# kubectl apply -f job-multi.yaml
job.batch/job-multi created
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   0/3           8s         8s
[root@k8s-master1 job]# kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
job-multi-84ldk   1/1     Running   0          14s
job-multi-8kh2k   1/1     Running   0          14s
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   3/3           46s        65s
[root@k8s-master1 job]# kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
job-multi-84ldk   0/1     Completed   0          67s   10.244.36.85   k8s-node1   <none>           <none>
job-multi-8kh2k   0/1     Completed   0          67s   10.244.36.75   k8s-node1   <none>           <none>
job-multi-gz9cs   0/1     Completed   0          44s   10.244.36.83   k8s-node1   <none>           <none>
Two pods are created in parallel first; once a running pod completes its task, the third pod is created to run the third task.
[root@k8s-master1 job]# kubectl get pods -w
NAME              READY   STATUS              RESTARTS   AGE
job-multi-84ldk   0/1     Pending             0          0s
job-multi-8kh2k   0/1     Pending             0          0s
job-multi-8kh2k   0/1     Pending             0          0s
job-multi-84ldk   0/1     Pending             0          0s
job-multi-8kh2k   0/1     ContainerCreating   0          0s
job-multi-84ldk   0/1     ContainerCreating   0          0s
job-multi-8kh2k   0/1     ContainerCreating   0          2s
job-multi-84ldk   0/1     ContainerCreating   0          2s
job-multi-84ldk   1/1     Running             0          3s
job-multi-8kh2k   1/1     Running             0          3s
job-multi-84ldk   0/1     Completed           0          23s
job-multi-gz9cs   0/1     Pending             0          0s
job-multi-gz9cs   0/1     Pending             0          0s
job-multi-8kh2k   0/1     Completed           0          23s
job-multi-gz9cs   0/1     ContainerCreating   0          0s
job-multi-84ldk   0/1     Completed           0          23s
job-multi-8kh2k   0/1     Completed           0          23s
job-multi-gz9cs   0/1     ContainerCreating   0          1s
job-multi-gz9cs   1/1     Running             0          2s
job-multi-gz9cs   0/1     Completed           0          23s
job-multi-gz9cs   0/1     Completed           0          23s
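The parallelism defined by job.spec.parallelism is the number of pod objects that run at the same time. This attribute can be adjusted at run time, which changes the number of work queues. Raising a Job's parallelism moderately, according to the resources available on the worker nodes, can greatly improve its throughput and shorten its running time.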
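One way to adjust the parallelism of a running Job is a JSON merge patch on .spec.parallelism. The sketch below only builds the patch document; the helper name parallelism_patch is hypothetical, and the kubectl command in the comment assumes a Job named job-multi that is still running on a reachable cluster.

```shell
# Hypothetical helper: build a JSON merge patch that sets .spec.parallelism.
parallelism_patch() {
    printf '{"spec":{"parallelism":%d}}\n' "$1"
}

# Print the patch that would raise the parallelism to 3.
parallelism_patch 3

# Against a live cluster, the patch could be applied with:
#   kubectl patch job job-multi --type merge -p "$(parallelism_patch 3)"
```

Keeping the patch in a small function makes it easy to inspect the exact document before it is sent to the API server.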
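4. Deleting a Job controller
Once its pods have finished running, a Job controller no longer consumes system resources. Users can keep it as needed or remove it with the resource deletion command.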
[root@k8s-master1 job]# kubectl get jobs -o wide
NAME        COMPLETIONS   DURATION   AGE     CONTAINERS   IMAGES   SELECTOR
job-multi   3/3           3m27s      4m21s   myjob        alpine   controller-uid=b1975181-8673-4bf3-ba75-54c85f5fa627
[root@k8s-master1 job]# kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
job-multi-4cvrl   0/1     Completed   0          3m8s    10.244.36.88   k8s-node1   <none>           <none>
job-multi-68pzp   0/1     Completed   0          4m34s   10.244.36.86   k8s-node1   <none>           <none>
job-multi-h97ds   0/1     Completed   0          4m34s   10.244.36.84   k8s-node1   <none>           <none>
[root@k8s-master1 job]# kubectl delete jobs job-multi
job.batch "job-multi" deleted
[root@k8s-master1 job]# kubectl get jobs -o wide
No resources found in default namespace.
[root@k8s-master1 job]# kubectl get pods -o wide
No resources found in default namespace.
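Besides manual deletion, the ttlSecondsAfterFinished field listed in the job.spec output can ask the cluster to clean up a finished Job automatically. The sketch below only writes a manifest to a local file. The Job name job-ttl-demo, the file name, and the 100-second value are illustrative assumptions, and the field is honored only on clusters where the TTL-after-finished feature is enabled (the explain output above describes it as alpha-level for that version).

```shell
# Write a manifest for a Job that becomes eligible for automatic deletion
# 100 seconds after it finishes. Names and values here are illustrative.
cat > job-ttl-demo.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: job-ttl-demo
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: counter
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c", "echo done"]
EOF

# Then, on a cluster that supports the field:
#   kubectl apply -f job-ttl-demo.yaml
```

If the field is honored, the Job and its pods should disappear from "kubectl get jobs" and "kubectl get pods" some time after the TTL expires, without a manual "kubectl delete".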
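If a Job's container application never manages to finish normally and its restartPolicy is set to restart it, the Job may be stuck in an endless loop of restarts and errors. Fortunately, the Job controller provides two attributes to prevent this:
job.spec.activeDeadlineSeconds: the Job's deadline, which specifies the maximum length of time it may be active; a Job that exceeds it is terminated.
job.spec.backoffLimit: the number of retries before the Job is marked as failed; the default is 6.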
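The two attributes above can be combined in one manifest. The following is a minimal sketch that writes such a manifest to a local file; the Job name job-limits-demo, the file name, the deliberately failing command, and the values 3 and 60 are all illustrative assumptions rather than settings from the original example.

```shell
# Write a manifest for a Job whose container always fails, bounded by
# a retry limit and a deadline. Names and values here are illustrative.
cat > job-limits-demo.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: job-limits-demo
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 60
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: always-fails
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c", "exit 1"]
EOF

# On a cluster, apply it and inspect why the Job was marked as failed:
#   kubectl apply -f job-limits-demo.yaml
#   kubectl get job job-limits-demo -o jsonpath='{.status.conditions[*].reason}'
```

Whichever limit is reached first ends the Job, so the reported reason would be expected to be either BackoffLimitExceeded or DeadlineExceeded.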
Source: https://www.cnblogs.com/jiawei2527/p/16677957.html