
Job Controller

Author: Internet

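  A Job controller schedules Pods to run one-off tasks. When the process in a container exits normally, the container is not restarted; the Pod is instead placed in the "Completed" state. If the process terminates with an error, the configured restartPolicy decides whether it is restarted. A Pod that has not yet finished and is terminated unexpectedly because its node failed is rescheduled.

  In practice, some tasks need to run more than once, and they can be configured to run serially or in parallel. Such Jobs fall into two types:

  1) Serial Job with a single work queue: several one-off tasks are executed one after another until the desired number of completions is reached. This can be seen as a Job with a parallelism of 1, so only one Pod exists at any given moment.

  2) Parallel Job with multiple work queues: the number of work queues can equal the number of tasks, so that each queue runs exactly one task; alternatively, a limited number of queues can run a larger number of tasks (fewer queues than tasks), which amounts to running several serial queues side by side.

  Job controllers are typically used for tasks that run for a while and then finish, such as computations or backups.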

1. Job manifest fields

  View the fields available for defining a Job:

[root@k8s-master1 ~]# kubectl explain job
KIND:     Job
VERSION:  batch/v1

DESCRIPTION:
     Job represents the configuration of a single job.

FIELDS:
   apiVersion   <string> #API version used by this resource
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind <string>  #resource type
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata     <Object> #metadata, such as the Job's name
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec <Object> #desired behavior of the Job, including the Pod template that defines the containers
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status       <Object> #status information, read-only
     Current status of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

  View how the Job's spec field is defined:

[root@k8s-master1 ~]# kubectl explain job.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     JobSpec describes how the job execution will look like.

FIELDS:
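   activeDeadlineSeconds        <integer>   #Maximum duration of the Job. Regardless of how many Pods have been created, once the Job reaches this time all running Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded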
     Specifies the duration in seconds relative to the startTime that the job
     may be active before the system tries to terminate it; value must be
     positive integer

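   backoffLimit <integer>  #Number of retries before the Job is marked as failed. The default is 6; 0 means no retries are allowed.
   If the Pod's restartPolicy is Never, a new Pod is created after a failure;
   if it is OnFailure, the failed container is restarted in place. In either case every single failure counts toward the limit.
   Once the number of failures reaches the limit, the Job ends and all of its running Pods are deleted.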
     Specifies the number of retries before marking this job failed. Defaults to
     6

   completions  <integer>  #Number of successfully completed Pods required for the Job to finish; defaults to 1
     Specifies the desired number of successfully finished pods the job should
     be run with. Setting to nil means that the success of any pod signals the
     success of all pods, and allows parallelism to have any positive value.
     Setting to 1 means that parallelism is limited to 1 and the success of that
     pod signals the success of the job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   manualSelector       <boolean>
     manualSelector controls generation of pod labels and pod selectors. Leave
     `manualSelector` unset unless you are certain what you are doing. When
     false or unset, the system pick labels unique to this job and appends those
     labels to the pod template. When true, the user is responsible for picking
     unique labels and specifying the selector. Failure to pick a unique label
     may cause this and other jobs to not function correctly. However, You may
     see `manualSelector=true` in jobs that were created with the old
     `extensions/v1beta1` API. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector

   parallelism  <integer>  #Number of Pods that run in parallel; defaults to 1
     Specifies the maximum desired number of pods the job should run at any
     given time. The actual number of pods running in steady state will be less
     than this number when ((.spec.completions - .status.successful) <
     .spec.parallelism), i.e. when the work left to do is less than max
     parallelism. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   selector     <Object>
     A label query over pods that should match the pod count. Normally, the
     system sets this field for you. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template     <Object> -required-
     Describes the pod that will be created when executing a job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   ttlSecondsAfterFinished      <integer>
     ttlSecondsAfterFinished limits the lifetime of a Job that has finished
     execution (either Complete or Failed). If this field is set,
     ttlSecondsAfterFinished after the Job finishes, it is eligible to be
     automatically deleted. When the Job is being deleted, its lifecycle
     guarantees (e.g. finalizers) will be honored. If this field is unset, the
     Job won't be automatically deleted. If this field is set to zero, the Job
     becomes eligible to be deleted immediately after it finishes. This field is
     alpha-level and is only honored by servers that enable the TTLAfterFinished
     feature.

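2. Creating a Job

  The only required field nested in a Job's spec is template, and it is used in the same way as in controllers such as Deployment. The Job automatically adds the labels "job-name=JOB_NAME" and "controller-uid=UID" to its Pods and uses a label selector on controller-uid to associate itself with them. The manifest below defines a Job.

  Here a Job is defined in a YAML file to run a countdown task. Note that a Job's restartPolicy supports only Never and OnFailure, not Always: a Job runs a batch task that ends once it has finished, and Always would restart the container after every successful exit, so the Job could never complete.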

[root@k8s-master1 ~]# mkdir job
[root@k8s-master1 ~]# cd job/
[root@k8s-master1 job]# ll
total 0
[root@k8s-master1 job]# vim job-demo.yaml
[root@k8s-master1 job]# cat job-demo.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-demo
spec:
  template:
    metadata:
      name: job-demo
    spec:
      restartPolicy: Never
      containers:
      - name: counter
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"

  After creating the Job with "kubectl create" or "kubectl apply", check its status.

[root@k8s-master1 job]# kubectl apply -f job-demo.yaml
job.batch/job-demo created
[root@k8s-master1 job]# kubectl get jobs
NAME       COMPLETIONS   DURATION   AGE
job-demo   1/1           2s         11s
[root@k8s-master1 job]# kubectl get pods
NAME             READY   STATUS      RESTARTS   AGE
job-demo-44xgd   0/1     Completed   0          39s

  The Job's details show the label selector it uses and the labels of the matching Pods:

[root@k8s-master1 job]# kubectl describe job job-demo
Name:           job-demo
Namespace:      default
Selector:       controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
Labels:         controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
                job-name=job-demo
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Sun, 11 Sep 2022 12:48:42 +0800
Completed At:   Sun, 11 Sep 2022 12:48:44 +0800
Duration:       2s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
           job-name=job-demo
  Containers:
   counter:
    Image:      busybox:1.28
    Port:       <none>
    Host Port:  <none>
    Command:
      bin/sh
      -c
      for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  3m59s  job-controller  Created pod: job-demo-44xgd
  Normal  Completed         3m57s  job-controller  Job completed

  View the Pod's details:

[root@k8s-master1 job]# kubectl describe pods job-demo-44xgd
Name:         job-demo-44xgd
Namespace:    default
Priority:     0
Node:         k8s-node1/10.0.0.132
Start Time:   Sun, 11 Sep 2022 12:48:42 +0800
Labels:       controller-uid=e77679ec-6cbd-450e-ab46-c5e4cd89975a
              job-name=job-demo
Annotations:  cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
Status:       Succeeded
IP:           10.244.36.80
IPs:
  IP:           10.244.36.80
Controlled By:  Job/job-demo
Containers:
  counter:
    Container ID:  docker://599cd7a64ebe87b06e99bae3a123ffcee1247f7739bfaeb4ed191bec3943711d
    Image:         busybox:1.28
    Image ID:      docker://sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a
    Port:          <none>
    Host Port:     <none>
    Command:
      bin/sh
      -c
      for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 11 Sep 2022 12:48:44 +0800
      Finished:     Sun, 11 Sep 2022 12:48:44 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5n29f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-5n29f:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5n29f
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  63s   default-scheduler  Successfully assigned default/job-demo-44xgd to k8s-node1
  Normal  Pulled     62s   kubelet            Container image "busybox:1.28" already present on machine
  Normal  Created    62s   kubelet            Created container counter
  Normal  Started    61s   kubelet            Started container counter

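3. Parallel Jobs

  Setting job.spec.parallelism to 1 (the default) and specifying the total number of tasks with job.spec.completions makes the Job run multiple tasks serially. The following Job runs a task 5 times in series: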

[root@k8s-master1 job]# vim job-multi.yaml
[root@k8s-master1 job]# cat job-multi.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  completions: 5
  template:
    metadata:
      name: job-multi
    spec:
      restartPolicy: OnFailure
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "sleep 120"

  After creating the Job with "kubectl create" or "kubectl apply", check its status.

[root@k8s-master1 job]# kubectl create -f job-multi.yaml
job.batch/job-multi created
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   0/5           34s        34s

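  Watching the Pods from another terminal shows how the Job proceeds: once a Pod finishes its 120-second sleep, the next Pod is created, one after another, until 5 Pods have been created.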

[root@k8s-master1 job]# kubectl get pods -w
NAME              READY   STATUS    RESTARTS   AGE
job-multi-nzlkp   0/1     Pending   0          0s
job-multi-nzlkp   0/1     Pending   0          0s
job-multi-nzlkp   0/1     ContainerCreating   0          0s
job-multi-nzlkp   0/1     ContainerCreating   0          1s
job-multi-nzlkp   1/1     Running             0          7s
job-multi-nzlkp   0/1     Completed           0          2m7s
job-multi-ss8rq   0/1     Pending             0          0s
job-multi-ss8rq   0/1     Pending             0          0s
job-multi-ss8rq   0/1     ContainerCreating   0          0s
job-multi-nzlkp   0/1     Completed           0          2m7s
job-multi-ss8rq   0/1     ContainerCreating   0          1s
job-multi-ss8rq   1/1     Running             0          2s
job-multi-ss8rq   0/1     Completed           0          2m2s
job-multi-5hhgx   0/1     Pending             0          0s
job-multi-5hhgx   0/1     Pending             0          0s
job-multi-5hhgx   0/1     ContainerCreating   0          0s
job-multi-ss8rq   0/1     Completed           0          2m2s
job-multi-5hhgx   0/1     ContainerCreating   0          1s
job-multi-5hhgx   1/1     Running             0          2s
job-multi-5hhgx   0/1     Completed           0          2m3s
job-multi-c4gc8   0/1     Pending             0          0s
job-multi-c4gc8   0/1     Pending             0          0s
job-multi-c4gc8   0/1     ContainerCreating   0          0s
job-multi-5hhgx   0/1     Completed           0          2m3s
job-multi-c4gc8   0/1     ContainerCreating   0          1s
job-multi-c4gc8   1/1     Running             0          2s
job-multi-c4gc8   0/1     Completed           0          2m2s
job-multi-dkm54   0/1     Pending             0          0s
job-multi-dkm54   0/1     Pending             0          0s
job-multi-dkm54   0/1     ContainerCreating   0          0s
job-multi-c4gc8   0/1     Completed           0          2m2s
job-multi-dkm54   0/1     ContainerCreating   0          1s
job-multi-dkm54   1/1     Running             0          2s
job-multi-dkm54   0/1     Completed           0          2m2s
job-multi-dkm54   0/1     Completed           0          2m2s

  In the end, all 5 Pods have completed their tasks.

[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   5/5           10m        14m
[root@k8s-master1 job]# kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
job-multi-5hhgx   0/1     Completed   0          10m     10.244.36.77   k8s-node1   <none>           <none>
job-multi-c4gc8   0/1     Completed   0          8m14s   10.244.36.82   k8s-node1   <none>           <none>
job-multi-dkm54   0/1     Completed   0          6m12s   10.244.36.74   k8s-node1   <none>           <none>
job-multi-nzlkp   0/1     Completed   0          14m     10.244.36.81   k8s-node1   <none>           <none>
job-multi-ss8rq   0/1     Completed   0          12m     10.244.36.78   k8s-node1   <none>           <none>

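  job.spec.parallelism defines how many Pods of the Job may run at the same time; setting it to 2 or more runs multiple work queues in parallel. If job.spec.completions is left unset in that case, the success of any one Pod signals the success of the whole Job, as the field description in section 1 notes. If job.spec.completions is greater than job.spec.parallelism, several queues each work through tasks serially. For example, the following spec runs 3 tasks in total using 2 parallel queues.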

[root@k8s-master1 job]# vim job-multi.yaml
[root@k8s-master1 job]# cat job-multi.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  parallelism: 2
  completions: 3
  template:
    metadata:
      name: job-multi
    spec:
      restartPolicy: OnFailure
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "sleep 20"

  Create the Job and watch the Pods change:

[root@k8s-master1 job]# kubectl apply -f job-multi.yaml
job.batch/job-multi created
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   0/3           8s         8s
[root@k8s-master1 job]# kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
job-multi-84ldk   1/1     Running   0          14s
job-multi-8kh2k   1/1     Running   0          14s
[root@k8s-master1 job]# kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
job-multi   3/3           46s        65s
[root@k8s-master1 job]# kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
job-multi-84ldk   0/1     Completed   0          67s   10.244.36.85   k8s-node1   <none>           <none>
job-multi-8kh2k   0/1     Completed   0          67s   10.244.36.75   k8s-node1   <none>           <none>
job-multi-gz9cs   0/1     Completed   0          44s   10.244.36.83   k8s-node1   <none>           <none>

  Two Pods are created in parallel first; as soon as one of them completes, a third Pod is created to run the third task.

[root@k8s-master1 job]# kubectl get pods -w
NAME              READY   STATUS    RESTARTS   AGE
job-multi-84ldk   0/1     Pending   0          0s
job-multi-8kh2k   0/1     Pending   0          0s
job-multi-8kh2k   0/1     Pending   0          0s
job-multi-84ldk   0/1     Pending   0          0s
job-multi-8kh2k   0/1     ContainerCreating   0          0s
job-multi-84ldk   0/1     ContainerCreating   0          0s
job-multi-8kh2k   0/1     ContainerCreating   0          2s
job-multi-84ldk   0/1     ContainerCreating   0          2s
job-multi-84ldk   1/1     Running             0          3s
job-multi-8kh2k   1/1     Running             0          3s
job-multi-84ldk   0/1     Completed           0          23s
job-multi-gz9cs   0/1     Pending             0          0s
job-multi-gz9cs   0/1     Pending             0          0s
job-multi-8kh2k   0/1     Completed           0          23s
job-multi-gz9cs   0/1     ContainerCreating   0          0s
job-multi-84ldk   0/1     Completed           0          23s
job-multi-8kh2k   0/1     Completed           0          23s
job-multi-gz9cs   0/1     ContainerCreating   0          1s
job-multi-gz9cs   1/1     Running             0          2s
job-multi-gz9cs   0/1     Completed           0          23s
job-multi-gz9cs   0/1     Completed           0          23s

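  job.spec.parallelism is the number of Pods that run at the same time. It can be changed while the Job is running, which changes the number of queues. Raising the parallelism moderately, within the resources available on the worker nodes, can shorten the Job's total run time considerably.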
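
  One way to adjust the parallelism of a running Job is a small patch file. The sketch below is not from the original article: the file name parallelism-patch.yaml and the value 4 are illustrative, and it assumes a kubectl version that supports the --patch-file option (otherwise the same content can be passed inline with -p).

```yaml
# parallelism-patch.yaml (hypothetical file name)
# Only the field to change is listed; everything else in the Job stays as it is.
spec:
  parallelism: 4   # illustrative value: allow up to 4 Pods at the same time
```

  It could then be applied to a Job that is still running, for example with "kubectl patch job job-multi --patch-file parallelism-patch.yaml". A value larger than the remaining number of completions brings no further speed-up.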

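4. Deleting a Job

  Once its Pods have finished, a Job no longer consumes system resources. It can be kept for reference or removed with the delete command, which also removes its Pods.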

[root@k8s-master1 job]# kubectl get jobs -o wide
NAME        COMPLETIONS   DURATION   AGE     CONTAINERS   IMAGES   SELECTOR
job-multi   3/3           3m27s      4m21s   myjob        alpine   controller-uid=b1975181-8673-4bf3-ba75-54c85f5fa627
[root@k8s-master1 job]# kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
job-multi-4cvrl   0/1     Completed   0          3m8s    10.244.36.88   k8s-node1   <none>           <none>
job-multi-68pzp   0/1     Completed   0          4m34s   10.244.36.86   k8s-node1   <none>           <none>
job-multi-h97ds   0/1     Completed   0          4m34s   10.244.36.84   k8s-node1   <none>           <none>
[root@k8s-master1 job]# kubectl delete jobs job-multi
job.batch "job-multi" deleted
[root@k8s-master1 job]# kubectl get jobs -o wide
No resources found in default namespace.
[root@k8s-master1 job]# kubectl get pods -o wide
No resources found in default namespace.
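
  Finished Jobs can also be cleaned up automatically through the ttlSecondsAfterFinished field described in section 1. The following is a minimal sketch, assuming the cluster honors that field (the kubectl explain output above lists it as alpha-level and dependent on the TTLAfterFinished feature). The name job-ttl-demo and the value 100 are illustrative and do not come from the original article.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-ttl-demo              # hypothetical name
spec:
  ttlSecondsAfterFinished: 100    # illustrative: eligible for deletion 100s after finishing
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: counter
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/sh"
        - "-c"
        - "for i in 3 2 1; do echo $i; done"
```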

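  If a Job's container can never finish successfully and its restartPolicy is OnFailure, it may be stuck in an endless loop of failures and restarts. Fortunately, the Job spec provides two fields to curb this:

  job.spec.activeDeadlineSeconds: the Job's deadline, i.e. the maximum time it may be active; a Job that exceeds it is terminated.

  job.spec.backoffLimit: the number of retries before the Job is marked as failed; the default is 6.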
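
  As a rough illustration, both fields sit directly under spec. In this sketch the Job name job-limited, the container name, the two limit values, and the deliberately failing command are all made up for the example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-limited               # hypothetical name
spec:
  backoffLimit: 3                 # illustrative: give up after 3 retries
  activeDeadlineSeconds: 60       # illustrative: terminate the Job after 60s of activity
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: failing             # hypothetical container that always fails
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/sh"
        - "-c"
        - "echo working; exit 1"  # non-zero exit code triggers a retry
```

  With these values the Job would be marked as failed once its retries are used up or once it has been active for 60 seconds, whichever happens first.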

Source: https://www.cnblogs.com/jiawei2527/p/16677957.html