
Hands-On: Cross-Node Mounting of CephFS in a K8S Cluster (Part 1)


Running stateful services or applications in a Kubernetes cluster is never easy. For example, I had been using CephRBD in a project; despite a few hiccups, it worked reasonably well overall. Recently, however, I found that CephRBD cannot satisfy our need to mount the same volume across nodes, so I had to look elsewhere. Since CephFS comes from the same family as CephRBD, it naturally became the first option I examined. This post records the evaluation of cross-node mounting with CephFS, partly as notes for myself and partly as a reference for anyone with similar needs.


1. The problem with CephRBD


First, a word about the problem with CephRBD. A recent requirement in my project was to let Pods in the cluster share external distributed storage, i.e. have multiple Pods mount the same piece of storage, which greatly simplifies the system design and its complexity. Until now each CephRBD image had been mounted into a single Pod. Does CephRBD support being mounted by multiple Pods at the same time? The official documentation gives a negative answer: a Persistent Volume backed by CephRBD supports only two access modes, ReadWriteOnce and ReadOnlyMany; ReadWriteMany is not supported. So for Pods that need read-write access, a CephRBD PV can only be mounted on one node at a time.


Let's verify this "unfortunate" fact.


First we create a test image, foo1. Here I use the CephRBD API service written for the project; the image could also be created manually with the ceph/rbd command line:


# curl -v -H "Content-type: application/json" -X POST -d '{"kind": "Images","apiVersion": "v1", "metadata": {"name": "foo1", "capacity": 512}}' http://192.168.3.22:8080/api/v1/pools/rbd/images

... ...

{

  "errcode": 0,

  "errmsg": "ok"

# curl http://192.168.3.22:8080/api/v1/pools/rbd/images

{

  "Kind": "ImagesList",

  "APIVersion": "v1",

  "Items": [

    {

      "name": "foo1"

    }

  ]

}
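For reference, a roughly equivalent manual creation with the rbd command line might look like the sketch below (the pool name rbd and the 512 MB size are taken from the API request above; adjust as needed):

# rbd create rbd/foo1 --size 512
# rbd ls rbd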


Create the PV and PVC from the following files:


//ceph-pv.yaml


apiVersion: v1

kind: PersistentVolume

metadata:

  name: foo-pv

spec:

  capacity:

    storage: 512Mi

  accessModes:

    - ReadWriteMany

  rbd:

    monitors:

      - ceph_monitor_ip:port

    pool: rbd

    image: foo1

    user: admin

    secretRef:

      name: ceph-secret

    fsType: ext4

    readOnly: false

  persistentVolumeReclaimPolicy: Recycle


//ceph-pvc.yaml


kind: PersistentVolumeClaim

apiVersion: v1

metadata:

  name: foo-claim

spec:

  accessModes:

    - ReadWriteMany

  resources:

    requests:

      storage: 512Mi
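With the two manifests above in place (and assuming the ceph-secret Secret referenced by the PV already exists in the cluster), the PV and PVC can be created in the usual way:

# kubectl create -f ceph-pv.yaml
# kubectl create -f ceph-pvc.yaml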



After creation:


# kubectl get pv

NAME                CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                        REASON    AGE

foo-pv              512Mi      RWO           Recycle         Bound     default/foo-claim                      20h


# kubectl get pvc

NAME                 STATUS    VOLUME              CAPACITY   ACCESSMODES   AGE

foo-claim            Bound     foo-pv              512Mi      RWO           20h


Create a Pod that mounts the image above:


// ceph-pod2.yaml


apiVersion: v1

kind: Pod

metadata:

  name: ceph-pod2

spec:

  containers:

  - name: ceph-ubuntu2

    image: ubuntu:14.04

    command: ["tail", "-f", "/var/log/bootstrap.log"]

    volumeMounts:

    - name: ceph-vol2

      mountPath: /mnt/cephrbd/data

      readOnly: false

  volumes:

  - name: ceph-vol2

    persistentVolumeClaim:

      claimName: foo-claim


Once the Pod is created successfully, we can look at the data under the mount path:


# kubectl exec ceph-pod2 ls /mnt/cephrbd/data

1.txt

lost+found


Next we start another Pod on the same Kubernetes node (it is enough to change the pod name in ceph-pod2.yaml above to ceph-pod3) and mount the same PV.
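A minimal ceph-pod3.yaml could look like the sketch below; only the Pod name differs from ceph-pod2.yaml, and the optional nodeName field (my addition, not part of the original manifest) is one way to make sure it lands on the same node as ceph-pod2:

// ceph-pod3.yaml (sketch)

apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod3
spec:
  # nodeName: xx.xx.xx.xx   # optional: pin to the node where ceph-pod2 runs
  containers:
  - name: ceph-ubuntu2
    image: ubuntu:14.04
    command: ["tail", "-f", "/var/log/bootstrap.log"]
    volumeMounts:
    - name: ceph-vol2
      mountPath: /mnt/cephrbd/data
      readOnly: false
  volumes:
  - name: ceph-vol2
    persistentVolumeClaim:
      claimName: foo-claim

Both Pods then show up as Running on the same node: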


NAMESPACE                    NAME                                    READY     STATUS    RESTARTS   AGE       IP             NODE

default                      ceph-pod2                               1/1       Running   0          3m        172.16.57.9    xx.xx.xx.xx

default                      ceph-pod3                               1/1       Running   0          0s        172.16.57.10    xx.xx.xx.xx


# kubectl exec ceph-pod3 ls /mnt/cephrbd/data

1.txt

lost+found


We write a file through ceph-pod2, then read it back from ceph-pod3:


# kubectl exec ceph-pod2 -- bash -c "for i in {1..10}; do sleep 1; echo 'pod2: Hello, World'>> /mnt/cephrbd/data/foo.txt ; done "

root@node1:~/k8stest/k8s-cephrbd/footest# kubectl exec ceph-pod3 cat /mnt/cephrbd/data/foo.txt

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World

pod2: Hello, World


So far so good: multiple Pods on the same node can mount the same CephRBD image in ReadWrite mode.


Now we start a Pod on another node that tries to mount the same PV. After creation this Pod stays in the Pending state; kubectl describe shows the following details:


Events:

  FirstSeen    LastSeen    Count    From            SubobjectPath    Type        Reason        Message

  ---------    --------    -----    ----            -------------    --------    ------        -------

.. ...

  2m        37s        2    {kubelet yy.yy.yy.yy}            Warning        FailedMount    Unable to mount volumes for pod "ceph-pod2-master_default(a45f62aa-2bc3-11e7-9baa-00163e1625a9)": timeout expired waiting for volumes to attach/mount for pod "ceph-pod2-master"/"default". list of unattached/unmounted volumes=[ceph-vol2]

  2m        37s        2    {kubelet yy.yy.yy.yy}            Warning        FailedSync    Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "ceph-pod2-master"/"default". list of unattached/unmounted volumes=[ceph-vol2]


The corresponding error messages in kubelet.log:


I0428 11:39:15.737729    1241 reconciler.go:294] MountVolume operation started for volume "kubernetes.io/rbd/a45f62aa-2bc3-11e7-9baa-00163e1625a9-foo-pv" (spec.Name: "foo-pv") to pod "a45f62aa-2bc3-11e7-9baa-00163e1625a9" (UID: "a45f62aa-2bc3-11e7-9baa-00163e1625a9").

I0428 11:39:15.939183    1241 operation_executor.go:768] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/923700ff-12c2-11e7-9baa-00163e1625a9-default-token-40z0x" (spec.Name: "default-token-40z0x") pod "923700ff-12c2-11e7-9baa-00163e1625a9" (UID: "923700ff-12c2-11e7-9baa-00163e1625a9").
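On a node where kubelet runs as a systemd unit (an assumption about this setup), the same messages can also be pulled from the journal, for example:

# journalctl -u kubelet | grep -i rbd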


Among the log messages is the line "rbd: image foo1 is locked by other nodes". The experiment confirms that, at present, a CephRBD image can only be mounted by one node in the k8s cluster.


2. Installing mds in the Ceph cluster to support CephFS


This time I deployed a fresh Ceph cluster on two Ubuntu 16.04 VMs; the process is largely the same as my first Ceph deployment, so I will not repeat it here. For Ceph to support CephFS we need to install the mds component. With the earlier groundwork in place, installing mds with the ceph-deploy tool is very simple:

# ceph-deploy mds create yypdmaster yypdnode

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy mds create yypdmaster yypdnode

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username                      : None

[ceph_deploy.cli][INFO  ]  verbose                       : False

[ceph_deploy.cli][INFO  ]  overwrite_conf                : False

[ceph_deploy.cli][INFO  ]  subcommand                    : create

[ceph_deploy.cli][INFO  ]  quiet                         : False

[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f60fb5e71b8>

[ceph_deploy.cli][INFO  ]  cluster                       : ceph

[ceph_deploy.cli][INFO  ]  func                          : <function mds at 0x7f60fba4e140>

[ceph_deploy.cli][INFO  ]  ceph_conf                     : None

[ceph_deploy.cli][INFO  ]  mds                           : [('yypdmaster', 'yypdmaster'), ('yypdnode', 'yypdnode')]

[ceph_deploy.cli][INFO  ]  default_release               : False

[ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts yypdmaster:yypdmaster yypdnode:yypdnode

[yypdmaster][DEBUG ] connected to host: yypdmaster

[yypdmaster][DEBUG ] detect platform information from remote host

[yypdmaster][DEBUG ] detect machine type

[ceph_deploy.mds][INFO  ] Distro info: Ubuntu 16.04 xenial

[ceph_deploy.mds][DEBUG ] remote host will use systemd

[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to yypdmaster

[yypdmaster][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[yypdmaster][DEBUG ] create path if it doesn't exist

[yypdmaster][INFO  ] Running command: ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.yypdmaster osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-yypdmaster/keyring

[yypdmaster][INFO  ] Running command: systemctl enable ceph-mds@yypdmaster

[yypdmaster][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@yypdmaster.service to /lib/systemd/system/ceph-mds@.service.

[yypdmaster][INFO  ] Running command: systemctl start ceph-mds@yypdmaster

[yypdmaster][INFO  ] Running command: systemctl enable ceph.target

[yypdnode][DEBUG ] connected to host: yypdnode

[yypdnode][DEBUG ] detect platform information from remote host

[yypdnode][DEBUG ] detect machine type

[ceph_deploy.mds][INFO  ] Distro info: Ubuntu 16.04 xenial

[ceph_deploy.mds][DEBUG ] remote host will use systemd

[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to yypdnode

[yypdnode][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[yypdnode][DEBUG ] create path if it doesn't exist

[yypdnode][INFO  ] Running command: ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.yypdnode osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-yypdnode/keyring

[yypdnode][INFO  ] Running command: systemctl enable ceph-mds@yypdnode

[yypdnode][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@yypdnode.service to /lib/systemd/system/ceph-mds@.service.

[yypdnode][INFO  ] Running command: systemctl start ceph-mds@yypdnode

[yypdnode][INFO  ] Running command: systemctl enable ceph.target


It all went smoothly. After installation, mds can be seen running on either node:


# ps -ef|grep ceph

ceph      7967     1  0 17:23 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph

ceph     15674     1  0 17:32 ?        00:00:00 /usr/bin/ceph-mon -f --cluster ceph --id yypdnode --setuser ceph --setgroup ceph

ceph     18019     1  0 17:35 ?        00:00:00 /usr/bin/ceph-mds -f --cluster ceph --id yypdnode --setuser ceph --setgroup ceph


mds stores the metadata for cephfs. My Ceph is version 10.2.7:


# ceph -v

ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)


Although running multiple active mds daemons in parallel is supported, the official documentation recommends keeping a single active mds and leaving the others as standbys (see the fsmap line in the cluster status below):


# ceph -s

    cluster ffac3489-d678-4caf-ada2-3dd0743158b6

    ... ...

      fsmap e6: 1/1/1 up {0=yypdnode=up:active}, 1 up:standby

     osdmap e19: 2 osds: 2 up, 2 in

            flags sortbitwise,require_jewel_osds

      pgmap v192498: 576 pgs, 5 pools, 126 MB data, 238 objects

            44365 MB used, 31881 MB / 80374 MB avail

                 576 active+clean
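The fsmap above implies that a CephFS filesystem has already been created on top of the mds daemons. For completeness, such a filesystem is typically built from a data pool and a metadata pool along the following lines (the pool names and PG counts here are assumptions for illustration, not taken from this cluster):

# ceph osd pool create cephfs_data 64
# ceph osd pool create cephfs_metadata 64
# ceph fs new cephfs cephfs_metadata cephfs_data
# ceph fs ls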


Source: https://blog.51cto.com/15077561/2584792