[Ceph] Troubleshooting an abnormal OSD (lost LVM metadata)
I. Preface
1. Summary
References:
RHEL / CentOS : How to rebuild LVM from Archive (metadata backups)
Red Hat Enterprise Linux 7 Logical Volume Manager Administration
Analysis of OSD auto-start at boot under BlueStore
This article walks through diagnosing an abnormal OSD and repairing it, in two parts: recovering the LVM metadata and bringing the OSD back up.
2. Problem description
- The cluster status shows osd.1 is down:
root@node163:~# ceph -s
cluster:
id: 9bc47ff2-5323-4964-9e37-45af2f750918
health: HEALTH_WARN
too many PGs per OSD (256 > max 250)
services:
mon: 3 daemons, quorum node163,node164,node165
mgr: node163(active), standbys: node164, node165
mds: ceph-1/1/1 up {0=node165=up:active}, 2 up:standby
osd: 3 osds: 2 up, 2 in
data:
pools: 3 pools, 256 pgs
objects: 46 objects, 100MiB
usage: 2.20GiB used, 198GiB / 200GiB avail
pgs: 256 active+clean
root@node163:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29306 root default
-5 0.09769 host node163
1 hdd 0.09769 osd.1 down 0 1.00000
-3 0.09769 host node164
0 hdd 0.09769 osd.0 up 1.00000 1.00000
-7 0.09769 host node165
2 hdd 0.09769 osd.2 up 1.00000 1.00000
- On node163, where osd.1 lives, the disk's LVM metadata is gone and the OSD is not mounted:
root@node163:~# lvs
root@node163:~# vgs
root@node163:~# pvs
root@node163:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 100G 0 disk
vda 254:0 0 100G 0 disk
├─vda1 254:1 0 487M 0 part /boot
├─vda2 254:2 0 54.4G 0 part /
├─vda3 254:3 0 1K 0 part
├─vda5 254:5 0 39.5G 0 part /data
├─vda6 254:6 0 5.6G 0 part [SWAP]
└─vda7 254:7 0 105M 0 part /boot/efi
root@node163:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 394M 47M 347M 12% /run
/dev/vda2 54G 12G 40G 23% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/vda1 464M 178M 258M 41% /boot
/dev/vda5 39G 48M 37G 1% /data
/dev/vda7 105M 550K 105M 1% /boot/efi
tmpfs 394M 0 394M 0% /run/user/0
root@node163:~# dd if=/dev/sda bs=512 count=4 | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
4+0 records in
4+0 records out
2048 bytes (2.0 kB, 2.0 KiB) copied, 0.000944624 s, 2.2 MB/s
00000800
II. Handling process
From the output above, the disk's LVM metadata has been lost and the disk is not mounted, which is why the OSD fails to start.
We therefore handle this in two parts: LVM recovery and bringing the OSD back up.
1. LVM recovery
1.1 Overview
The LVM configuration directory is laid out as follows; whenever a VG or LV configuration changes, LVM creates an archive and a backup of the metadata (typical lvm.conf defaults are sketched below):
/etc/lvm/          main LVM configuration directory
/etc/lvm/archive   LVM metadata archives (one file per metadata change, each recording the state *before* an operation such as vgcreate, lvcreate or lvchange)
/etc/lvm/backup    LVM metadata backups (the latest complete metadata for each VG)
/etc/lvm/lvm.conf  main LVM configuration file, which controls metadata backup and archiving
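For reference, this behaviour is governed by the backup section of /etc/lvm/lvm.conf; a minimal excerpt with the usual defaults (not taken from this node) looks like:
backup {
    backup = 1                        # refresh /etc/lvm/backup after every metadata change
    backup_dir = "/etc/lvm/backup"
    archive = 1                       # archive the previous metadata before every change
    archive_dir = "/etc/lvm/archive"
    retain_min = 10                   # keep at least 10 archives per VG
    retain_days = 30                  # and at least 30 days of history
}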
You can run vgcfgrestore --list <vg_name> to list a VG's archived metadata; by reading the operation history you can locate the most complete LVM configuration, i.e. the latest usable archive file, which is what you will feed back to vgcfgrestore to recover an accidentally deleted LVM setup.
[Unauthorized System] root@node163:/etc/lvm/archive# vgcfgrestore --list ceph-07e80157-b488-41e5-b217-4079d52edb08
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00000-999427028.vg
Couldn't find device with uuid UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN.
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/vgcreate --force --yes ceph-07e80157-b488-41e5-b217-4079d52edb08 /dev/sda'
Backup Time: Wed Jun 29 14:53:47 2022
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00001-98007334.vg
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/lvcreate --yes -l 100%FREE -n osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d ceph-07e80157-b488-41e5-b217-4079d52edb08'
Backup Time: Wed Jun 29 14:53:47 2022
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00002-65392131.vg
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/lvchange --addtag ceph.type=block /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d'
Backup Time: Wed Jun 29 14:53:47 2022
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00003-1190179092.vg
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/lvchange --addtag ceph.block_device=/dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d'
Backup Time: Wed Jun 29 14:53:47 2022
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00004-1217184452.vg
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/lvchange --addtag ceph.vdo=0 /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d'
Backup Time: Wed Jun 29 14:53:48 2022
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00005-2051164187.vg
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/lvchange --addtag ceph.osd_id=1 /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d'
Backup Time: Wed Jun 29 14:53:48 2022
By default, a PV created with pvcreate stores its physical volume label in the second 512-byte sector of the disk, and the label begins with the string LABELONE.
You can run dd if=<pv_disk_path> bs=512 count=2 | hexdump -C to check whether the PV label is intact.
Note: the physical volume label contains, among other things, the PV UUID and the size of the block device.
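As a quick scripted variant of the same check (an assumption of this write-up, not part of the original procedure):
# succeeds only if the LABELONE signature is present in the first two sectors
if dd if=/dev/sda bs=512 count=2 2>/dev/null | grep -q LABELONE; then
    echo "PV label present"
else
    echo "PV label missing"
fi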
--Abnormal node (node163)--
root@node163:~# dd if=/dev/sda bs=512 count=2 | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000803544 s, 1.3 MB/s
00000400
--Healthy node (node164)--
root@node164:/etc/lvm/archive# dd if=/dev/sda bs=512 count=2 | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000111721 s, 9.2 MB/s
00000200 4c 41 42 45 4c 4f 4e 45 01 00 00 00 00 00 00 00 |LABELONE........|
00000210 1c 9f f4 1e 20 00 00 00 4c 56 4d 32 20 30 30 31 |.... ...LVM2 001|
00000220 59 6c 6a 79 78 64 59 53 66 4e 44 54 4b 7a 36 64 |YljyxdYSfNDTKz6d|
00000230 41 31 44 56 46 79 52 78 5a 52 39 58 61 49 45 52 |A1DVFyRxZR9XaIER|
00000240 00 00 00 00 19 00 00 00 00 00 10 00 00 00 00 00 |................|
00000250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000260 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
00000270 00 f0 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000280 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400
In addition, if the physical disk has not been overwritten with new data, you can run dd if=<pv_disk_path> count=12 | strings to read the LVM metadata text that is still on disk (a sketch for saving a copy of it follows after the listing below):
root@node163:~# dd if=/dev/sda count=12 | strings
LVM2 x[5A%r0N*>
ceph-07e80157-b488-41e5-b217-4079d52edb08 {
id = "e1Ge2Y-6DAn-EZzA-6btK-MGMW-qVrP-ldcE9R"
seqno = 1
format = "lvm2"
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN"
device = "/dev/sda"
status = ["ALLOCATABLE"]
flags = []
dev_size = 209715200
pe_start = 2048
pe_count = 25599
# Generated by LVM2 version 2.02.133(2) (2015-10-30): Wed Jun 29 14:53:47 2022
contents = "Text Format Volume Group"
version = 1
description = ""
creation_host = "node163" # Linux node163 4.4.58-20180615.kylin.server.YUN+-generic #kylin SMP Tue Jul 10 14:55:31 CST 2018 aarch64
creation_time = 1656485627 # Wed Jun 29 14:53:47 2022
ceph-07e80157-b488-41e5-b217-4079d52edb08 {
id = "e1Ge2Y-6DAn-EZzA-6btK-MGMW-qVrP-ldcE9R"
seqno = 2
format = "lvm2"
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN"
device = "/dev/sda"
status = ["ALLOCATABLE"]
flags = []
dev_size = 209715200
pe_start = 2048
pe_count = 25599
logical_volumes {
osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d {
id = "oV0BZG-WLSM-v2jL-god
12+0 records in
12+0 records out
1.2 Collecting the LVM information
Before repairing the LVM setup we need the LV name (usually osd-block-<osd_fsid>) and the VG name (usually prefixed with ceph-).
Note: osd_fsid can be obtained with ceph osd dump | grep <osd_id> | awk '{print $NF}' (a variable-based form is sketched below).
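A small sketch of that lookup (OSD_ID is set by hand here; the trailing space in the grep pattern is an assumption of this write-up, added to avoid matching osd.10, osd.11, and so on):
OSD_ID=1
OSD_FSID=$(ceph osd dump | grep "osd.${OSD_ID} " | awk '{print $NF}')
echo "${OSD_FSID}"
grep -R "${OSD_FSID}" /etc/lvm/archive/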
- Change into /etc/lvm/archive and run grep `ceph osd dump | grep <osd_id> | awk '{print $NF}'` -R * to find the OSD's LV and VG:
# osd.1 maps to LV osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d and VG ceph-07e80157-b488-41e5-b217-4079d52edb08
root@node163:/etc/lvm/archive# grep `ceph osd dump | grep osd.1 | awk '{print $NF}'` -R *
ceph-07e80157-b488-41e5-b217-4079d52edb08_00001-98007334.vg:description = "Created *before* executing '/sbin/lvcreate --yes -l 100%FREE -n osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d ceph-07e80157-b488-41e5-b217-4079d52edb08'"
- Run vgcfgrestore --list <vg_name> to list the archived metadata and, from the operation history, pick out the most complete archive file (the one containing the full LVM configuration; a sort-by-size sketch follows after the listing below):
# Judging by the VG operation history and the archive file sizes, the most complete archive is /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg, and the PV UUID is UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN
root@node163:/etc/lvm/archive# vgcfgrestore --list ceph-07e80157-b488-41e5-b217-4079d52edb08
File: /etc/lvm/archive/ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg
VG name: ceph-07e80157-b488-41e5-b217-4079d52edb08
Description: Created *before* executing '/sbin/lvchange --addtag ceph.block_uuid=oV0BZG-WLSM-v2jL-godE-o6vd-fdfu-w7Ms5w /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d'
Backup Time: Wed Jun 29 14:53:48 2022
root@node163:/etc/lvm/archive# cat ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg | grep -A 5 physical_volumes
physical_volumes {
pv0 {
id = "UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN"
device = "/dev/sda" # Hint only
1.3 Constructing the label
As the earlier checks show, the PV information on osd.1's physical disk is gone, so the VG configuration cannot be restored directly with vgcfgrestore.
Instead, we take a spare disk, create a new PV on it using the original PV UUID and the archive file, and then dd the first two sectors of that new disk onto osd.1's original disk to bring the original PV label back.
root@node163:/etc/lvm/archive# vgcfgrestore -f ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg ceph-07e80157-b488-41e5-b217-4079d52edb08
Couldn't find device with uuid UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN.
PV unknown device missing from cache
Format-specific setup for unknown device failed
Restore failed.
- Copy the archive file obtained in section 1.2 to another node and use pvcreate -ff --uuid <pv_uuid> --restorefile <archive_file> <pv_disk_path> to create an identical PV:
[root@node122 ~]# pvcreate -ff --uuid UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN --restorefile ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg /dev/sdb
Couldn't find device with uuid UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN.
Physical volume "/dev/sdb" successfully created.
- Use dd if=<pv_disk_path> of=<label_file> bs=512 count=2 to dump the new PV's label into a file, file_label (a quick check of the embedded UUID is sketched after the dump below):
[root@node122 ~]# dd if=/dev/sdb of=file_label bs=512 count=2
2+0 records in
2+0 records out
1024 bytes (1.0 kB) copied, 0.219809 s, 4.7 kB/s
[root@node122 ~]# dd if=./file_label | hexdump -C
2+0 records in
2+0 records out
1024 bytes (1.0 kB) copied, 6.1274e-05 s, 16.7 MB/s
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 4c 41 42 45 4c 4f 4e 45 01 00 00 00 00 00 00 00 |LABELONE........|
00000210 2b a3 c4 46 20 00 00 00 4c 56 4d 32 20 30 30 31 |+..F ...LVM2 001|
00000220 55 6a 78 71 75 48 69 48 4a 65 4e 59 31 41 42 64 |UjxquHiHJeNY1ABd|
00000230 51 66 30 30 6f 44 6a 32 32 43 68 65 65 4f 54 4e |Qf00oDj22CheeOTN|
00000240 00 00 00 00 19 00 00 00 00 00 10 00 00 00 00 00 |................|
00000250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000260 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
00000270 00 f0 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000280 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 |................|
00000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400
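As a sanity check (the 0x220 offset is read off the hexdump above, an assumption of this write-up rather than a stable interface), the PV UUID is stored without dashes 32 bytes into the second sector and can be read back directly:
# should print UjxquHiHJeNY1ABdQf00oDj22CheeOTN for this label
dd if=./file_label bs=1 skip=$((0x220)) count=32 2>/dev/null; echo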
1.4 Restoring the PV
- Before touching the disk, back up the first 1024 bytes of osd.1's physical disk with dd if=<pv_disk_path> of=/home/file_backup bs=512 count=2:
root@node163:/etc/lvm/archive# dd if=/dev/sda bs=512 count=2 | hexdump -C
2+0 records in
2+0 records out
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000761583 s, 1.3 MB/s
root@node163:/etc/lvm/archive# dd if=/dev/sda of=/home/file_backup bs=512 count=2
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000825143 s, 1.2 MB/s
- Use dd if=<label_file> of=<pv_disk_path> bs=512 count=2 to write the label constructed in section 1.3 onto osd.1's physical disk:
root@node163:/etc/lvm/archive# dd if=/home/file_label of=/dev/sda bs=512 count=2
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00122898 s, 833 kB/s
root@node163:/etc/lvm/archive# dd if=/dev/sda bs=512 count=2 | hexdump -C
2+0 records in
2+0 records out
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 4c 41 42 45 4c 4f 4e 45 01 00 00 00 00 00 00 00 |LABELONE........|
00000210 2b a3 c4 46 20 00 00 00 4c 56 4d 32 20 30 30 31 |+..F ...LVM2 001|
00000220 55 6a 78 71 75 48 69 48 4a 65 4e 59 31 41 42 64 |UjxquHiHJeNY1ABd|
00000230 51 66 30 30 6f 44 6a 32 32 43 68 65 65 4f 54 4e |Qf00oDj22CheeOTN|
00000240 00 00 00 00 19 00 00 00 00 00 10 00 00 00 00 00 |................|
00000250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000260 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
00000270 00 f0 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000280 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 |................|
00000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00244905 s, 418 kB/s
- Use pvcreate -ff --uuid <pv_uuid> --restorefile <archive_file> <pv_disk_path> to recreate the PV on osd.1's physical disk with the original UUID:
root@node163:/etc/lvm/archive# pvcreate -ff --uuid UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN --restorefile ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg /dev/sda
Couldn't find device with uuid UjxquH-iHJe-NY1A-BdQf-00oD-j22C-heeOTN.
Physical volume "/dev/sda" successfully created
root@node163:/etc/lvm/archive# pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 100.00g 100.00g
1.5 Restoring the VG/LV
- Run vgcfgrestore -f <archive_file> <vg_name> to restore the VG and LV; at this point the LV, VG and PV are all back to normal (if the device-mapper node does not appear by itself, see the activation note after the listing below):
root@node163:/etc/lvm/archive# vgcfgrestore -f ceph-07e80157-b488-41e5-b217-4079d52edb08_00016-18371198.vg ceph-07e80157-b488-41e5-b217-4079d52edb08
Restored volume group ceph-07e80157-b488-41e5-b217-4079d52edb08
root@node163:/etc/lvm/archive# vgs
VG #PV #LV #SN Attr VSize VFree
ceph-07e80157-b488-41e5-b217-4079d52edb08 1 1 0 wz--n- 100.00g 0
root@node163:/etc/lvm/archive# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d ceph-07e80157-b488-41e5-b217-4079d52edb08 -wi------- 100.00g
root@node163:~# ll /dev/mapper/
total 0
drwxr-xr-x 2 root root 80 Jul 1 17:28 ./
drwxr-xr-x 19 root root 4520 Jul 1 17:28 ../
lrwxrwxrwx 1 root root 7 Jul 1 17:33 ceph--07e80157--b488--41e5--b217--4079d52edb08-osd--block--8cd1658a--97d7--42d6--8f67--6a076c6fb42d -> ../dm-0
crw------- 1 root root 10, 236 Jul 1 17:28 control
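vgcfgrestore only restores metadata; if the /dev/mapper node or the /dev/<vg_name>/<lv_name> symlink does not show up on its own, activating the volume group manually should create it (a sketch, not part of the original steps):
vgchange -ay ceph-07e80157-b488-41e5-b217-4079d52edb08
lvs -o lv_name,lv_attr,lv_path ceph-07e80157-b488-41e5-b217-4079d52edb08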
- Inspect the LV: a healthy BlueStore OSD LV starts with a bluestore block device signature (a show-label check is sketched after the dump below).
Note: if the LVM recovery succeeds but the LV carries no bluestore block signature, the disk has most likely been overwritten and the BlueStore data destroyed; in that case the OSD's data is lost and there is nothing further to recover.
root@node163:~# dd if=/dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d bs=512 count=2 | hexdump -C
2+0 records in
2+0 records out
00000000 62 6c 75 65 73 74 6f 72 65 20 62 6c 6f 63 6b 20 |bluestore block |
00000010 64 65 76 69 63 65 0a 38 63 64 31 36 35 38 61 2d |device.8cd1658a-|
00000020 39 37 64 37 2d 34 32 64 36 2d 38 66 36 37 2d 36 |97d7-42d6-8f67-6|
00000030 61 30 37 36 63 36 66 62 34 32 64 0a 02 01 16 01 |a076c6fb42d.....|
00000040 00 00 8c d1 65 8a 97 d7 42 d6 8f 67 6a 07 6c 6f |....e...B..gj.lo|
00000050 b4 2d 00 00 c0 ff 18 00 00 00 fd f6 bb 62 ac 78 |.-...........b.x|
00000060 dc 18 04 00 00 00 6d 61 69 6e 08 00 00 00 06 00 |......main......|
00000070 00 00 62 6c 75 65 66 73 01 00 00 00 31 09 00 00 |..bluefs....1...|
00000080 00 63 65 70 68 5f 66 73 69 64 24 00 00 00 39 62 |.ceph_fsid$...9b|
00000090 63 34 37 66 66 32 2d 35 33 32 33 2d 34 39 36 34 |c47ff2-5323-4964|
000000a0 2d 39 65 33 37 2d 34 35 61 66 32 66 37 35 30 39 |-9e37-45af2f7509|
000000b0 31 38 0a 00 00 00 6b 76 5f 62 61 63 6b 65 6e 64 |18....kv_backend|
000000c0 07 00 00 00 72 6f 63 6b 73 64 62 05 00 00 00 6d |....rocksdb....m|
000000d0 61 67 69 63 14 00 00 00 63 65 70 68 20 6f 73 64 |agic....ceph osd|
000000e0 20 76 6f 6c 75 6d 65 20 76 30 32 36 09 00 00 00 | volume v026....|
000000f0 6d 6b 66 73 5f 64 6f 6e 65 03 00 00 00 79 65 73 |mkfs_done....yes|
00000100 07 00 00 00 6f 73 64 5f 6b 65 79 28 00 00 00 41 |....osd_key(...A|
00000110 51 44 35 39 72 74 69 41 62 65 2f 4c 52 41 41 65 |QD59rtiAbe/LRAAe|
00000120 6a 4b 6e 42 6d 56 4e 6a 4a 75 37 4e 78 37 79 37 |jKnBmVNjJu7Nx7y7|
00000130 58 38 57 55 41 3d 3d 05 00 00 00 72 65 61 64 79 |X8WUA==....ready|
00000140 05 00 00 00 72 65 61 64 79 06 00 00 00 77 68 6f |....ready....who|
00000150 61 6d 69 01 00 00 00 31 7e 77 c5 2d 00 00 00 00 |ami....1~w.-....|
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00132415 s, 773 kB/s
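If the LV really is a BlueStore device, ceph-bluestore-tool can decode the same label in a friendlier form (a sketch; the LV path is the one restored above):
ceph-bluestore-tool show-label --dev /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d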
2. Bringing the OSD back up
OSD mounting is handled by ceph-volume. Once the LVM recovery has succeeded, run systemctl start ceph-volume@lvm-<osd_id>-`ceph osd dump | grep <osd_id> | awk '{print $NF}'` to remount the OSD directory and start the OSD (a variable-based form is sketched below):
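The same command written with variables (an assumption of this write-up; OSD_ID is set by hand and the fsid is looked up as in section 1.2):
OSD_ID=1
OSD_FSID=$(ceph osd dump | grep "osd.${OSD_ID} " | awk '{print $NF}')
systemctl start "ceph-volume@lvm-${OSD_ID}-${OSD_FSID}"
systemctl status "ceph-volume@lvm-${OSD_ID}-${OSD_FSID}" "ceph-osd@${OSD_ID}"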
root@node163:~# systemctl start ceph-volume@lvm-1-`ceph osd dump | grep osd.1 | awk '{print $NF'}`
root@node163:~# systemctl status ceph-volume@lvm-1-`ceph osd dump | grep osd.1 | awk '{print $NF'}`
● ceph-volume@lvm-1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d.service - Ceph Volume activation: lvm-1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d
Loaded: loaded (/lib/systemd/system/ceph-volume@.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2022-07-01 17:54:49 CST; 4s ago
Main PID: 55683 (code=exited, status=0/SUCCESS)
Jul 01 17:54:48 node163 systemd[1]: Starting Ceph Volume activation: lvm-1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d...
Jul 01 17:54:49 node163 sh[55683]: Running command: ceph-volume lvm trigger 1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d
Jul 01 17:54:49 node163 systemd[1]: Started Ceph Volume activation: lvm-1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d.
root@node163:~# ceph osd in osd.1
marked in osd.1.
root@node163:~# ceph -s
cluster:
id: 9bc47ff2-5323-4964-9e37-45af2f750918
health: HEALTH_OK
services:
mon: 3 daemons, quorum node163,node164,node165
mgr: node163(active), standbys: node164, node165
mds: ceph-1/1/1 up {0=node165=up:active}, 2 up:standby
osd: 3 osds: 3 up, 3 in
data:
pools: 3 pools, 256 pgs
objects: 46 objects, 100MiB
usage: 3.21GiB used, 297GiB / 300GiB avail
pgs: 256 active+clean
Note:
If the command above still cannot bring the OSD up, run ceph-volume lvm trigger <osd_id>-<osd_fsid> to see each step it performs and pinpoint where it gets stuck:
root@node163:~# ceph-volume lvm trigger 1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Running command: restorecon /var/lib/ceph/osd/ceph-1
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d --path /var/lib/ceph/osd/ceph-1
Running command: ln -snf /dev/ceph-07e80157-b488-41e5-b217-4079d52edb08/osd-block-8cd1658a-97d7-42d6-8f67-6a076c6fb42d /var/lib/ceph/osd/ceph-1/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Running command: chown -R ceph:ceph /dev/dm-0
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: systemctl enable ceph-volume@lvm-1-8cd1658a-97d7-42d6-8f67-6a076c6fb42d
Running command: systemctl enable --runtime ceph-osd@1
Running command: systemctl start ceph-osd@1
--> ceph-volume lvm activate successful for osd ID: 1
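For completeness, ceph-volume can also be driven directly instead of through the systemd template unit; either of the following performs the same activation (not used in the original article):
ceph-volume lvm activate 1 8cd1658a-97d7-42d6-8f67-6a076c6fb42d
# or activate every OSD discoverable on this node
ceph-volume lvm activate --all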