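Repost: Side Story: How to Fix an etcd Service That Fails to Start
Author: Internet
Today the mounted storage on a master node in one environment went offline, and after it was restored, etcd on that node would not start.
My guess is that its data had fallen out of sync with the other etcd nodes. Let's reproduce the problem below.
Case
Check the cluster component status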
[root@k8s-master01 ~]# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
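Log in to the node (192.168.1.20) and delete the etcd data directory to simulate the failure
The configuration file tells us where the data directory is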
[Member]
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd" # data directory
ETCD_LISTEN_PEER_URLS="https://192.168.1.20:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.1.20:2379"
Change to that directory
cd /var/lib/etcd/default.etcd
Clear the data (or back it up first)
rm -rf *
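The step above mentions a backup as the alternative to deleting. A minimal sketch of that alternative, assuming etcd on this node is stopped first and that the data directory is an ordinary directory rather than a mount point. The function name `backup_data_dir` and the `.bak-<timestamp>` suffix are illustrative and not part of the original procedure:

```shell
# backup_data_dir DIR
# Hypothetical helper: moves DIR aside to DIR.bak-<timestamp>,
# recreates an empty DIR, and prints the path of the backup.
# Assumes DIR is an ordinary directory, not a mount point.
backup_data_dir() {
  dir=$1
  backup="${dir}.bak-$(date +%Y%m%d%H%M%S)"
  mv "$dir" "$backup" || return 1
  # Keep the new, empty data directory private (mode 700).
  # If etcd runs as a non-root user, ownership must be fixed as well.
  mkdir -m 700 "$dir" || return 1
  printf '%s\n' "$backup"
}

# Example (stop etcd first so nothing writes during the move):
#   systemctl stop etcd
#   backup_data_dir /var/lib/etcd/default.etcd
```

Unlike `rm -rf *`, this keeps the old data around in case it is needed for inspection later.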
Restart the service and check its status
systemctl restart etcd
systemctl status etcd
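Solution
If you followed my earlier deployment guide, etcdctl is probably not on your global PATH, so one extra step is needed
The command was not copied over earlier, so add it now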
cp /opt/etcd/bin/etcdctl /usr/bin/
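Check the etcd cluster status
Run the query against an etcd node that is still alive
etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints='https://192.168.1.21:2379' \
member list
Parameter notes
--cacert=/opt/etcd/ssl/ca.pem
--cert=/opt/etcd/ssl/server.pem
--key=/opt/etcd/ssl/server-key.pem #the three options above are the CA certificate, the client certificate, and its private key
--endpoints='https://192.168.1.21:2379' #an etcd endpoint that is still alive
Output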
22cb69b2fd1bb417, started, etcd-2, https://192.168.1.21:2380, https://192.168.1.21:2379, false
3c3bd4fd7d7e553e, started, etcd-3, https://192.168.1.22:2380, https://192.168.1.22:2379, false
5a224bcd35cc7d02, started, etcd-1, https://192.168.1.20:2380, https://192.168.1.20:2379, false
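Every etcdctl call in this procedure repeats the same four flags. A small wrapper can cut that repetition. This is only a sketch: the function name `ectl` and the two `ECTL_*` variables are assumptions of this example, while the certificate paths and the endpoint are the ones used above:

```shell
# Certificate directory and a live endpoint, as used in this article.
ECTL_SSL_DIR="${ECTL_SSL_DIR:-/opt/etcd/ssl}"
ECTL_ENDPOINT="${ECTL_ENDPOINT:-https://192.168.1.21:2379}"

# ectl ARGS...
# Hypothetical wrapper: runs etcdctl with the TLS and endpoint flags
# prepended, then passes the remaining arguments through unchanged.
ectl() {
  etcdctl \
    --cacert="${ECTL_SSL_DIR}/ca.pem" \
    --cert="${ECTL_SSL_DIR}/server.pem" \
    --key="${ECTL_SSL_DIR}/server-key.pem" \
    --endpoints="${ECTL_ENDPOINT}" \
    "$@"
}

# Example:
#   ectl member list
```

With the wrapper in place, only the subcommand changes from step to step, which makes it harder to mistype a certificate path halfway through a recovery.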
Remove the node whose service cannot start
etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints='https://192.168.1.21:2379' \
member remove 5a224bcd35cc7d02
Pass the member ID that corresponds to your own failed node
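Copying the ID by hand is error-prone. As a sketch, the ID can be pulled out of the `member list` output by matching the peer URL. The helper name `member_id_by_peer` is made up for this example, and it assumes the comma-separated output format shown in this article with a single peer URL per member:

```shell
# member_id_by_peer PEER_URL
# Reads `etcdctl member list` output on stdin and prints the ID of the
# member whose peer URL (4th comma-separated field) equals PEER_URL.
member_id_by_peer() {
  awk -F', ' -v peer="$1" '$4 == peer { print $1 }'
}

# Demo with the listing from this article:
member_id_by_peer https://192.168.1.20:2380 <<'EOF'
22cb69b2fd1bb417, started, etcd-2, https://192.168.1.21:2380, https://192.168.1.21:2379, false
3c3bd4fd7d7e553e, started, etcd-3, https://192.168.1.22:2380, https://192.168.1.22:2379, false
5a224bcd35cc7d02, started, etcd-1, https://192.168.1.20:2380, https://192.168.1.20:2379, false
EOF
# prints 5a224bcd35cc7d02
```

In practice the listing would be piped in from a real `member list` call, and the printed ID passed to `member remove`.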
Confirm that the member has been removed
etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints='https://192.168.1.21:2379' member list
Output
22cb69b2fd1bb417, started, etcd-2, https://192.168.1.21:2380, https://192.168.1.21:2379, false
3c3bd4fd7d7e553e, started, etcd-3, https://192.168.1.22:2380, https://192.168.1.22:2379, false
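Only two entries remain now
Re-add the node
etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints='https://192.168.1.21:2379' \
member add etcd-1 --peer-urls=https://192.168.1.20:2380
The argument after add is the etcd node's name, which must match the name in that node's configuration file
Because the node is rejoining with the same IP, the certificates do not need to be regenerated
Output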
ETCD_NAME="etcd-1"
ETCD_INITIAL_CLUSTER="etcd-2=https://192.168.1.21:2380,etcd-3=https://192.168.1.22:2380,etcd-1=https://192.168.1.20:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.20:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
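The values printed by `member add` should agree with what the rejoining node has in its own configuration file. The member order inside ETCD_INITIAL_CLUSTER can differ between the two, so comparing the raw strings is misleading. A sketch of an order-insensitive comparison, where `normalize_cluster` is a hypothetical helper and the two strings are copied from this article:

```shell
# normalize_cluster LIST
# Splits a comma-separated name=url list, sorts the entries, and joins
# them again, so two lists with the same members compare equal.
normalize_cluster() {
  printf '%s\n' "$1" | tr ',' '\n' | sort | paste -sd, -
}

# Value printed by `member add`, and value from etcd.conf on the node.
from_member_add="etcd-2=https://192.168.1.21:2380,etcd-3=https://192.168.1.22:2380,etcd-1=https://192.168.1.20:2380"
from_etcd_conf="etcd-1=https://192.168.1.20:2380,etcd-2=https://192.168.1.21:2380,etcd-3=https://192.168.1.22:2380"

if [ "$(normalize_cluster "$from_member_add")" = "$(normalize_cluster "$from_etcd_conf")" ]; then
  echo "initial cluster matches"
else
  echo "initial cluster differs"
fi
# prints: initial cluster matches
```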
Check the status
etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints='https://192.168.1.22:2379' member list
Output
22cb69b2fd1bb417, started, etcd-2, https://192.168.1.21:2380, https://192.168.1.21:2379, false
3c3bd4fd7d7e553e, started, etcd-3, https://192.168.1.22:2380, https://192.168.1.22:2379, false
841bd1ec499f60a2, unstarted, , https://192.168.1.20:2380, , false
The etcd service on that node has not been started yet, so the member shows as unstarted
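To spot members that have been added but not yet started, the second column of the listing can be filtered. A sketch under the same output-format assumption as the listing above; `unstarted_members` is an illustrative name:

```shell
# unstarted_members
# Reads `etcdctl member list` output on stdin and prints the ID and
# peer URL of every member whose status field is not "started".
unstarted_members() {
  awk -F', ' '$2 != "started" { print $1, $4 }'
}

# Demo with the listing from this article:
unstarted_members <<'EOF'
22cb69b2fd1bb417, started, etcd-2, https://192.168.1.21:2380, https://192.168.1.21:2379, false
3c3bd4fd7d7e553e, started, etcd-3, https://192.168.1.22:2380, https://192.168.1.22:2379, false
841bd1ec499f60a2, unstarted, , https://192.168.1.20:2380, , false
EOF
# prints 841bd1ec499f60a2 https://192.168.1.20:2380
```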
Restart etcd (on the node that could not start etcd)
vim /opt/etcd/cfg/etcd.conf
Review the configuration
[Member]
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://192.168.1.20:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.1.20:2379"
[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.20:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.1.20:2379"
ETCD_INITIAL_CLUSTER="etcd-1=https://192.168.1.20:2380,etcd-2=https://192.168.1.21:2380,etcd-3=https://192.168.1.22:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new" # change this value to existing
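The manual edit can also be scripted. A sketch using sed, assuming the variable sits at the start of its own line as in the listing above. `set_cluster_state_existing` is a hypothetical helper, and the `.bak` copy is a precaution added here, not part of the original steps:

```shell
# set_cluster_state_existing FILE
# Rewrites the ETCD_INITIAL_CLUSTER_STATE line in FILE so that the
# value is "existing", keeping the original file as FILE.bak.
set_cluster_state_existing() {
  sed -i.bak \
    's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE="existing"/' \
    "$1"
}

# Example:
#   set_cluster_state_existing /opt/etcd/cfg/etcd.conf
#   grep ETCD_INITIAL_CLUSTER_STATE /opt/etcd/cfg/etcd.conf
```

Note that the substitution replaces the whole line, so any trailing comment on it is dropped.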
Start the service
systemctl restart etcd
Check the cluster status
etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints='https://192.168.1.22:2379' member list
Output
22cb69b2fd1bb417, started, etcd-2, https://192.168.1.21:2380, https://192.168.1.21:2379, false
3c3bd4fd7d7e553e, started, etcd-3, https://192.168.1.22:2380, https://192.168.1.22:2379, false
841bd1ec499f60a2, started, etcd-1, https://192.168.1.20:2380, https://192.168.1.20:2379, false
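Besides `member list`, the `etcdctl endpoint health` subcommand can confirm that each endpoint actually answers requests. A sketch that queries all three nodes in one call. The function name `check_cluster_health` and the `ALL_ENDPOINTS` variable are illustrative, and the certificate paths and addresses are the ones used throughout this article:

```shell
# Client URLs of all three members, as listed above.
ALL_ENDPOINTS="https://192.168.1.20:2379,https://192.168.1.21:2379,https://192.168.1.22:2379"

# check_cluster_health
# Hypothetical helper: asks every endpoint in ALL_ENDPOINTS for its
# health status using the same TLS flags as the other commands.
check_cluster_health() {
  etcdctl \
    --cacert=/opt/etcd/ssl/ca.pem \
    --cert=/opt/etcd/ssl/server.pem \
    --key=/opt/etcd/ssl/server-key.pem \
    --endpoints="$ALL_ENDPOINTS" \
    endpoint health
}

# Example:
#   check_cluster_health
```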
Check the component status
[root@k8s-master01 cfg]# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
————————————————
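Copyright notice: this article is an original work by CSDN blogger 「默子昂」, licensed under CC 4.0 BY-SA. When reposting, include the link to the original source and this notice.
Original link: https://blog.csdn.net/qq_42883074/article/details/112789206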
Source: https://www.cnblogs.com/vmsky/p/16323234.html