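Notes on an incident caused by recovering a failed zabbix-server: zabbix-server.log -- One child process died
Author: (internet repost)
Preface
Yesterday zabbix-server developed a problem: it kept restarting. I fiddled with it until evening and still couldn't work out why. I tried everything suggested online and changed CacheSize, Timeout, StartPollers and more, plus the advice about Include files from posts that didn't even bother to show their logs... I planned to deal with it first thing this morning, and instead ended up with a production incident.
As it happens, our hyper-converged infrastructure has been unstable lately. In the early hours of the morning, a production server (hosting the service registry and the configuration center) rebooted for no apparent reason and set off a chain of problems. I won't go into those here; this post is about the Zabbix issue.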
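Environment
CentOS Linux release 7.6.1810 (Core)
mysql 5.7 # runs in Docker, data persisted to the host disk
Zabbix installed following the official documentation, using the 5.0 LTS + CentOS 7 + MySQL + Nginx variant.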
zabbix_server (Zabbix) 5.0.5
Revision eaa427cf19 26 October 2020, compilation time: Oct 26 2020 12:20:11
Copyright (C) 2020 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).
Compiled with OpenSSL 1.0.2k-fips 26 Jan 2017
Running with OpenSSL 1.0.2k-fips 26 Jan 2017
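PS: I don't know Zabbix in depth. I just install and configure it by following the official and online documentation, and I set up some custom monitoring myself.
The problem
zabbix-server kept restarting and the login page would not load either. zabbix-server.log showed the following errors: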
2148:20210603:143421.801 Starting Zabbix Server. Zabbix 5.0.5 (revision eaa427cf19).
2148:20210603:143421.801 ****** Enabled features ******
2148:20210603:143421.801 SNMP monitoring: YES
2148:20210603:143421.801 IPMI monitoring: YES
2148:20210603:143421.801 Web monitoring: YES
2148:20210603:143421.801 VMware monitoring: YES
2148:20210603:143421.801 SMTP authentication: YES
2148:20210603:143421.801 ODBC: YES
2148:20210603:143421.801 SSH support: YES
2148:20210603:143421.801 IPv6 support: YES
2148:20210603:143421.801 TLS support: YES
2148:20210603:143421.801 ******************************
2148:20210603:143421.801 using configuration file: /etc/zabbix/zabbix_server.conf
...
...
2179:20210603:143423.081 ================================
2179:20210603:143423.081 Please consider attaching a disassembly listing to your bug report.
2179:20210603:143423.081 This listing can be produced with, e.g., objdump -DSswx zabbix_server.
2179:20210603:143423.081 ================================
2148:20210603:143423.082 One child process died (PID:2179,exitcode/signal:1). Exiting ...
zabbix_server [2148]: Error waiting for process with PID 2179: [10] No child processes
2148:20210603:143423.088 syncing history data...
2148:20210603:143423.097 syncing history data... 100.000000%
2148:20210603:143423.097 syncing history data done
2148:20210603:143423.097 syncing trend data...
2148:20210603:143423.102 syncing trend data done
2148:20210603:143423.102 Zabbix Server stopped. Zabbix 5.0.5 (revision eaa427cf19).
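When the log is long, the useful part of a crash like this is what the dead child process logged just before it died. The following is a minimal Python sketch, assuming the `<pid>:<date>:<time> message` line format seen in the excerpt above; `find_child_deaths` and `lines_from_pid` are hypothetical helper names, not part of Zabbix.

```python
import re
import sys
from typing import NamedTuple

# Zabbix log lines look like "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>" (PID may be space-padded).
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# Message text taken verbatim from the log excerpt.
DIED_RE = re.compile(r"One child process died \(PID:(\d+),exitcode/signal:(\d+)\)")


class ChildDeath(NamedTuple):
    parent_pid: int
    child_pid: int
    exit_or_signal: int
    timestamp: str


def find_child_deaths(lines):
    """Return one ChildDeath per 'One child process died' line."""
    deaths = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. "zabbix_server [2148]: Error waiting ..." has no PID prefix
        pid, date, time_, msg = m.groups()
        d = DIED_RE.search(msg)
        if d:
            deaths.append(ChildDeath(int(pid), int(d.group(1)), int(d.group(2)), f"{date}:{time_}"))
    return deaths


def lines_from_pid(lines, pid):
    """Collect every message logged by one PID, e.g. the crashed child's last words."""
    out = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and int(m.group(1)) == pid:
            out.append(m.group(4))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log_lines = f.readlines()
    for death in find_child_deaths(log_lines):
        print(death)
        for msg in lines_from_pid(log_lines, death.child_pid):
            print("    ", msg)
```

Pass it the path that `LogFile` in zabbix_server.conf points to. The messages printed under each dead child are usually more telling than the generic "Exiting ..." line from the parent.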
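Troubleshooting
The log showed no sign of memory, cache or MySQL problems, so I searched around online and tried all sorts of things: restarting the whole stack, changing CacheSize, checking whether child processes were deadlocked, and clearing out the database.
Eventually I re-initialized MySQL from scratch. zabbix-server then stayed up for a few minutes before going back into its restart loop, and the login page now reported "Database error Connection timed out". zabbix_server.conf looked fine. Going back to the official installation documentation, I noticed that the Zabbix frontend and server are set up separately... hmm, that looked like the problem.
Checking the frontend configuration, I found that the MySQL settings in /etc/zabbix/web/zabbix.conf.php were actually wrong?!! WTF! I fixed them right away and restarted the services: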
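The frontend's "Database error Connection timed out" suggests PHP could not even open a connection to the database it was configured for. A quick way to separate "wrong address or port" from "wrong credentials" is a plain TCP check. This is a minimal Python sketch; `tcp_reachable` is a hypothetical helper, and the host and port you feed it should be the ones in the frontend's config rather than assumed defaults.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out (socket.timeout is an OSError), DNS failure, ...
        return False
```

For example, `tcp_reachable("127.0.0.1", 3306)` would check a MySQL container that publishes its default port on the local host (an assumption about the Docker setup). A `True` result only proves something is listening; it says nothing about whether the database name, user or password are right.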
systemctl restart zabbix-server zabbix-agent rh-nginx116-nginx rh-php72-php-fpm
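Since the server and the frontend keep separate copies of the database settings, they can drift apart exactly as happened here. Below is a minimal Python sketch that compares the two files, assuming the Zabbix 5.0 formats: `KEY=value` lines in zabbix_server.conf and `$DB['KEY'] = 'value';` lines in zabbix.conf.php. The key mapping, the treatment of an unset `DBHost` as `localhost`, and all function names are assumptions for illustration.

```python
import re
import sys

# Frontend $DB[...] keys (zabbix.conf.php) -> zabbix_server.conf keys (assumed mapping).
FRONTEND_TO_SERVER = {
    "SERVER": "DBHost",
    "PORT": "DBPort",
    "DATABASE": "DBName",
    "USER": "DBUser",
    "PASSWORD": "DBPassword",
}
# Matches lines like:  $DB['SERVER'] = 'localhost';   (single-quoted values only)
PHP_DB_RE = re.compile(r"^\s*\$DB\['(\w+)'\]\s*=\s*'([^']*)'\s*;", re.M)


def parse_server_conf(text):
    """Collect KEY=value lines from zabbix_server.conf, skipping comments and blanks."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def parse_frontend_conf(text):
    """Collect $DB['KEY'] = 'value'; assignments from zabbix.conf.php."""
    return dict(PHP_DB_RE.findall(text))


def normalize(skey, value):
    """Map 'use the default' spellings to one form so they compare equal."""
    if skey == "DBPort" and value in ("", "0"):
        return ""  # frontend '0' and an unset DBPort both mean the default port
    if skey == "DBHost" and value == "":
        return "localhost"  # assumption: an unset DBHost means localhost
    return value


def db_mismatches(server_text, frontend_text):
    """Return [(server_key, server_value, frontend_value)] for settings that differ."""
    server = parse_server_conf(server_text)
    front = parse_frontend_conf(frontend_text)
    diffs = []
    for fkey, skey in FRONTEND_TO_SERVER.items():
        s_val = normalize(skey, server.get(skey, ""))
        f_val = normalize(skey, front.get(fkey, ""))
        if s_val != f_val:
            if skey == "DBPassword":
                s_val = f_val = "<hidden>"  # report that it differs, not what it is
            diffs.append((skey, s_val, f_val))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as s, open(sys.argv[2]) as f:
        for key, s_val, f_val in db_mismatches(s.read(), f.read()):
            print(f"{key}: server={s_val!r} frontend={f_val!r}")
```

Run it with /etc/zabbix/zabbix_server.conf and /etc/zabbix/web/zabbix.conf.php as the two arguments. An explicit port on one side and an unset port on the other is still reported as a difference, even when both end up at 3306.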
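After a few minutes zabbix-server started restarting again. Then I remembered one of the posts I had read online and changed the connection security of the Email media type (Administration → Media types) to STARTTLS (an extension that upgrades a plain-text protocol connection to TLS). It finally recovered...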
PS:
When using open-source software, it pays to learn how the software itself is architected; maintaining it becomes much easier.
Special thanks:
https://blog.csdn.net/liuxiangyang_/article/details/100024641
https://yunwei365.blog.csdn.net/article/details/103677447
https://blog.csdn.net/h106140873/article/details/104311586
Source: https://www.cnblogs.com/xidxi/p/14845733.html