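Troubleshooting Maxwell's "Couldn't find database mysql" error
Author: Internet
I. Issue description
Recently, Prometheus monitoring flagged a Maxwell process that extracts MySQL binlog data in the prod environment: the process was crashing repeatedly.
The Maxwell instance runs as a systemd service, which restarts it automatically 20 seconds after it exits.
The likely cause was therefore a fault that kept Maxwell from staying up: each time it failed, systemd started it again, and Prometheus recorded the cycle as frequent restarts.
The service log showed the following error: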
[root@server-xx system]# journalctl -xef
20:11:39,959 INFO TaskManager - Stopping: com.zendesk.maxwell.schema.PositionStoreThread@6a827077
20:11:39,959 INFO StoppableTaskState - com.zendesk.maxwell.schema.PositionStoreThread requestStop() called (in state: RUNNING)
20:11:39,959 INFO TaskManager - Stopping: com.zendesk.maxwell.producer.MaxwellKafkaProducerWorker@565055db
20:11:39,959 INFO StoppableTaskState - MaxwellKafkaProducerWorker requestStop() called (in state: RUNNING)
20:11:39,960 INFO KafkaProducer - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
20:11:39,964 INFO TaskManager - Stopping: com.zendesk.maxwell.bootstrap.BootstrapController@d021375
20:11:39,964 INFO StoppableTaskState - com.zendesk.maxwell.bootstrap.BootstrapController requestStop() called (in state: RUNNING)
20:11:39,964 INFO TaskManager - Stopping: com.zendesk.maxwell.replication.BinlogConnectorReplicator@368831cf
20:11:39,964 INFO StoppableTaskState - com.zendesk.maxwell.replication.BinlogConnectorReplicator requestStop() called (in state: STOPPED)
java.lang.RuntimeException: Couldn't find database mysql
at com.zendesk.maxwell.replication.TableCache.processEvent(TableCache.java:28)
at com.zendesk.maxwell.replication.BinlogConnectorReplicator.getTransactionRows(BinlogConnectorReplicator.java:486)
at com.zendesk.maxwell.replication.BinlogConnectorReplicator.getRow(BinlogConnectorReplicator.java:592)
at com.zendesk.maxwell.replication.BinlogConnectorReplicator.work(BinlogConnectorReplicator.java:175)
at com.zendesk.maxwell.util.RunLoopProcess.runLoop(RunLoopProcess.java:34)
at com.zendesk.maxwell.Maxwell.startInner(Maxwell.java:222)
at com.zendesk.maxwell.Maxwell.start(Maxwell.java:156)
at com.zendesk.maxwell.Maxwell.main(Maxwell.java:243)
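When the journal is long, a small filter can pull out the database name that Maxwell reports as missing. This is a minimal sketch that assumes the error text has the same shape as in the log above; `missing_dbs` is a hypothetical helper name, not part of Maxwell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and print each distinct
# database name that appears after "Couldn't find database".
missing_dbs() {
  sed -n "s/.*Couldn't find database \([^[:space:]]*\).*/\1/p" | sort -u
}

# Example usage (the unit name is the one used later in this article):
#   journalctl -u maxwell_biz_1.service --no-pager | missing_dbs
```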
II. Maxwell configuration
1. Maxwell configuration file
my_biz_db1_2_kafka.properties
producer=kafka
host=${MAXWELL_META_DB}
user=maxwell
password=${PASSWORD}
port=3306
client_id=my_biz_1
replica_server_id=1307
replication_host=${BIZ_1_DB}
replication_user=david_test
replication_password=${PASSWORD}
replication_port=3307
jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai
#### Whitelist filter, option 1
#filter=exclude: *.*, include: saas_v1.*
#### Whitelist filter, option 2
exclude_dbs=*
include_dbs=biz_db_1,biz_db_2
include_tables=db_1_table_1,db_1_table_2,db_2_table_1,db_2_table_2,db_2_table_3
kafka.bootstrap.servers=KFK_SERVER_1:9092,KFK_SERVER_2:9092,KFK_SERVER_3:9092
kafka_topic=${KFK_TOPIC_NAME}
kafka_partition_hash=murmur3
producer_partition_by=primary_key
2. Start command
/opt/module/maxwell/maxwell-1.22.1/bin/maxwell --config /opt/module/maxwell/maxwell-1.22.1/my_custom_config/my_biz_db1_2_kafka.properties
3. Wrapped as a service
The unit file lives in /etc/systemd/system, and the service is named maxwell_my_biz_db1_2_kafka.service
Management commands:
systemctl ( start | stop | restart | status | enable | is-enabled | disable ) maxwell_biz_1.service
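III. Troubleshooting steps
Stage 1
The first suspicion was that bigdata_admin, the account Maxwell uses to read the business MySQL binlog, had no access to the mysql system database. The DBA confirmed that the account could not access it, and agreed to grant bigdata_admin access to the mysql database:
# bigdata_admin was created earlier as the binlog extraction account and already has a password, so this statement alone is enough
grant all privileges on `mysql`.* to 'bigdata_admin'@'%';
# Alternative that also sets the password (MySQL 5.7 and earlier; GRANT ... IDENTIFIED BY was removed in MySQL 8.0)
grant all privileges on `mysql`.* to 'bigdata_admin'@'%' identified by 'xxxx';
After the grant, Maxwell was restarted, but the problem persisted and the process still would not stay up.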
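Stage 2
The next attempt was to register the mysql database and its user table by hand in the Maxwell metadata database. The schema_id (230) and database_id (61) values below come from this environment and will differ elsewhere:
insert into `databases`
(schema_id,name,charset) values(230,'mysql','utf8');
insert into `tables`
(schema_id,database_id,name,charset,pk) values(230,61,'user','utf8','Host,User');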
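Before inserting rows by hand, it can help to check whether the metadata database already has an entry for the database named in the error. The sketch below only builds the lookup query. The function name `tracked_db_sql` is hypothetical, the columns are the ones used in the INSERT statements plus the row id, and the usage comment assumes the metadata schema is called `maxwell`, since the configuration above does not override it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query listing the rows for one database
# name in the `databases` table of the Maxwell metadata database.
tracked_db_sql() {
  local db_name="$1"
  # Accept only plain identifiers so the name can be embedded safely in SQL.
  case "$db_name" in
    ''|*[!A-Za-z0-9_]*) echo "invalid database name: $db_name" >&2; return 1 ;;
  esac
  printf 'SELECT id, schema_id, name, charset FROM `databases` WHERE name = '\''%s'\'';\n' "$db_name"
}

# Example usage (assumes the metadata schema is named "maxwell"):
#   mysql -h "$MAXWELL_META_DB" -u maxwell -p maxwell -e "$(tracked_db_sql mysql)"
```

An empty result means the metadata database has no entry for that name, which matches the situation described in this article.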
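Stage 3: reproducing the bug in the test environment
The following SQL statements did not crash Maxwell. MySQL writes account-management statements such as these to the binlog as statements, not as row events:
REVOKE INDEX ON *.* FROM 'user_test'@'%';
GRANT INDEX ON *.* TO 'user_test'@'%';
CREATE USER 'user_test'@'%' IDENTIFIED BY 'xxxx';
Running the following statement, or performing the equivalent operation in the Navicat GUI, is different:
UPDATE `mysql`.`user` SET `Select_priv`='Y' WHERE (`Host`='%') AND (`User`='user_test') LIMIT 1
Because it modifies mysql.user directly, it is logged as a row event, and it crashes the Maxwell process immediately.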
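Conditions for reproducing the issue:
1. The account liuwei_test has no access to the mysql database;
2. The mysql database is not in Maxwell's list of tracked databases;
3. The Maxwell configuration file does not include the user table of the mysql database;
4. Someone upstream performs CRUD operations on user accounts (which modifies the user table in the mysql database).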
In particular:
UPDATE `mysql`.`user` SET `Select_priv`='Y' WHERE (`Host`='%') AND (`User`='user_test') LIMIT 1
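Since the trigger is a direct write to a table in the mysql database, one option is to screen SQL statements for that pattern before they are run against a monitored instance. The sketch below is a rough heuristic under that assumption, not a SQL parser; `is_direct_mysql_dml` is a hypothetical name, and it only recognises statements that begin with INSERT INTO, UPDATE, DELETE FROM or REPLACE INTO followed by a `mysql.`-qualified table.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: succeed (exit 0) when the statement passed as $1
# is DML that writes directly to a table in the mysql database.
is_direct_mysql_dml() {
  printf '%s\n' "$1" | grep -Eiq \
    '^[[:space:]]*(insert[[:space:]]+into|update|delete[[:space:]]+from|replace[[:space:]]+into)[[:space:]]+`?mysql`?\.'
}

# Example usage:
#   is_direct_mysql_dml "GRANT INDEX ON *.* TO 'user_test'@'%';" || echo "not a direct write"
```

Account changes made through GRANT, REVOKE or CREATE USER do not match, which is consistent with the statements that did not reproduce the crash.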
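IV. Summary
4.1 New Maxwell instances
When the big data team points Maxwell at a new database, follow these steps strictly:
1. Grant the account Maxwell uses to connect to the business MySQL instance (e.g. bigdata_admin) access to that instance's mysql system database.
2. Follow the normal Maxwell deployment procedure for everything else.
3. Start the Maxwell instance that reads this database's binlog; the mysql system database of the business instance will then be captured in the Maxwell metadata database.
4.2 Existing Maxwell instances
For Maxwell processes that are already running but do not track the mysql system database of the business instance, fix the problem in one of the following ways:
Method 1:
1. (Optional) Grant the account Maxwell uses to connect to the business MySQL instance (e.g. bigdata_admin) access to that instance's mysql system database;
2. In the Maxwell metadata database, add entries for the business instance's mysql system database and all of its tables.
Method 2:
1. (Required) Grant the account Maxwell uses to connect to the business MySQL instance (e.g. bigdata_admin) access to that instance's mysql system database;
2. Pick a suitable time (for example 23:45 at night) and rebuild the Maxwell metadata database as follows:
A) Stop all Maxwell processes: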
systemctl stop maxwell_biz_1.service
systemctl stop maxwell_biz_2.service
systemctl stop maxwell_biz_3.service
B) Log in to the MySQL server specified by the host parameter in the Maxwell configuration file and drop the Maxwell metadata database;
C) Restart all Maxwell instances:
systemctl start maxwell_biz_1.service
systemctl start maxwell_biz_2.service
systemctl start maxwell_biz_3.service
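Steps A to C of Method 2 can be collected into one script. The following is a sketch under stated assumptions rather than a tested runbook: the unit names are the ones listed above, the metadata schema name `maxwell` and the `MAXWELL_META_DB` environment variable are assumptions, and the `run` and `reset_maxwell_metadata` helpers are hypothetical. It defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of Method 2. The metadata schema name "maxwell" and the
# MAXWELL_META_DB variable are assumptions; adjust them to the environment.
UNITS="maxwell_biz_1.service maxwell_biz_2.service maxwell_biz_3.service"
META_SCHEMA="maxwell"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

reset_maxwell_metadata() {
  local unit
  # A) Stop every Maxwell instance.
  for unit in $UNITS; do run systemctl stop "$unit" || return 1; done
  # B) Drop the metadata database; -p makes the mysql client prompt for the password.
  run mysql -h "$MAXWELL_META_DB" -u maxwell -p \
    -e "DROP DATABASE IF EXISTS \`$META_SCHEMA\`" || return 1
  # C) Start every instance again.
  for unit in $UNITS; do run systemctl start "$unit" || return 1; done
}

# Example usage:
#   MAXWELL_META_DB=meta-db-host reset_maxwell_metadata             # dry run
#   MAXWELL_META_DB=meta-db-host DRY_RUN=0 reset_maxwell_metadata   # execute
```

Dropping the metadata database also discards the binlog positions stored in it, so the instances should be expected to recapture the schema and resume from the current binlog position rather than from where they stopped. That is one reason to pick a quiet time window.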
Source: https://blog.csdn.net/liuwei0376/article/details/115678573