数据库
首页 > 数据库> > MySQL MHA高可用

MySQL MHA高可用

作者:互联网

目录


MHA高可用架构介绍

MHA(Master High Availability)是一款开源的 MySQL 的高可用解决方案,基于perl语言实现,由日本DeNA公司开发。它为 MySQL 主从复制架构提供了 Automating Master Failover 功能。在MySQL故障切换过程中,MHA能做到0-30s之内自动完成数据库的故障切换操作,并且在进行故障切换过程中,MHA能最大程度保证数据库的一致性,以达到真正意义上的高可用。

MHA 高可用方案由两个部分构成:

MHA适用场景:目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群必须最少有3台数据库服务器(一主二从,一台充当Master、一台充当备用Master、一台充当从库)。出于成本考虑,淘宝在此基础上进行了改造,目前淘宝开发的TMHA已经支持一主一从。

MHA高可用架构原理

  1. MHA监控节点 (通过配置文件获取所有节点信息)
    系统、网络、SSH连接性、主从状态(重点是主库)

  2. 如果主库宕机,MHA处理过程如下:
    2.1 选主:
        (1) 如果判断从库(position或者GTID),数据有差异,最接近于Master的slave,成为备选主。
        (2) 如果判断从库(position或者GTID),数据一致,按照配置文件顺序选主。
        (3) 如果设定有权重(candidate_master=1),按照权重强制指定备选主。
             a. 默认情况下如果一个slave落后master 100M的relay logs的话,即使有权重,也会失效。
             b. 如果check_repl_delay=0的话,即使落后很多日志,也强制选择其为备选主。
    2.2 数据补偿
        (1) 当SSH能连接时,从库对比主库GTID 或者position号,立即将二进制日志保存至各个从节点并且应用(save_binary_logs脚本 )
        (2) 当SSH不能连接时,MHA无法保存二进制文件,只能进行故障转移但是丢失最新数据,可以对比从库之间的relaylog的差异(apply_diff_relay_logs脚本)
    2.3 Failover
    将故障节点踢出集群,备选主进行身份切换,并对外提供服务;其余从库和新主库确认新的主从关系。

其他功能:

  1. 应用透明:使用VIP机制实现应用透明
  2. 故障切换通知(send_report)
  3. 二次数据补偿(binlog_server)
  4. 自动修复主库并加入集群(待开发)

MHA高可用环境搭建

MHA基础架构搭建

环境规划

环境规划

环境准备

配置一主双从环境

配置关键程序软链接

由于MHA软件脚本中写的是命令的绝对路径/usr/bin/mysqlbinlog与/usr/bin/mysql,因此为了使用MHA,就要为mysqlbinlog和mysql可执行文件建立软链接。

[root@localhost ~]# ln -s /data/mysql/bin/mysqlbinlog    /usr/bin/mysqlbinlog
[root@localhost ~]# ln -s /data/mysql/bin/mysql          /usr/bin/mysql

配置各节点互信

  1. 各节点生成密钥

    [root@master ~]# ssh-keygen -t rsa
    [root@slave01 ~]# ssh-keygen -t rsa
    [root@slave02 ~]# ssh-keygen -t rsa
    
  2. 将各节点的公钥文件汇总到一个总的认证文件authorized_keys中

    [root@slave01 ~]# scp authorized_keys 192.168.1.5:/root/a1_keys
    [root@slave02 ~]# scp authorized_keys 192.168.1.5:/root/a2_keys
    [root@master ~]# cat /root/a1_keys >> authorized_keys 
    [root@master ~]# cat /root/a2_keys >> authorized_keys 
    
  3. 分发汇总的公钥文件

    [root@master ~]# cp authorized_keys .ssh/
    [root@master ~]# scp authorized_keys 192.168.1.6:/root/.ssh/
    [root@master ~]# scp authorized_keys 192.168.1.7:/root/.ssh/
    
  4. 测试

    [root@master ~]# ssh 192.168.1.5 date
    Mon Oct  4 20:09:48 CST 2021
    [root@master ~]# ssh 192.168.1.6 date
    Mon Oct  4 20:09:51 CST 2021
    [root@master ~]# ssh 192.168.1.7 date
    Mon Oct  4 20:09:53 CST 2021
    
    [root@slave01 ~]# ssh 192.168.1.5 date
    Mon Oct  4 20:10:13 CST 2021
    [root@slave01 ~]# ssh 192.168.1.6 date
    Mon Oct  4 20:10:16 CST 2021
    [root@slave01 ~]# ssh 192.168.1.7 date
    Mon Oct  4 20:10:18 CST 2021
    
    [root@slave02 ~]# ssh 192.168.1.5 date
    Mon Oct  4 20:10:43 CST 2021
    [root@slave02 ~]# ssh 192.168.1.6 date
    Mon Oct  4 20:10:44 CST 2021
    [root@slave02 ~]# ssh 192.168.1.7 date
    Mon Oct  4 20:10:47 CST 2021
    

安装软件

  1. 下载MHA软件

    [root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
    [root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
    
  2. 各节点安装node依赖包

    [root@slave01 ~]# yum install -y perl-DBD-MySQL
    
  3. 各节点安装node软件

    [root@master ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
    [root@slave01 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
    [root@slave02 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
    
  4. db03节点安装manager依赖包

    [root@slave02 ~]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
    
  5. db03节点安装manager软件

    [root@slave02 ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm 
    
  6. 创建MHA专用监控管理用户(主库)

    db01 [(none)]>grant all privileges on *.* to mha@'192.168.1.%' identified by 'mha';
    

配置文件

状态检查

开启MHA

db03开启MHA

[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
[1] 3035

查看MHA

[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:3035) is running(0:PING_OK), master:192.168.1.5
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.5 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 5     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.6 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 6     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.7 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 7     |
+---------------+-------

MHA环境故障模拟及修复

故障模拟

模拟master宕机

[root@master ~]# systemctl stop mysqld

查看db02主从信息

db02 [(none)]>show slave status \G 
Empty set (0.00 sec)

db02 [(none)]>show processlist;
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 35 | root | localhost         | NULL | Query            |    1 | starting                                                      | show processlist |
| 44 | repl | 192.168.1.7:53166 | NULL | Binlog Dump GTID |  150 | Master has sent all binlog to slave; waiting for more updates | NULL             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
2 rows in set (0.12 sec)

查看MHA manager主机,manager停止

[root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
mysql is stopped(2:NOT_RUNNING).

查看MHA manager日志信息

[root@slave02 ~]# tail -f /var/log/mha/mysql/manager
...
----- Failover Report -----

mysql: MySQL Master failover 192.168.1.5(192.168.1.5:3306) to 192.168.1.6(192.168.1.6:3306) succeeded

Master 192.168.1.5(192.168.1.5:3306) is down!

Check MHA Manager logs at slave02:/var/log/mha/mysql/manager for details.

Started automated(non-interactive) failover.
Selected 192.168.1.6(192.168.1.6:3306) as a new master.
192.168.1.6(192.168.1.6:3306): OK: Applying all logs succeeded.
192.168.1.7(192.168.1.7:3306): OK: Slave started, replicating from 192.168.1.6(192.168.1.6:3306)
192.168.1.6(192.168.1.6:3306): Resetting slave info succeeded.
Master failover to 192.168.1.6(192.168.1.6:3306) completed successfully.

查看MHA manager配置文件,可以发现server1已经没了

[root@slave02 ~]# vim /etc/mha/mysql.cnf 
[server default]
manager_log=/var/log/mha/mysql/manager
manager_workdir=/var/log/mha/mysql
master_binlog_dir=/data/binlog
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
user=mha

[server2]
hostname=192.168.1.6
port=3306

[server3]
hostname=192.168.1.7
port=3306

可见MHA已经自动处理了故障(但是只是一次性处理).

故障修复

  1. 修复db01

    [root@master ~]# systemctl start mysqld
    
  2. 修复主从
    查看manager日志中change master to命令

    [root@slave02 ~]# vim /var/log/mha/mysql/manager
    CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
    

    在这里插入图片描述
    在db01中执行该命令,并启动slave

    db01 [(none)]>CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
    Query OK, 0 rows affected, 2 warnings (0.16 sec)
    
    db01 [(none)]>start slave;
    Query OK, 0 rows affected (0.14 sec)
    
  3. 修改db03上的manager配置文件,将db01加入

    [root@slave02 ~]# vim /etc/mha/mysql.cnf 
    [server1]
    hostname=192.168.1.5
    port=3306
    
  4. 启动MHA

    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    [1] 7635
    [root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
    mysql (pid:7635) is running(0:PING_OK), master:192.168.1.6
    

MHA的 VIP 功能

  1. 将脚本文件放置在/usr/local/bin/目录下,并增加执行权限(master_ip_failover事先准备)

    [root@slave02 ~]# mv master_ip_failover /usr/local/bin/
    [root@slave02 ~]# chmod +x /usr/local/bin/master_ip_failover 
    

    master_ip_failover脚本如下:

    #!/usr/bin/env perl
    
    #  Copyright (C) 2011 DeNA Co.,Ltd.
    #
    #  This program is free software; you can redistribute it and/or modify
    #  it under the terms of the GNU General Public License as published by
    #  the Free Software Foundation; either version 2 of the License, or
    #  (at your option) any later version.
    #
    #  This program is distributed in the hope that it will be useful,
    #  but WITHOUT ANY WARRANTY; without even the implied warranty of
    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #  GNU General Public License for more details.
    #
    #  You should have received a copy of the GNU General Public License
    #   along with this program; if not, write to the Free Software
    #  Foundation, Inc.,
    #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
    
    ## Note: This is a sample script and is not complete. Modify the script based on your environment.
    
    use strict;
    use warnings FATAL => 'all';
    
    use Getopt::Long;
    use MHA::DBHelper;
    
    my (
      $command,        $ssh_user,         $orig_master_host,
      $orig_master_ip, $orig_master_port, $new_master_host,
      $new_master_ip,  $new_master_port,  $new_master_user,
      $new_master_password
    );
    my $vip = '192.168.1.100/24';
    my $key = "1";
    my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
    
    GetOptions(
      'command=s'             => \$command,
      'ssh_user=s'            => \$ssh_user,
      'orig_master_host=s'    => \$orig_master_host,
      'orig_master_ip=s'      => \$orig_master_ip,
      'orig_master_port=i'    => \$orig_master_port,
      'new_master_host=s'     => \$new_master_host,
      'new_master_ip=s'       => \$new_master_ip,
      'new_master_port=i'     => \$new_master_port,
      'new_master_user=s'     => \$new_master_user,
      'new_master_password=s' => \$new_master_password,
    );
    
    exit &main();
    
    sub main {
      if ( $command eq "stop" || $command eq "stopssh" ) {
    
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
    
          # updating global catalog, etc
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
        elsif ( $command eq "start" ) {
    
            # all arguments are passed.
            # If you manage master ip address at global catalog database,
            # activate new_master_ip here.
            # You can also grant write access (create user, set read_only=0, etc) here.
            my $exit_code = 10;
            eval {
                print "Enabling the VIP - $vip on the new master - $new_master_host \n";
                &start_vip();
                &stop_vip();
                $exit_code = 0;
            };
            if ($@) {
                warn $@;
                exit $exit_code;
            }
            exit $exit_code;
        }
        elsif ( $command eq "status" ) {
            print "Checking the Status of the script.. OK \n";
            `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
            exit 0;
        }
        else {
            &usage();
            exit 1;
        }
    }
    
    
    sub start_vip() {
        `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }
    # A simple system call that disable the VIP on the old_master 
    sub stop_vip() {
       `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }
    
    
    sub usage {
      print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
    }
    
  2. 修改master_ip_failover配置信息

    [root@slave02 ~]# vim /usr/local/bin/master_ip_failover 
    my $vip = '192.168.1.10/24';
    my $key = "1";
    my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
    
  3. 编辑manager配置文件

    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    
  4. 在主库手工生成第一个vip地址

    [root@slave01 ~]# ifconfig ens33:1 192.168.1.10/24
    

    注意:第一次配置VIP时,需要在主库手工生成VIP,因为只有在故障切换时才会自动生成VIP。

  5. 重启MHA

    [root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
    Stopped mysql successfully.
    [1]+  Exit 1                  nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1
    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    [1] 8498
    [root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
    mysql (pid:8498) is running(0:PING_OK), master:192.168.1.6
    
  6. 测试VIP功能:将此时的主库db02关闭mysqld服务模拟故障,预想结果应是db01接管master,且vip跳转到db01

    # 模拟故障,关闭db02 mysqld服务
    [root@slave01 ~]# systemctl stop mysqld
    
    # 查看db01是否有vip信息
    [root@master ~]# ifconfig
    ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
            ether 00:0c:29:df:f3:fd  txqueuelen 1000  (Ethernet)
    
    # 查看db03的从库状态
    db03 [(none)]>show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.5
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 194
                   Relay_Log_File: slave02-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    

MHA Send_Report

MHA是一次性的高可用解决方案,因此出现了宕机需要得到及时的通知。MHA提供了report_script参数进行邮件告警。

  1. 准备邮件脚本send_report放置在/usr/local/bin/目录下,并增加执行权限
    [root@slave02 ~]# mv send_report /usr/local/bin/
    [root@slave02 ~]# chmod +x /usr/local/bin/send_report 
    
  2. 编辑manager配置文件,调用邮件脚本
    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    report_script=/usr/local/bin/send_report
    
  3. 重启MHA
    [root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    
  4. 关闭主库,模拟故障,查看警告邮件
    [root@master ~]# systemctl stop mysqld
    

MHA Binlog Server

额外配置一台机器,记录主库所有二进制日志文件,防止SSH无法连接的情况。
注意:额外机器必须与原库版本一致,本测试采用db03机器。

  1. 编辑mha配置文件,增加binlog信息
    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    [binlog1]
    no_master=1
    hostname=192.168.1.7
    master_binlog_dir=/binlog
    
  2. 创建必要目录并授权
    [root@slave02 ~]# mkdir /binlog
    [root@slave02 ~]# chown -R mysql.mysql /binlog/
    
  3. 拉取主库binlog日志
    [root@slave02 ~]# cd /binlog/      # 必须进入创建好的目录
    [root@slave02 ~]# mysqlbinlog  -R --host=192.168.1.5 --user=mha --password=mha --raw  --stop-never mysql-bin.000006 &
    # 注意:拉取日志的起点应该按照该目前从库已经获得到的二进制日志点为起点
    
  4. 重启MHA
    [root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    

故障处理:主库宕机,binlog server自动停止,manager也自动停止
处理思路:

管理员在高可用架构维护的职责

  1. 搭建:MHA+VIP+SendReport+BinlogServer
  2. 监控及故障处理
  3. 高可用架构的优化
           优化核心:尽可能降低主从的延时,让MHA花在数据补偿上的时间尽量减少;
           尽量使用MySQL 5.7 版本,开启GTID模式,开启从库SQL并发复制。

标签:可用,mysql,192.168,MHA,master,2021,MySQL,root,Oct
来源: https://blog.csdn.net/hawava/article/details/120606583