
Failure simulation in a primary/standby environment (configured for manual switchover)

Author: Internet

Environment:
OS: CentOS 7
DB: DM8
Primary: 192.168.1.135
Standby: 192.168.1.134
The dmwatcher.ini file on both the primary and the standby is as follows:

[dmdba@host134 slnngk]$ more dmwatcher.ini
[GRP1]
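DW_TYPE       =  GLOBAL     ##global watcher type
DW_MODE       =  MANUAL     ##manual switchover
DW_ERROR_TIME    =  10      ##time (s) after which the remote watcher is considered failed
INST_RECOVER_TIME =  60     ##interval (s) at which the primary's watcher starts recovery
INST_ERROR_TIME  =  10      ##time (s) after which the local instance is considered failed
INST_OGUID     =  453332    ##unique OGUID of the watcher system
INST_INI      =  /dmdbms/data/slnngk/dm.ini  #path to dm.ini
INST_AUTO_RESTART =  1      ##enable automatic restart of the instance
INST_STARTUP_CMD  =  /dmdbms/product/bin/dmserver #start the instance from the command line
RLOG_SEND_THRESHOLD =  0    ##time threshold for the primary sending logs to the standby; off by default
RLOG_APPLY_THRESHOLD =  0   ##time threshold for the standby applying logs; off by default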
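To confirm that a node is really set up for manual switchover with automatic instance restart, the relevant keys can be read straight out of dmwatcher.ini. Below is a minimal POSIX shell sketch; the helper names `ini_get` and `check_manual_mode` are hypothetical, and it assumes the simple `KEY = VALUE ##comment` layout shown above (it returns the first matching key and ignores which [GRP] section it sits in).

```shell
#!/bin/sh
# ini_get FILE KEY: print the value of KEY from a dmwatcher.ini-style file.
# Hypothetical helper: assumes "KEY = VALUE" lines with optional trailing
# "#" or "##" comments; returns the first match regardless of section.
ini_get() {
    awk -v key="$2" '
        {
            line = $0
            sub(/#.*/, "", line)          # drop trailing comment
            n = index(line, "=")
            if (n == 0) next              # section headers, blank lines
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            sub(/^[ \t]+/, "", k); sub(/[ \t]+$/, "", k)   # trim key
            sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)   # trim value
            if (k == key) { print v; exit }
        }' "$1"
}

# check_manual_mode FILE: succeed only if DW_MODE is MANUAL and
# INST_AUTO_RESTART is 1, the combination used in this test.
check_manual_mode() {
    [ "$(ini_get "$1" DW_MODE)" = "MANUAL" ] &&
    [ "$(ini_get "$1" INST_AUTO_RESTART)" = "1" ]
}
```

Usage: `check_manual_mode /dmdbms/data/slnngk/dmwatcher.ini && echo "manual switchover, auto-restart on"`.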

 

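1. Stop the standby
[root@host134 ~]# systemctl stop DmServiceslnngk.service

dmwatcher immediately brings the database back up (note the new dmserver process, launched with the mount argument):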
[root@host134 ~]# ps -ef|grep slnngk
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
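The same check can be scripted. The sketch below reads `ps -ef` output on stdin and reports whether the dmwatcher and dmserver processes for this instance are present, and whether dmserver was launched with the trailing `mount` argument, as it is when the watcher starts it. The function name is hypothetical, and matching on the data directory `/dmdbms/data/slnngk` is an assumption taken from this environment's paths.

```shell
#!/bin/sh
# dm_proc_state PATTERN: read `ps -ef` output on stdin and report, for lines
# containing PATTERN, whether dmwatcher and dmserver are running and whether
# dmserver was launched with the trailing "mount" argument.
dm_proc_state() {
    awk -v pat="$1" '
        index($0, pat) == 0 { next }   # keep only this instance
        /grep/              { next }   # ignore the grep command itself
        /\/dmwatcher /      { w = 1 }
        /\/dmserver /       { s = 1; if ($NF == "mount") m = 1 }
        END {
            printf "watcher=%s server=%s mount_arg=%s\n",
                (w ? "yes" : "no"), (s ? "yes" : "no"), (m ? "yes" : "no")
        }'
}
```

Usage: `ps -ef | dm_proc_state /dmdbms/data/slnngk`. The dmmonitor line also contains the data directory but is not counted, because only the `/dmwatcher ` and `/dmserver ` executables are matched.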

 

2. Stop the standby's dmwatcher process
[root@host134 ~]# systemctl stop DmWatcherServiceGRP1

Now both the standby's watcher (dmwatcher) and its database process are gone:
[root@host134 ~]# ps -ef|grep slnngk
root 25001 32322 0 14:01 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini

 

Start dmwatcher again:
[root@host134 ~]# systemctl start DmWatcherServiceGRP1

The watcher then brings the standby back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 25477 1 0 14:04 ? 00:00:00 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 25507 1 1 14:04 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 25694 32322 0 14:05 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini

 

 

3. Stop the primary
[root@host135 soft]# systemctl stop DmServiceslnngk.service

The watcher brings the primary back up:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 1 14:23 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 981 11261 0 14:24 pts/5 00:00:00 grep --color=auto slnngk

 

The database is open:

[dmdba@host135 ~]$ disql sysdba/dameng123

Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 2.627(ms)
disql V8
SQL> select status$ from SYS."V$DATABASE";

LINEID     STATUS$    
---------- -----------
1          4

used time: 3.409(ms). Execute id is 800.
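The numeric `STATUS$` from V$DATABASE can be translated into a readable state. The mapping below follows the DM8 documentation for V$DATABASE as I understand it (1 startup, 2 startup with redo complete, 3 mount, 4 open, 5 suspend, 6 shutdown); treat it as an assumption and check it against the manual for your release. The helper name is hypothetical.

```shell
#!/bin/sh
# dm_status_name CODE: translate V$DATABASE.STATUS$ into a state name.
# Mapping assumed from the DM8 V$DATABASE documentation; verify per release.
# Returns 1 for codes outside the assumed range.
dm_status_name() {
    case "$1" in
        1) echo "STARTUP" ;;
        2) echo "STARTUP (redo done)" ;;
        3) echo "MOUNT" ;;
        4) echo "OPEN" ;;
        5) echo "SUSPEND" ;;
        6) echo "SHUTDOWN" ;;
        *) echo "UNKNOWN($1)"; return 1 ;;
    esac
}
```

Usage: `dm_status_name 4` prints `OPEN`, which agrees with the "state is open" banner that disql printed above.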

 

 

Next, try killing the dmserver process:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 0 14:23 ? 00:00:01 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 2319 11261 0 14:31 pts/5 00:00:00 grep --color=auto slnngk

[root@host135 soft]# kill -9 710

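Because the group is configured for manual switchover, no switchover takes place: the watcher restarts the primary instance automatically, and it keeps the primary role.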
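Rather than re-running ps by hand, the restart after `kill -9` can be waited for with a small polling loop. A minimal POSIX shell sketch; `wait_until` is a hypothetical helper, and the 90-second limit in the usage line is an arbitrary margin above INST_ERROR_TIME (10), not a DM setting.

```shell
#!/bin/sh
# wait_until LIMIT CMD [ARGS...]: run CMD once per second until it succeeds
# or LIMIT attempts have failed. Returns 0 on success, 1 on timeout.
wait_until() {
    _limit=$1
    shift
    _n=0
    while [ "$_n" -lt "$_limit" ]; do
        "$@" && return 0
        sleep 1
        _n=$((_n + 1))
    done
    return 1
}
```

Usage: `wait_until 90 pgrep -f '/dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini' >/dev/null && echo "dmserver is back"`. Passing the check as a command keeps the loop reusable, for example with `systemctl is-active` instead of pgrep.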

 

4. Stop the primary's dmwatcher process
[root@host135 soft]# systemctl stop DmWatcherServiceGRP1
Now both the database process and the watcher process are gone:
[root@host135 soft]# ps -ef|grep slnngk
root 3872 11261 0 14:37 pts/5 00:00:00 grep --color=auto slnngk

The monitor can no longer see the primary; `show` in dmmonitor lists only the standby:

show
2022-07-26 14:47:02 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             453332      TRUE            MANUAL          FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.134       52141        2022-07-26 14:47:01  GLOBAL    VALID     OPEN           SLNNGKBAK        OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  INVALID  

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.134       5236       OK        SLNNGKBAK        OPEN        STANDBY   0          0            REALTIME  UNKNOWN  382176          437742          382176          437742          NONE                  

DATABASE(SLNNGKBAK) APPLY INFO FROM (UNKNOWN), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[383314, 383314, 383314], (RLSN, SLSN, KLSN)[437742, 437742, 437742], N_TSK[0], TSK_MEM_USE[0] 
REDO_LSN_ARR: (437742)
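When scripting checks around dmmonitor, the `<<DATABASE GLOBAL INFO:>>` rows of `show` output can be reduced to one line per instance. The awk sketch below assumes the column order printed above; because WTIME contains a space it occupies two awk fields, so WSTATUS is field 7, INAME 8, ISTATUS 12 and IMODE 13. How the text is captured from dmmonitor (for example by saving the console output to a file) is left open, and the function name is hypothetical.

```shell
#!/bin/sh
# dm_global_info: read dmmonitor `show` output on stdin and print
# "DW_IP INAME WSTATUS ISTATUS IMODE" for every GLOBAL INFO row.
# Column positions assume the layout above ($3 date and $4 time form WTIME).
dm_global_info() {
    awk '
        /^DW_IP/           { in_row = 1; next }   # header of a GLOBAL INFO block
        in_row && /^[0-9]/ { print $1, $8, $7, $12, $13; in_row = 0 }
    '
}
```

Usage: `dm_global_info < show.txt`, where `show.txt` is a hypothetical file holding the captured output. On the output above it yields a single line, `192.168.1.134 SLNNGKBAK OPEN OPEN STANDBY`, confirming that only the standby is visible.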

 

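Start the watcher manually:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1

The primary comes back with its primary role unchanged; no switchover happened, because the group is configured for manual switchover.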

 

5. Manually switch the standby to primary (takeover, run in dmmonitor)

choose takeover GRP1
Can choose one of the following instances to do takeover:
1: SLNNGKBAK

takeover GRP1.SLNNGKBAK
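If the takeover is prepared from a script, the candidate list printed by `choose takeover` can be turned into the matching `takeover GROUP.INSTANCE` command. A sketch assuming the `N: INAME` lines shown above; it only prints the command text and does not talk to dmmonitor, and the function name is hypothetical. Refusing to choose when there are several candidates is a deliberate design choice: with DW_MODE = MANUAL the operator is meant to decide.

```shell
#!/bin/sh
# takeover_cmd GROUP: read `choose takeover GROUP` output on stdin and print
# "takeover GROUP.INAME" when exactly one candidate is listed.
# Exits with status 1 when there are zero or several candidates.
takeover_cmd() {
    awk -v grp="$1" '
        /^[0-9]+: / { n++; name = $2 }   # candidate lines look like "1: SLNNGKBAK"
        END {
            if (n != 1) exit 1
            print "takeover " grp "." name
        }'
}
```

Usage: pipe the captured `choose takeover GRP1` output into `takeover_cmd GRP1`; for the output above it prints `takeover GRP1.SLNNGKBAK`, the command that was entered by hand.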

Check the status again:

show
2022-07-26 15:25:12 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             453332      TRUE            MANUAL          FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.134       52141        2022-07-26 15:25:11  GLOBAL    VALID     OPEN           SLNNGKBAK        OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.134       5236       OK        SLNNGKBAK        OPEN        PRIMARY   0          0            REALTIME  VALID    383964          440754          383964          440755          NONE                  

ERROR DATABASE:

<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.135       52141        2022-07-26 15:19:11  GLOBAL    VALID     ERROR          SLNNGK           OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.135       5236       OK        SLNNGK           OPEN        PRIMARY   0          0            REALTIME  VALID    383951          439290          383951          439290          NONE                  

#================================================================================#

The original primary, 192.168.1.135, is now in ERROR status.
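Note that both GLOBAL INFO rows above report IMODE = PRIMARY: the ERROR row for 192.168.1.135 carries WTIME 15:19:11, older than the current 15:25:11, so it is the last state the monitor heard from that node. Counting PRIMARY rows is therefore a cheap guard after a manual takeover. A sketch assuming the same column layout as the `show` output above (IMODE is awk field 13 because WTIME spans two fields); the helper name is hypothetical.

```shell
#!/bin/sh
# count_primaries: read dmmonitor `show` output on stdin and print how many
# GLOBAL INFO rows report IMODE = PRIMARY (awk field 13).
count_primaries() {
    awk '
        /^DW_IP/           { in_row = 1; next }   # header of a GLOBAL INFO block
        in_row && /^[0-9]/ { if ($13 == "PRIMARY") n++; in_row = 0 }
        END                { print n + 0 }
    '
}
```

Usage: `[ "$(count_primaries < show.txt)" -gt 1 ] && echo "WARNING: more than one PRIMARY reported"`, with `show.txt` again a hypothetical capture file. For the output above the count is 2.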

 

Next, write some rows on the current primary, then start the original primary and check whether the data is synchronized.

On 192.168.1.134:
su - dmdba
[dmdba@host134 ~]$ disql hxl/dameng123

Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 3.029(ms)
disql V8
SQL> select * from tb_test01;

LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5

used time: 4.038(ms). Execute id is 600.
SQL> insert into tb_test01 values(6,'name6');
affect rows 1

used time: 1.427(ms). Execute id is 601.
SQL> insert into tb_test01 values(7,'name7');
affect rows 1

SQL> commit;
executed successfully
used time: 9.266(ms). Execute id is 603.
SQL> select * from tb_test01;

LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7

7 rows got

 

Now start the watcher on the original primary:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1

show  
2022-07-26 15:32:23 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             453332      TRUE            MANUAL          FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.134       52141        2022-07-26 15:32:23  GLOBAL    VALID     OPEN           SLNNGKBAK        OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.134       5236       OK        SLNNGKBAK        OPEN        PRIMARY   0          0            REALTIME  VALID    384114          440911          384114          440912          NONE                  

<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.135       52141        2022-07-26 15:32:23  GLOBAL    VALID     OPEN           SLNNGK           OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.135       5236       OK        SLNNGK           OPEN        STANDBY   0          0            REALTIME  VALID    383954          440910          383954          440910          NONE                  

DATABASE(SLNNGK) APPLY INFO FROM (SLNNGKBAK), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[384113, 384113, 384114], (RLSN, SLSN, KLSN)[440910, 440910, 440911], N_TSK[0], TSK_MEM_USE[512] 
REDO_LSN_ARR: (440910)
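Before querying the table, the EP INFO rows already give a rough lag indicator in their CLSN column. The sketch below prints the PRIMARY CLSN minus the STANDBY CLSN; it assumes exactly one primary and one standby row and the EP INFO column order above (IMODE is field 6, CLSN field 14). Reading CLSN as the last applied LSN is my interpretation rather than something this test confirms, and the function name is hypothetical.

```shell
#!/bin/sh
# clsn_gap: read dmmonitor `show` output on stdin and print
# (PRIMARY CLSN - STANDBY CLSN) taken from the EP INFO rows.
# Exits with status 1 if either the primary or the standby row is missing.
clsn_gap() {
    awk '
        /^INST_IP/         { in_row = 1; next }   # header of an EP INFO block
        in_row && /^[0-9]/ {
            if ($6 == "PRIMARY") p = $14
            if ($6 == "STANDBY") s = $14
            in_row = 0
        }
        END {
            if (p == "" || s == "") exit 1
            print p - s
        }'
}
```

For the 15:32:23 output above this gives 440912 - 440910 = 2, a small gap consistent with the new standby being in REALTIME apply.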

The original primary is up again and has rejoined the group as a standby. Check whether the data has been synchronized:

 

On 192.168.1.135:
su - dmdba
[dmdba@host135 ~]$ disql hxl/dameng123
SQL> select * from tb_test01;

LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7

7 rows got

used time: 6.329(ms). Execute id is 0

Rows 6 and 7, written on the new primary, are present on the new standby, so the data has been synchronized.

 

Source: https://www.cnblogs.com/hxlasky/p/16521320.html