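Fault simulation in a primary/standby environment (configured for manual switchover)
Author: Internet
Environment:
OS: CentOS 7
DB: DM8
Primary: 192.168.1.135
Standby: 192.168.1.134
The dmwatcher.ini configuration file on both the primary and the standby is as follows: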
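[dmdba@host134 slnngk]$ more dmwatcher.ini
[GRP1]
DW_TYPE = GLOBAL ## global watcher type
DW_MODE = MANUAL ## manual switchover
DW_ERROR_TIME = 10 ## time after which the remote watcher is considered failed
INST_RECOVER_TIME = 60 ## interval at which the primary's watcher starts recovery
INST_ERROR_TIME = 10 ## time after which the local instance is considered failed
INST_OGUID = 453332 ## unique OGUID of the watcher system
INST_INI = /dmdbms/data/slnngk/dm.ini # path to the dm.ini configuration file
INST_AUTO_RESTART = 1 ## enable automatic restart of the instance
INST_STARTUP_CMD = /dmdbms/product/bin/dmserver # start from the command line
RLOG_SEND_THRESHOLD = 0 ## time threshold for the primary sending logs to the standby, disabled by default
RLOG_APPLY_THRESHOLD = 0 ## time threshold for the standby replaying logs, disabled by default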
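Every test below depends on DW_MODE being MANUAL and INST_AUTO_RESTART being 1, so it can be worth confirming both before starting. The following is a minimal Python sketch, not part of the original procedure. It assumes the standard-library configparser can read the file once `#` is treated as an inline comment marker; the name `load_watcher_group` and the trimmed `SAMPLE` text are hypothetical.

```python
import configparser

# Trimmed copy of the settings shown above (comments shortened for the example).
SAMPLE = """\
[GRP1]
DW_TYPE = GLOBAL ## global watcher type
DW_MODE = MANUAL ## manual switchover
INST_INI = /dmdbms/data/slnngk/dm.ini # path to dm.ini
INST_AUTO_RESTART = 1 ## restart the instance automatically
"""


def load_watcher_group(text, group="GRP1"):
    """Return the settings of one watcher group as a dict with upper-case keys."""
    # Assumption: treating '#' as an inline comment prefix is enough to strip
    # both the '#' and '##' trailing comments used in this file.
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",), interpolation=None
    )
    parser.read_string(text)
    return {key.upper(): value for key, value in parser[group].items()}


settings = load_watcher_group(SAMPLE)
manual_switchover = settings["DW_MODE"] == "MANUAL"
auto_restart = settings["INST_AUTO_RESTART"] == "1"
print("manual switchover:", manual_switchover, "| auto restart:", auto_restart)
```

To check the real file, pass its contents instead of `SAMPLE`, for example `load_watcher_group(open("/dmdbms/data/slnngk/dmwatcher.ini").read())`.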
1. Stop the standby database
[root@host134 ~]# systemctl stop DmServiceslnngk.service
dmwatcher brings the database back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
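The sign that the instance was restarted is that dmserver has a new PID and a start time of 13:49, while dmwatcher has been running since Jul15. A minimal Python sketch of picking that out of the `ps -ef` output, assuming the usual eight-column layout (UID PID PPID C STIME TTY TIME CMD); `dm_processes` is a hypothetical helper name:

```python
import os

# The ps -ef lines captured above on the standby.
PS_OUTPUT = """\
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
"""


def dm_processes(ps_text):
    """Map program name -> (pid, start time) for each line of `ps -ef` output."""
    found = {}
    for line in ps_text.splitlines():
        # Split into at most 8 fields so the command keeps its arguments.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        program = os.path.basename(fields[7].split()[0])
        found[program] = (int(fields[1]), fields[4])
    return found


procs = dm_processes(PS_OUTPUT)
print("dmwatcher:", procs["dmwatcher"])
print("dmserver: ", procs["dmserver"])
```

Comparing the dmserver PID before and after the `systemctl stop` is a simple way to confirm that the watcher started a fresh process.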
2. Stop the dmwatcher process on the standby
[root@host134 ~]# systemctl stop DmWatcherServiceGRP1
At this point both the standby's dmwatcher daemon and its database process are stopped:
[root@host134 ~]# ps -ef|grep slnngk
root 25001 32322 0 14:01 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
Start the dmwatcher daemon:
[root@host134 ~]# systemctl start DmWatcherServiceGRP1
The dmwatcher daemon now brings the standby back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 25477 1 0 14:04 ? 00:00:00 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 25507 1 1 14:04 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 25694 32322 0 14:05 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
3. Stop the primary database
[root@host135 soft]# systemctl stop DmServiceslnngk.service
The watcher daemon brings the primary back up:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 1 14:23 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 981 11261 0 14:24 pts/5 00:00:00 grep --color=auto slnngk
The database state is open:
[dmdba@host135 ~]$ disql sysdba/dameng123
Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 2.627(ms)
disql V8
SQL> select status$ from SYS."V$DATABASE";
LINEID     STATUS$
---------- -----------
1          4
used time: 3.409(ms). Execute id is 800.
Next, try killing the database process:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 0 14:23 ? 00:00:01 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 2319 11261 0 14:31 pts/5 00:00:00 grep --color=auto slnngk
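[root@host135 soft]# kill -9 710
Because I configured this primary/standby pair for manual switchover, no switchover takes place. The watcher daemon automatically restarts the primary, and it keeps the primary role.
4. Stop the dmwatcher process on the primary
[root@host135 soft]# systemctl stop DmWatcherServiceGRP1
Now both the database process and the watcher daemon are gone: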
[root@host135 soft]# ps -ef|grep slnngk
root 3872 11261 0 14:37 pts/5 00:00:00 grep --color=auto slnngk
The monitor can no longer see any information about the primary; only the standby is listed:
show
2022-07-26 14:47:02
#================================================================================#
GROUP  OGUID   MON_CONFIRM  MODE    MPP_FLAG
GRP1   453332  TRUE         MANUAL  FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP          MAL_DW_PORT  WTIME                WTYPE   WCTLSTAT  WSTATUS  INAME      INST_OK  N_EP  N_OK  ISTATUS  IMODE    DSC_STATUS  RTYPE     RSTAT
192.168.1.134  52141        2022-07-26 14:47:01  GLOBAL  VALID     OPEN     SLNNGKBAK  OK       1     1     OPEN     STANDBY  DSC_OPEN    REALTIME  INVALID
EP INFO:
INST_IP        INST_PORT  INST_OK  INAME      ISTATUS  IMODE    DSC_SEQNO  DSC_CTL_NODE  RTYPE     RSTAT    FSEQ    FLSN    CSEQ    CLSN    DW_STAT_FLAG
192.168.1.134  5236       OK       SLNNGKBAK  OPEN     STANDBY  0          0             REALTIME  UNKNOWN  382176  437742  382176  437742  NONE
DATABASE(SLNNGKBAK) APPLY INFO FROM (UNKNOWN), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[383314, 383314, 383314], (RLSN, SLSN, KLSN)[437742, 437742, 437742], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (437742)
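Reading IMODE, WSTATUS and RSTAT out of the `show` output by eye gets tedious when the test is repeated. Here is a minimal Python sketch of turning one DATABASE GLOBAL INFO row into named fields. It assumes the row always carries the 15 columns listed in the output above and that WTIME is the only value containing a space; `GLOBAL_COLUMNS` and `parse_global_row` are hypothetical names, not part of dmmonitor.

```python
# Column names copied from the DATABASE GLOBAL INFO header in the monitor output.
GLOBAL_COLUMNS = (
    "DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK "
    "N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT"
).split()


def parse_global_row(row):
    """Turn one whitespace-separated GLOBAL INFO value row into a dict."""
    tokens = row.split()
    # WTIME is printed as "date time", so a valid row has one extra token.
    if len(tokens) != len(GLOBAL_COLUMNS) + 1:
        raise ValueError("unexpected number of fields: %d" % len(tokens))
    tokens[2:4] = [" ".join(tokens[2:4])]  # merge date and time into WTIME
    return dict(zip(GLOBAL_COLUMNS, tokens))


# The standby's row from the output above.
ROW = (
    "192.168.1.134 52141 2022-07-26 14:47:01 GLOBAL VALID OPEN SLNNGKBAK "
    "OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID"
)
info = parse_global_row(ROW)
print(info["INAME"], info["IMODE"], info["WSTATUS"], info["RSTAT"])
```

With the primary's watcher stopped, the only row available is the standby's, and its RSTAT reads INVALID.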
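Start the watcher daemon manually:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1
The primary has now recovered and is still the primary. No switchover happened, because I configured manual switchover.
5. Manually switch the standby to primary
Run the following commands in the dmmonitor console: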
choose takeover GRP1
Can choose one of the following instances to do takeover:
1: SLNNGKBAK
takeover GRP1.SLNNGKBAK
Now check the database state:
show
2022-07-26 15:25:12
#================================================================================#
GROUP  OGUID   MON_CONFIRM  MODE    MPP_FLAG
GRP1   453332  TRUE         MANUAL  FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP          MAL_DW_PORT  WTIME                WTYPE   WCTLSTAT  WSTATUS  INAME      INST_OK  N_EP  N_OK  ISTATUS  IMODE    DSC_STATUS  RTYPE     RSTAT
192.168.1.134  52141        2022-07-26 15:25:11  GLOBAL  VALID     OPEN     SLNNGKBAK  OK       1     1     OPEN     PRIMARY  DSC_OPEN    REALTIME  VALID
EP INFO:
INST_IP        INST_PORT  INST_OK  INAME      ISTATUS  IMODE    DSC_SEQNO  DSC_CTL_NODE  RTYPE     RSTAT    FSEQ    FLSN    CSEQ    CLSN    DW_STAT_FLAG
192.168.1.134  5236       OK       SLNNGKBAK  OPEN     PRIMARY  0          0             REALTIME  VALID    383964  440754  383964  440755  NONE
ERROR DATABASE:
<<DATABASE GLOBAL INFO:>>
DW_IP          MAL_DW_PORT  WTIME                WTYPE   WCTLSTAT  WSTATUS  INAME      INST_OK  N_EP  N_OK  ISTATUS  IMODE    DSC_STATUS  RTYPE     RSTAT
192.168.1.135  52141        2022-07-26 15:19:11  GLOBAL  VALID     ERROR    SLNNGK     OK       1     1     OPEN     PRIMARY  DSC_OPEN    REALTIME  VALID
EP INFO:
INST_IP        INST_PORT  INST_OK  INAME      ISTATUS  IMODE    DSC_SEQNO  DSC_CTL_NODE  RTYPE     RSTAT    FSEQ    FLSN    CSEQ    CLSN    DW_STAT_FLAG
192.168.1.135  5236       OK       SLNNGK     OPEN     PRIMARY  0          0             REALTIME  VALID    383951  439290  383951  439290  NONE
#================================================================================#
The former primary, 192.168.1.135, is now in the ERROR state. Its last watcher timestamp (WTIME) is 15:19:11, which suggests its watcher had been stopped again before the takeover.
Next, write some data on the current primary, then start the former primary and see whether the data gets synchronized.
192.168.1.134
su - dmdba
[dmdba@host134 ~]$ disql hxl/dameng123
Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 3.029(ms)
disql V8
SQL> select * from tb_test01;
LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
used time: 4.038(ms). Execute id is 600.
SQL> insert into tb_test01 values(6,'name6');
affect rows 1
used time: 1.427(ms). Execute id is 601.
SQL> insert into tb_test01 values(7,'name7');
affect rows 1
SQL> commit;
executed successfully
used time: 9.266(ms). Execute id is 603.
SQL> select * from tb_test01;
LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7
7 rows got
Now start the watcher daemon of the former primary:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1
show
2022-07-26 15:32:23
#================================================================================#
GROUP  OGUID   MON_CONFIRM  MODE    MPP_FLAG
GRP1   453332  TRUE         MANUAL  FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP          MAL_DW_PORT  WTIME                WTYPE   WCTLSTAT  WSTATUS  INAME      INST_OK  N_EP  N_OK  ISTATUS  IMODE    DSC_STATUS  RTYPE     RSTAT
192.168.1.134  52141        2022-07-26 15:32:23  GLOBAL  VALID     OPEN     SLNNGKBAK  OK       1     1     OPEN     PRIMARY  DSC_OPEN    REALTIME  VALID
EP INFO:
INST_IP        INST_PORT  INST_OK  INAME      ISTATUS  IMODE    DSC_SEQNO  DSC_CTL_NODE  RTYPE     RSTAT    FSEQ    FLSN    CSEQ    CLSN    DW_STAT_FLAG
192.168.1.134  5236       OK       SLNNGKBAK  OPEN     PRIMARY  0          0             REALTIME  VALID    384114  440911  384114  440912  NONE
<<DATABASE GLOBAL INFO:>>
DW_IP          MAL_DW_PORT  WTIME                WTYPE   WCTLSTAT  WSTATUS  INAME      INST_OK  N_EP  N_OK  ISTATUS  IMODE    DSC_STATUS  RTYPE     RSTAT
192.168.1.135  52141        2022-07-26 15:32:23  GLOBAL  VALID     OPEN     SLNNGK     OK       1     1     OPEN     STANDBY  DSC_OPEN    REALTIME  VALID
EP INFO:
INST_IP        INST_PORT  INST_OK  INAME      ISTATUS  IMODE    DSC_SEQNO  DSC_CTL_NODE  RTYPE     RSTAT    FSEQ    FLSN    CSEQ    CLSN    DW_STAT_FLAG
192.168.1.135  5236       OK       SLNNGK     OPEN     STANDBY  0          0             REALTIME  VALID    383954  440910  383954  440910  NONE
DATABASE(SLNNGK) APPLY INFO FROM (SLNNGKBAK), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[384113, 384113, 384114], (RLSN, SLSN, KLSN)[440910, 440910, 440911], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (440910)
The former primary is up again and has rejoined the cluster in the standby role. Check whether the data has been synchronized:
192.168.1.135
su - dmdba
[dmdba@host135 ~]$ disql hxl/dameng123
SQL> select * from tb_test01;
LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7
7 rows got
used time: 6.329(ms). Execute id is 0
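As you can see, the data has been synchronized: rows 6 and 7, inserted on the new primary, are present on the former primary.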
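For a table larger than seven rows, comparing the two result sets by eye stops being practical. Below is a minimal Python sketch of the comparison itself. It assumes the rows have already been fetched from each node by some means (fetching is not shown) and are represented as `(id, name)` tuples; `missing_on_standby` is a hypothetical helper name.

```python
def missing_on_standby(primary_rows, standby_rows):
    """Return the rows present on the primary but absent on the standby."""
    standby_set = set(standby_rows)
    return [row for row in primary_rows if row not in standby_set]


# Rows as shown in the two query results above (192.168.1.134 and 192.168.1.135).
primary_rows = [(i, "name%d" % i) for i in range(1, 8)]
standby_rows = [(i, "name%d" % i) for i in range(1, 8)]
in_sync_gap = missing_on_standby(primary_rows, standby_rows)

# Hypothetical case for contrast: a standby that still holds only the first 5 rows.
stale_rows = [(i, "name%d" % i) for i in range(1, 6)]
stale_gap = missing_on_standby(primary_rows, stale_rows)

print("gap after rejoin:", in_sync_gap)
print("gap for a stale copy:", stale_gap)
```

An empty list for the real result sets matches what the queries show: the rejoined node has caught up with the new primary.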
Source: https://www.cnblogs.com/hxlasky/p/16521320.html