首页 > 其他分享> > KingbaseES R6集群repmgr witness 手工配置案例

KingbaseES R6集群repmgr witness 手工配置案例

2022-01-19 11:34:58 作者：互联网

使用见证服务器:

见证服务器是一个正常的KingbaseES实例，不是流复制群集的一部分; 其目的是，如果发生故障转移情况，则提供证明它是主服务器本身不可用的证据，而不是例如在不同物理位置之间的网络分裂。见证服务器的典型用例是双节点流复制设置，其中主要和备用服务器位于不同的位置（数据中心）。通过在与主服务器相同的位置（数据中心）中创建见证服务器，如果主服务器变得不可用，则备用服务器可以决定是否可以在不“脑裂”情况的情况下提升为主：如果它无法看到见证人或主服务器，它可能存在网络级中断，它不应该提升为主。如果它可以看到见证人但不能看到主节点，这证明没有网络中断且主本身不可用，因此它可以提升自己为主。
对于更复杂的复制方案，例如使用多个数据中心，最好使用基于位置的故障转移，这可确保只有与主服务器位于同一位置的节点才能成为主节点。
要创建见证服务器，请在与群集的主服务器位于同一物理位置的服务器上设置普通的PostgreSQL实例。不应该在与主服务器同一个物理主机创建见证服务器，否则如果主服务器由于硬件问题失败，见证服务器会失效。

数据库版本：

test=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

repmgr cluster原架构：

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | primary | * running |          | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

一、创建witness服务器

=注意：witness服务器，应该是一个独立的主机节点不能和主库或备库在同一个主机上，并且witness和其他主机之间不构成流复制，所以witness是一个独立的primary实例，其数据库systemID，不应该和其他数据库一致，需要单独initdb一个实例，不能是通过clone或copy生成数据库。=

1）初始化实例（node2节点）

=将cluster其他节点的软件安装文件，拷贝到witness节点，然后重新初始一个实例=

 [kingbase@node2 bin]$ ./initdb -D /home/kingbase/cluster/R6HA/KHA/kingbase/data -E utf8 -U system -W
......

配置repmgr extension：

启动数据库服务：
[kingbase@node2 bin]$ ./sys_ctl -D /home/kingbase/cluster/R6HA/KHA/kingbase/data start
......
server started

2）创建repmgr元数据库和schema

[kingbase@node2 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.

# 创建esrep用户
test=# create user esrp with superuser;
CREATE ROLE
test=# alter user esrep with password 'Kingbaseha110';
ALTER ROLE

#创建esrep数据库
test=# create database esrep owner esrep;
CREATE DATABASE
test=# \c esrep esrep
You are now connected to database "esrep" as user "esrep".
esrep=# \d
              List of relations
 Schema |        Name         | Type | Owner  
--------+---------------------+------+--------
 public | sys_stat_statements | view | system
(1 row)

# 创建repmgr schema
esrep=# create schema repmgr;
CREATE SCHEMA

二、将witness加入repmgr cluster

1）配置repmgr.conf文件

[kingbase@node2 etc]$ cat repmgr.conf
on_bmj=off
node_id=2
node_name=node249
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2'
 
log_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/KHA/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=2
reconnect_interval=3
failover='automatic'
recovery='automatic'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.240/24'
net_device='enp0s3'
ipaddr_path='/sbin'
arping_path='/sbin'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgrd.pid'
ping_path='/usr/bin'
#priority=0

2）注册witness到repmgr cluster

[kingbase@node2 bin]$ ./repmgr witness register -h 192.168.7.248 
# -h 指向主库节点ip
INFO: connecting to witness node "node249" (ID: 2)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "node249" (ID: 2) successfully registered

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | primary | * running |          | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 2  | node249  | witness | * running | node248  | default  | 0        | 1        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

3）查看witness元数据库数据信息

=witness注册到repmgr cluster后，自动在esrep数据库的repmgr schema下创建repmgr元数据对象=

 [kingbase@node2 bin]$ ./ksql -U esrep esrep
ksql (V8.0)
Type "help" for help.

esrep=# \d repmgr.*
                                 Table "repmgr.events"
     Column      |           Type           | Collation | Nullable |      Default      
-----------------+--------------------------+-----------+----------+-------------------
 node_id         | integer                  |           | not null | 
 event           | text                     |           | not null | 
 successful      | boolean                  |           | not null | true
 event_timestamp | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 details         | text                     |           |          | 

               Index "repmgr.idx_monitoring_history_time"
      Column       |           Type           | Key? |    Definition     
-------------------+--------------------------+------+-------------------
 last_monitor_time | timestamp with time zone | yes  | last_monitor_time
 standby_node_id   | integer                  | yes  | standby_node_id
btree, for table "repmgr.monitoring_history"

                           Table "repmgr.monitoring_history"
          Column           |           Type           | Collation | Nullable | Default 
---------------------------+--------------------------+-----------+----------+---------
 primary_node_id           | integer                  |           | not null | 
 standby_node_id           | integer                  |           | not null | 
 last_monitor_time         | timestamp with time zone |           | not null | 
 last_apply_time           | timestamp with time zone |           |          | 
 last_wal_primary_location | pg_lsn                   |           | not null | 
 last_wal_standby_location | pg_lsn                   |           |          | 
 replication_lag           | bigint                   |           | not null | 
 apply_lag                 | bigint                   |           | not null | 
Indexes:
    "idx_monitoring_history_time" btree (last_monitor_time, standby_node_id)

                                  Table "repmgr.nodes"
      Column      |            Type            | Collation | Nullable |     Default     
------------------+----------------------------+-----------+----------+-----------------
 node_id          | integer                    |           | not null | 
 upstream_node_id | integer                    |           |          | 
 active           | boolean                    |           | not null | true
 node_name        | text                       |           | not null | 
 type             | text                       |           | not null | 
 location         | text                       |           | not null | 'default'::text
 priority         | integer                    |           | not null | 100
 conninfo         | text                       |           | not null | 
 repluser         | character varying(63 char) |           | not null | 
 slot_name        | text                       |           |          | 
 config_file      | text                       |           | not null | 
Indexes:
    "nodes_pkey" PRIMARY KEY, btree (node_id)
Check constraints:
    "nodes_type_check" CHECK (type = ANY (ARRAY['primary'::text, 'standby'::text, 'witness'::text, 'bdr'::text]))
Foreign-key constraints:
    "nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE
Referenced by:
    TABLE "repmgr.nodes" CONSTRAINT "nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE

       Index "repmgr.nodes_pkey"
 Column  |  Type   | Key? | Definition 
---------+---------+------+------------
 node_id | integer | yes  | node_id
primary key, btree, for table "repmgr.nodes"

                           View "repmgr.replication_status"
          Column           |           Type           | Collation | Nullable | Default 
---------------------------+--------------------------+-----------+----------+---------
 primary_node_id           | integer                  |           |          | 
 standby_node_id           | integer                  |           |          | 
 standby_name              | text                     |           |          | 
 node_type                 | text                     |           |          | 
 active                    | boolean                  |           |          | 
 last_monitor_time         | timestamp with time zone |           |          | 
 last_wal_primary_location | pg_lsn                   |           |          | 
 last_wal_standby_location | pg_lsn                   |           |          | 
 replication_lag           | text                     |           |          | 
 replication_time_lag      | interval                 |           |          | 
 apply_lag                 | text                     |           |          | 
 communication_time_lag    | interval                 |           |          | 

                   View "repmgr.show_nodes"
       Column       |  Type   | Collation | Nullable | Default 
--------------------+---------+-----------+----------+---------
 node_id            | integer |           |          | 
 node_name          | text    |           |          | 
 active             | boolean |           |          | 
 upstream_node_id   | integer |           |          | 
 upstream_node_name | text    |           |          | 
 type               | text    |           |          | 
 priority           | integer |           |          | 
 conninfo           | text    |           |          | 

            Table "repmgr.voting_term"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 term   | integer |           | not null | 
Indexes:
    "voting_term_restrict" UNIQUE, btree ((true))
Rules:
    voting_term_delete AS
    ON DELETE TO repmgr.voting_term DO INSTEAD NOTHING

 Index "repmgr.voting_term_restrict"
 Column |  Type   | Key? | Definition 
--------+---------+------+------------
 bool   | boolean | yes  | (true)
unique, btree, for table "repmgr.voting_term"

三、witness节点注册故障分析

=如下所示，witness在其他节点的状态为“? unreachable ”。=

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+---------------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | primary | * running     |          | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 2  | node249  | witness | ? unreachable | node248  | default  | 0        | ?        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | standby |   running     | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running     | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

WARNING: following issues were detected
  - unable to connect to node "node249" (ID: 2)

1）测试ksql到witness节点的连接（连接失败）

[kingbase@node1 bin]$ ./ksql -h 192.168.7.249 -U esrep esrep
ksql: error: could not connect to server: could not connect to server: No route to host
       Is the server running on host "192.168.7.249" and accepting
       TCP/IP connections on port 54321?

# 节点ping
[kingbase@node1 bin]$ ping 192.168.7.249
PING 192.168.7.249 (192.168.7.249) 56(84) bytes of data.
64 bytes from 192.168.7.249: icmp_seq=1 ttl=64 time=0.513 ms
64 bytes from 192.168.7.249: icmp_seq=2 ttl=64 time=0.390 ms
64 bytes from 192.168.7.249: icmp_seq=3 ttl=64 time=0.478 ms
^C
--- 192.168.7.249 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.390/0.460/0.513/0.054 ms

2）查看witness服务器防火墙配置

[root@node2 shell]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     udp  --  anywhere             anywhere             udp dpt:domain
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:domain
ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootps
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:bootps
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
INPUT_direct  all  --  anywhere             anywhere            
INPUT_ZONES_SOURCE  all  --  anywhere             anywhere            
INPUT_ZONES  all  --  anywhere             anywhere            
ACCEPT     icmp --  anywhere             anywhere            
REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             bogon/24             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  192.168.122.0/24     anywhere            
ACCEPT     all  --  anywhere             anywhere            
REJECT     all  --  anywhere             anywhere             reject-with icmp-port-unreachable
REJECT     all  --  anywhere             anywhere             reject-with icmp-port-unreachable
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
FORWARD_direct  all  --  anywhere             anywhere            
FORWARD_IN_ZONES_SOURCE  all  --  anywhere             anywhere            
FORWARD_IN_ZONES  all  --  anywhere             anywhere            
FORWARD_OUT_ZONES_SOURCE  all  --  anywhere             anywhere            
FORWARD_OUT_ZONES  all  --  anywhere             anywhere            
ACCEPT     icmp --  anywhere             anywhere            
REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited
......

=== 有以上可知，witness服务器节点防火墙被启动===

3）清理witness主机防火墙规则
[root@node2 shell]# iptables -F

4）测试witness主机数据库连接

[kingbase@node1 bin]$ ./ksql -h 192.168.7.249 -U system test
ksql (V8.0)
Type "help" for help.

5）查看集群节点状态

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | primary | * running |          | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 2  | node249  | witness | * running | node248  | default  | 0        | 1        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

四、集群failover 切换后

1）查看集群节点状态

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | primary | * running |          | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 2  | node249  | witness | * running | node248  | default  | 0        | 1        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running | node248  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

2）集群主备切换后，witness重新注册连接新的主库

[kingbase@node2 bin]$ ./repmgr witness register --force -h 192.168.7.243
INFO: connecting to witness node "node249" (ID: 2)
INFO: connecting to primary node
INFO: "repmgr" extension is already installed
INFO: witness registration complete
NOTICE: witness node "node249" (ID: 2) successfully registered

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | standby |   running | node243  | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 2  | node249  | witness | * running | node243  | default  | 0        | 1        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | primary | * running |          | default  | 100      | 19       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running | node243  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2[kingbase@node2 bin]$ ./repmgr witness register --force -h 192.168.7.243
INFO: connecting to witness node "node249" (ID: 2)
INFO: connecting to primary node
INFO: "repmgr" extension is already installed
INFO: witness registration complete
NOTICE: witness node "node249" (ID: 2) successfully registered

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248  | standby |   running | node243  | default  | 100      | 18       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 2  | node249  | witness | * running | node243  | default  | 0        | 1        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243  | primary | * running |          | default  | 100      | 19       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 5  | node243B | standby |   running | node243  | default  | 100      | 18       | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

新主库hamgr.log日志：

 [2021-03-01 12:49:05] [WARNING] unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2"
[2021-03-01 12:49:05] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-03-01 12:49:05] [WARNING] unable to connect to upstream node "node248" (ID: 1)
[2021-03-01 12:49:05] [INFO] sleeping 3 seconds until next reconnection attempt
[2021-03-01 12:49:08] [INFO] checking state of node 1, 1 of 2 attempts
[2021-03-01 12:49:08] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2 fallback_application_name=repmgr"
[2021-03-01 12:49:08] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-03-01 12:49:08] [INFO] sleeping 3 seconds until next reconnection attempt
[2021-03-01 12:49:11] [INFO] checking state of node 1, 2 of 2 attempts
[2021-03-01 12:49:11] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2 fallback_application_name=repmgr"
[2021-03-01 12:49:11] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-03-01 12:49:11] [WARNING] unable to reconnect to node 1 after 2 attempts
[2021-03-01 12:49:11] [NOTICE] setting "wal_retrieve_retry_interval" to 86405000 milliseconds
[2021-03-01 12:49:12] [WARNING] wal receiver not running
[2021-03-01 12:49:12] [NOTICE] WAL receiver disconnected on all sibling nodes
[2021-03-01 12:49:12] [INFO] WAL receiver disconnected on all 2 sibling nodes
[2021-03-01 12:49:12] [INFO] 2 active sibling nodes registered
[2021-03-01 12:49:12] [INFO] primary and this node have the same location ("default")
[2021-03-01 12:49:12] [INFO] local node's last receive lsn: 5/640000A0
[2021-03-01 12:49:12] [INFO] checking state of sibling node "node249" (ID: 2)
[2021-03-01 12:49:12] [INFO] node "node249" (ID: 2) reports its upstream is node 1, last seen 7 second(s) ago
[2021-03-01 12:49:12] [INFO] node 2 last saw primary node 7 second(s) ago
[2021-03-01 12:49:12] [INFO] checking state of sibling node "node243B" (ID: 5)
[2021-03-01 12:49:12] [WARNING] repmgrd not running on node "node243B" (ID: 5), skipping
[2021-03-01 12:49:12] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 4 seconds
[2021-03-01 12:49:12] [NOTICE] promotion candidate is "node243" (ID: 3)
[2021-03-01 12:49:12] [NOTICE] setting "wal_retrieve_retry_interval" to 5000 ms
[2021-03-01 12:49:12] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2021-03-01 12:49:12] [INFO] try to ping the trusted_servers "192.168.7.1" before execute promote_command
[2021-03-01 12:49:14] [NOTICE] PING 192.168.7.1 (192.168.7.1) 56(84) bytes of data.

--- 192.168.7.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 2.450/2.460/2.471/0.050 ms
A
[2021-03-01 12:49:14] [NOTICE] successfully ping one or more of the trusted_servers "192.168.7.1"
[2021-03-01 12:49:14] [NOTICE] try to stop old primary db (host: "192.168.7.248")
ERROR: connection to database failed
DETAIL: 
could not connect to server: Connection refused
        Is the server running on host "192.168.7.248" and accepting
        TCP/IP connections on port 54321?

DETAIL: attempted to connect using:
  user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2 fallback_application_name=repmgr
[2021-03-01 12:49:16] [NOTICE] PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.

--- 192.168.7.240 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.357/0.365/0.374/0.020 ms

[2021-03-01 12:49:16] [WARNING] the virtual ip is already on other host, try to release it on old primary node (host: "192.168.7.248")
[2021-03-01 12:49:16] [INFO] SSH connection to host "192.168.7.248" succeeded, ready to release vip on it
[2021-03-01 12:49:17] [NOTICE] old primary node (host: "192.168.7.248") release the virtual ip 192.168.7.240/24 success
[2021-03-01 12:49:17] [NOTICE] will acquire the virtual ip again
[2021-03-01 12:49:18] [NOTICE] PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.

--- 192.168.7.240 ping statistics ---
2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 999ms


[2021-03-01 12:49:18] [WARNING] ping host"192.168.7.240" failed
[2021-03-01 12:49:18] [DETAIL] average RTT value is not greater than zero
[2021-03-01 12:49:19] [NOTICE] new primary node (ID: 3) acquire the virtual ip 192.168.7.240/24 success
[2021-03-01 12:49:19] [INFO] promote_command is:
  "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf"
WARNING: 2 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  node249 (node ID: 2, witness server)
  node243B (node ID: 5)
NOTICE: promoting standby to primary
DETAIL: promoting server "node243" (ID: 3) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: SET synchronous TO "async" on primary host 
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node243" (ID: 3) was successfully promoted to primary

标签：node,10,R6,keepalives,192.168,esrep,12,repmgr,KingbaseES
来源： https://www.cnblogs.com/tiany1224/p/15821443.html