Monitoring the Greenplum Distributed Database with Prometheus and Grafana

Author: Internet

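1. Preface

Greenplum is a distributed relational MPP database for data warehouse workloads. It is built on PostgreSQL and is highly compatible with it, so most PostgreSQL client tools and applications run on Greenplum. GPCC (Greenplum Command Center) is the official monitoring software of the commercial edition of Greenplum, so users who rely on the open-source edition have to look for another monitoring solution. This article describes one such solution, based on Prometheus and Grafana.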

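2. Overview of Prometheus and Grafana

2.1 Prometheus

Prometheus is an open-source monitoring and alerting system with a time-series database (TSDB), originally developed at SoundCloud and written in Go. It is very active in the open-source community, and its performance is sufficient for clusters of tens of thousands of machines. Its architecture is shown below:

[Figure: Prometheus architecture diagram]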

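2.2 Grafana

Grafana is a cross-platform, open-source tool for metrics analysis and visualization. It queries the collected data, presents it visually, and sends timely notifications. It has six main features: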

3. Implementing Greenplum Monitoring

Greenplum can be monitored in much the same way as PostgreSQL, but there are differences:

3.1 The Greenplum Exporter (Metrics Collector)

Following the approach of the PostgreSQL exporter, a Greenplum exporter has been implemented. The project is hosted at:

https://github.com/tangyibo/greenplum_exporter
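
As a rough illustration of what an exporter of this kind does, the sketch below renders a gauge in the Prometheus text exposition format, which is what Prometheus reads when it scrapes an exporter. It is written in Python for brevity and is not taken from the greenplum_exporter project; the helper names render_gauge and escape_label_value and the sample value are assumptions made for this example.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list) -> str:
    """Render one gauge family as Prometheus text exposition lines.

    samples is a list of (labels, value) pairs, where labels is a dict.
    The help text is assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            label_text = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample: an uptime of 42 seconds with no labels.
print(render_gauge("greenplum_cluster_uptime", "Seconds since startup", [({}, 42.0)]))
```

A real exporter would fill the sample list from SQL query results and serve the rendered text over HTTP on a metrics endpoint.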

greenplum_exporter mainly adds metrics for client connections, account connections, segment storage, cluster node synchronization status, and database locks. The metrics are:

| No. | Metric name | Type | Labels | Description | Data source (query) |
|---|---|---|---|---|---|
| 1 | greenplum_cluster_state | Gauge | version; master (master hostname); standby (standby hostname) | Whether Greenplum is reachable: 1 → available; 0 → unavailable | `SELECT count(*) from gp_dist_random('gp_id');` `select version();` `SELECT hostname from gp_segment_configuration where content=-1 and role='p';` |
| 2 | greenplum_cluster_uptime | Gauge | - | Time elapsed since startup | `select extract(epoch from now() - pg_postmaster_start_time());` |
| 3 | greenplum_cluster_sync | Gauge | - | Master-to-standby synchronization status: 1 → normal; 0 → abnormal | `SELECT count(*) from pg_stat_replication where state='streaming'` |
| 4 | greenplum_cluster_max_connections | Gauge | - | Maximum number of connections | `show max_connections;` `show superuser_reserved_connections;` |
| 5 | greenplum_cluster_total_connections | Gauge | - | Current number of connections | `select count(*) total, count(*) filter(where current_query='<IDLE>') idle, count(*) filter(where current_query<>'<IDLE>') active, count(*) filter(where current_query<>'<IDLE>' and not waiting) running, count(*) filter(where current_query<>'<IDLE>' and waiting) waiting from pg_stat_activity where procpid <> pg_backend_pid();` |
| 6 | greenplum_cluster_idle_connections | Gauge | - | Number of idle connections | Same as above |
| 7 | greenplum_cluster_active_connections | Gauge | - | Number of connections with an active query | Same as above |
| 8 | greenplum_cluster_running_connections | Gauge | - | Number of connections with a query executing | Same as above |
| 9 | greenplum_cluster_waiting_connections | Gauge | - | Number of connections with a query waiting to execute | Same as above |
| 10 | greenplum_node_segment_status | Gauge | hostname; address; dbid; content; preferred_role; port; replication_port | Segment status: 1 (U) → up; 0 (D) → down | `select * from gp_segment_configuration;` |
| 11 | greenplum_node_segment_role | Gauge | hostname; address; dbid; content; preferred_role; port; replication_port | Segment role: 1 (P) → primary; 2 (M) → mirror | Same as above |
| 12 | greenplum_node_segment_mode | Gauge | hostname; address; dbid; content; preferred_role; port; replication_port | Segment mode: 1 (S) → Synced; 2 (R) → Resyncing; 3 (C) → Change Tracking; 4 (N) → Not Syncing | Same as above |
| 13 | greenplum_node_segment_disk_free_mb_size | Gauge | hostname | Free disk space on the segment host (MB) | `SELECT dfhostname as segment_hostname,sum(dfspace)/count(dfspace)/(1024*1024) as segment_disk_free_gb from gp_toolkit.gp_disk_free GROUP BY dfhostname` |
| 14 | greenplum_cluster_total_connections_per_client | Gauge | client | Total connections per client | `select client_addr, count(*) total, count(*) filter(where current_query='<IDLE>') idle, count(*) filter(where current_query<>'<IDLE>') active from pg_stat_activity group by 1;` |
| 15 | greenplum_cluster_idle_connections_per_client | Gauge | client | Idle connections per client | Same as above |
| 16 | greenplum_cluster_active_connections_per_client | Gauge | client | Active connections per client | Same as above |
| 17 | greenplum_cluster_total_online_user_count | Gauge | - | Number of online accounts | Same query as No. 19 |
| 18 | greenplum_cluster_total_client_count | Gauge | - | Number of currently connected clients | Same query as No. 14 |
| 19 | greenplum_cluster_total_connections_per_user | Gauge | usename | Total connections per account | `select usename, count(*) total, count(*) filter(where current_query='<IDLE>') idle, count(*) filter(where current_query<>'<IDLE>') active from pg_stat_activity group by 1;` |
| 20 | greenplum_cluster_idle_connections_per_user | Gauge | usename | Idle connections per account | Same as above |
| 21 | greenplum_cluster_active_connections_per_user | Gauge | usename | Active connections per account | Same as above |
| 22 | greenplum_cluster_config_last_load_time_seconds | Gauge | - | Time the system configuration was last loaded | `SELECT pg_conf_load_time()` |
| 23 | greenplum_node_database_name_mb_size | Gauge | dbname | Storage space used by each database | `SELECT dfhostname as segment_hostname,sum(dfspace)/count(dfspace)/(1024*1024) as segment_disk_free_gb from gp_toolkit.gp_disk_free GROUP BY dfhostname` |
| 24 | greenplum_node_database_table_total_count | Gauge | dbname | Total number of tables in each database | `SELECT count(*) as total from information_schema.tables where table_schema not in ('gp_toolkit','information_schema','pg_catalog');` |
| 25 | greenplum_exporter_total_scraped | Counter | - | - | - |
| 26 | greenplum_exporter_total_error | Counter | - | - | - |
| 27 | greenplum_exporter_scrape_duration_second | Gauge | - | - | - |
| 28 | greenplum_server_users_name_list | Gauge | - | User details | `SELECT usename from pg_catalog.pg_user;` |
| 29 | greenplum_server_users_total_count | Gauge | - | Total number of users | Same as above |
| 30 | greenplum_server_locks_table_detail | Gauge | pid; datname; usename; locktype; mode; application_name; state; lock_satus; query | Lock information | `SELECT * from pg_locks` |
| 31 | greenplum_server_database_hit_cache_percent_rate | Gauge | - | Cache hit rate | `select sum(blks_hit)/(sum(blks_read)+sum(blks_hit))*100 from pg_stat_database;` |
| 32 | greenplum_server_database_transition_commit_percent_rate | Gauge | - | Transaction commits | |
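
To make the numeric encodings in the table concrete, here is a minimal sketch, in Python, of how the single-letter codes in gp_segment_configuration could be turned into the gauge values for metrics 10 to 12, and how the cache hit percentage of metric 31 could be computed. The function names are hypothetical, the zero result for an empty counter is an assumption, and the code is not taken from the exporter.

```python
# Letter codes mapped to the gauge values listed for metrics 10 to 12.
STATUS_VALUES = {"u": 1, "d": 0}
ROLE_VALUES = {"p": 1, "m": 2}
MODE_VALUES = {"s": 1, "r": 2, "c": 3, "n": 4}


def segment_gauges(row: dict) -> dict:
    """Map one gp_segment_configuration row to three gauge values.

    Codes are compared case-insensitively; an unknown code raises KeyError.
    """
    return {
        "greenplum_node_segment_status": STATUS_VALUES[row["status"].lower()],
        "greenplum_node_segment_role": ROLE_VALUES[row["role"].lower()],
        "greenplum_node_segment_mode": MODE_VALUES[row["mode"].lower()],
    }


def cache_hit_percent(blks_hit: int, blks_read: int) -> float:
    """Compute blks_hit / (blks_read + blks_hit) * 100.

    Returns 0.0 when no blocks were touched (an assumption of this sketch),
    which avoids a division by zero.
    """
    total = blks_hit + blks_read
    if total == 0:
        return 0.0
    return 100.0 * blks_hit / total


# Hypothetical rows: a healthy primary and a mirror that is down.
print(segment_gauges({"status": "u", "role": "p", "mode": "s"}))
print(segment_gauges({"status": "d", "role": "m", "mode": "n"}))
print(cache_hit_percent(3, 1))
```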

3.2 Building a Visual Status Dashboard with Grafana

Based on the metrics above, charts can be configured in Grafana. For details, see:

https://github.com/tangyibo/greenplum_exporter/releases/download/1.0/greenplum_dashboard.json
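
Before importing a dashboard file it can be useful to see which metrics its panels query. The sketch below, in Python, walks a dashboard dictionary and collects each panel title with its query expressions. It assumes the usual Grafana dashboard JSON layout (a top-level panels list, or panels nested inside rows, with Prometheus queries under targets[].expr); the sample dashboard is made up for the example and is not the content of greenplum_dashboard.json.

```python
import json


def list_panel_queries(dashboard: dict) -> list:
    """Return (panel title, query expression) pairs found in a dashboard dict."""
    # Collect panels from the top level and from older row-based layouts.
    pending = list(dashboard.get("panels", []))
    for row in dashboard.get("rows", []):
        pending.extend(row.get("panels", []))

    pairs = []
    while pending:
        panel = pending.pop(0)
        # A collapsed row panel can hold further panels of its own.
        pending.extend(panel.get("panels", []))
        for target in panel.get("targets", []):
            expr = target.get("expr")
            if expr:
                pairs.append((panel.get("title", ""), expr))
    return pairs


# Hypothetical dashboard used only to exercise the function.
sample = json.loads("""
{
  "title": "Greenplum (sample)",
  "panels": [
    {"title": "Segments down",
     "targets": [{"expr": "greenplum_node_segment_status == 0"}]},
    {"title": "Connections",
     "targets": [{"expr": "greenplum_cluster_total_connections"},
                 {"expr": "greenplum_cluster_max_connections"}]}
  ]
}
""")

for title, expr in list_panel_queries(sample):
    print(f"{title}: {expr}")
```

To inspect a downloaded file instead of the sample, the dictionary can be loaded with json.load from the opened file and passed to the same function.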

 

Article source (reposted from): https://blog.csdn.net/inrgihc/article/details/108686638

Source: https://www.cnblogs.com/inrgihc/p/14122806.html