
Using TimescaleDB + Prometheus + Grafana to Collect, Store, and Visualize Metrics



0. Background

This is a tutorial on how to use TimescaleDB, Prometheus, and Grafana to store and analyze long-term metrics data. In this part, you will learn how to use metrics data to answer performance questions about your infrastructure.

Rather than setting up all of the required infrastructure for monitoring a PostgreSQL database, this tutorial works with a downloaded sample dataset containing a large number of Prometheus metrics. With this sample dataset, the required schema is created for you automatically.

In this tutorial, you will learn how to optimize TimescaleDB for Prometheus metrics and build visualizations that help you use those metrics to answer common questions, such as:

  1. How many active connections are there to the database?

  2. What is the maximum and average percent of memory usage?

  3. When is the database being read most often?

  4. How much disk space is being used?

  5. What is the cache hit rate?

Prometheus is an open-source, community-driven monitoring system that is simple to get started with. What makes Prometheus great is that it solves monitoring in a simple and straightforward way: its philosophy is to do one thing and do it well, which is reflected, for example, in the design of the PromQL language.

However, this philosophy can also be limiting. To their credit, the Prometheus developers foresaw that their product is opinionated and built in extension points that allow other systems to complement it. In turn, Prometheus users often look to other systems as a way to augment their monitoring setup.

This is where TimescaleDB and PostgreSQL come in: Prometheus users frequently turn to them to add long-term storage and richer querying on top of Prometheus.

By using Prometheus and TimescaleDB together, you can combine the simplicity of Prometheus with the reliability, power, and flexibility of TimescaleDB, and pick the approach that best suits the task at hand. For example, you can query with PromQL, with full SQL, or with both.

1. Setting up Timescale pg_prometheus and the prometheus-postgresql-adapter

1.1 Install pg_prometheus

pg_prometheus project repository: https://github.com/timescale/pg_prometheus

Baidu Netdisk link: https://pan.baidu.com/s/1FHSqtiFIU_afhcIGIpjozw (extraction code: hwva). You can extract the archive into the install directory instead of running git clone.

git clone https://github.com/timescale/pg_prometheus
cd pg_prometheus
# git checkout 0.2.2
make
sudo make install

If sudo pg_config reports that the command is not found, it is because sudo does not preserve these environment variables when switching to the root user unless explicitly told to keep them. Add a line under env_reset in the /etc/sudoers file:

Defaults        env_keep+="PATH PYTHONPATH LD_LIBRARY_PATH MANPATH JAVA_HOME"

In addition, add /usr/local/pgsql/bin to the secure_path variable:

Defaults        secure_path="/usr/local/pgsql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

Verify that the pg_prometheus extension is installed by connecting to TimescaleDB:

$ psql -U postgres
psql (11.8)
Type "help" for help.

postgres=# SELECT name, default_version,installed_version FROM pg_available_extensions WHERE name LIKE '%prometheus%';

You should see output similar to the following:

     name      | default_version | installed_version 
---------------+-----------------+-------------------
 pg_prometheus | 0.2.2           | 
(1 row)

1.2 Create the Prometheus database

$ psql -U postgres
    
CREATE USER prometheus WITH LOGIN PASSWORD 'secret';    
CREATE DATABASE prometheus OWNER prometheus;

\connect prometheus;

CREATE EXTENSION timescaledb; 
CREATE EXTENSION pg_prometheus;

Optionally grant permissions to the database user (prometheus) that will own the Prometheus data:

-- Create the role
-- CREATE ROLE prometheus WITH LOGIN PASSWORD 'secret';  

GRANT ALL PRIVILEGES ON DATABASE prometheus TO prometheus;

Create the tables.

-- Create the table using pg_prometheus alone
-- SELECT create_prometheus_table('metrics');

-- Let pg_prometheus use the TimescaleDB extension
SELECT create_prometheus_table('metrics',use_timescaledb=>true);

-- Configure data retention; by default, data is kept for 90 days and then dropped
-- SELECT set_default_retention_period(180 * INTERVAL '1 day');

Take a look at the tables that were created.

# \dt
             List of relations
 Schema |      Name      | Type  |  Owner   
--------+----------------+-------+----------
 public | metrics_copy   | table | postgres
 public | metrics_labels | table | postgres
 public | metrics_values | table | postgres
(3 rows)

# \d metrics_values
                     Table "public.metrics_values"
  Column   |           Type           | Collation | Nullable | Default 
-----------+--------------------------+-----------+----------+---------
 time      | timestamp with time zone |           | not null | 
 value     | double precision         |           |          | 
 labels_id | integer                  |           |          | 

Indexes:
    "metrics_values_labels_id_idx" btree (labels_id, "time" DESC)
    "metrics_values_time_idx" btree ("time" DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON metrics_values FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
    
# \d metrics_labels
                               Table "public.metrics_labels"
   Column    |  Type   | Collation | Nullable |                  Default        
-------------+---------+-----------+----------+--------------------------------------------
 id          | integer |           | not null | nextval('metrics_labels_id_seq'::regclass)
 metric_name | text    |           | not null | 
 labels      | jsonb   |           |          | 
Indexes:
    "metrics_labels_pkey" PRIMARY KEY, btree (id)
    "metrics_labels_metric_name_labels_key" UNIQUE CONSTRAINT, btree (metric_nam
e, labels)
    "metrics_labels_labels_idx" gin (labels)
    "metrics_labels_metric_name_idx" btree (metric_name)

# \d metrics_copy
              Table "public.metrics_copy"
 Column |    Type     | Collation | Nullable | Default 
--------+-------------+-----------+----------+---------
 sample | prom_sample |           | not null | 
Triggers:
    insert_trigger BEFORE INSERT ON metrics_copy FOR EACH ROW EXECUTE PROCEDURE prometheus.insert_view_normal('metrics_values', 'metrics_labels')
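Not part of the original walkthrough, but to see how these three tables fit together you can insert a single sample by hand. pg_prometheus parses text in the Prometheus exposition format (name{labels} value [timestamp in ms]); the sample below is made up for illustration, and the insert trigger shown above splits it into metrics_labels and metrics_values:

-- Insert one hand-written sample through metrics_copy; the trigger normalizes it
INSERT INTO metrics_copy
VALUES ('cpu_usage_guest{cpu="cpu-total",host="example-host"} 0.5 1584518397000');

-- The sample is now split across the two normalized tables
SELECT v.time, v.value, l.metric_name, l.labels
FROM metrics_values v
JOIN metrics_labels l ON l.id = v.labels_id
ORDER BY v.time DESC
LIMIT 1;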

1.3 Install prometheus-postgresql-adapter

prometheus-postgresql-adapter project repository: https://github.com/timescale/prometheus-postgresql-adapter

Baidu Netdisk link: https://pan.baidu.com/s/1NXpA-cxx7VeFh6lfXW-F8A (extraction code: 50gd)

# install dependency
sudo apt-get install golang-go

git clone https://github.com/timescale/prometheus-postgresql-adapter
cd prometheus-postgresql-adapter
# git checkout 0.6.0

make ## building from source with go ran into various problems

# fall back to the prebuilt release instead
wget https://github.com/timescale/prometheus-postgresql-adapter/releases/download/v0.6.0/prometheus-postgresql-adapter-0.6.0-linux-amd64.tar.gz
tar xf prometheus-postgresql-adapter-0.6.0-linux-amd64.tar.gz

Start it locally:

./prometheus-postgresql-adapter -pg-port 5432 --pg-user postgres

The adapter prints output like the following:

{"caller":"log.go:31","config":"\u0026{remoteTimeout:30000000000 listenAddr::9201 telemetryPath:/metrics pgPrometheusConfig:{host:localhost port:5432 user:postgres password: database:postgres schema: sslMode:disable table:metrics copyTable: maxOpenConns:50 maxIdleConns:10 pgPrometheusNormalize:true pgPrometheusLogSamples:false pgPrometheusChunkInterval:43200000000000 useTimescaleDb:true dbConnectRetries:0 readOnly:false} logLevel:debug haGroupLockID:0 restElection:false prometheusTimeout:-1 electionInterval:5000000000}","level":"info","ts":"2020-06-22T03:39:09.775Z"}
{"caller":"log.go:31","level":"info","msg":"host=localhost port=5432 user=postgres dbname=postgres password='' sslmode=disable connect_timeout=10","ts":"2020-06-22T03:39:09.775Z"}
{"caller":"log.go:31","level":"info","msg":"Initialized pg_prometheus extension","ts":"2020-06-22T03:39:10.023Z"}
{"caller":"log.go:35","level":"warn","msg":"No adapter leader election. Group lock id is not set. Possible duplicate write load if running adapter in high-availability mode","ts":"2020-06-22T03:39:10.025Z"}
{"caller":"log.go:31","level":"info","msg":"Starting up...","ts":"2020-06-22T03:39:10.025Z"}
{"addr":":9201","caller":"log.go:31","level":"info","msg":"Listening","ts":"2020-06-22T03:39:10.025Z"}

1.4 Import the data

Dataset Baidu Netdisk link: https://pan.baidu.com/s/1i-w-PjuK5Qi7kda6duGLjg (extraction code: 7hk4)

/P4/datasets/prom_data_csvs$ psql -U postgres -d prometheus

\COPY metrics_labels FROM metrics_labels_202003201742.csv DELIMITER ',' CSV HEADER;

\COPY metrics_values FROM metrics_values_202003201743.csv DELIMITER ',' CSV HEADER;

If you forget the CSV options such as DELIMITER ',' CSV HEADER, the COPY fails with an error like this:

ERROR:  invalid input syntax for integer: ""id","metric_name","labels""
CONTEXT:  COPY metrics_labels, line 1, column id: ""id","metric_name","labels""
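Once both \COPY commands succeed, a quick row count confirms the import (not part of the original post, just a sanity check; the exact numbers depend on the dataset):

-- Sanity check: number of imported label definitions and samples
SELECT (SELECT count(*) FROM metrics_labels) AS labels,
       (SELECT count(*) FROM metrics_values) AS samples;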

1.5 Understanding the Prometheus data model

The Prometheus data model is built around three core concepts: metric names, labels, and timestamped sample values. See https://prometheus.io/docs/concepts/data_model/ for details.

A concrete example

The metrics_values_202003201743.csv file references labels_id as a foreign key. The time column is the timestamp, value is the measured value, and labels_id identifies what that measurement means.

"time","value","labels_id"
2020-03-18 07:59:57,0,1
2020-03-18 07:59:47,0,1
2020-03-18 07:59:37,0,1
2020-03-18 07:59:27,0,1
2020-03-18 07:59:17,0,1
2020-03-18 07:59:07,0,1
2020-03-18 07:58:57,0,1
2020-03-18 07:58:47,0,1
2020-03-18 07:58:37,0,1

The metrics_labels_202003201742.csv file provides the definitions of the various labels. id is the identifier, metric_name is the human-readable metric name, and labels holds the label set that fully defines the metric. The labels column is stored as jsonb.

"id","metric_name","labels"
1,cpu_usage_guest,"{""cpu"": ""cpu-total"", ""job"": ""prometheus"", ""host"": ""tsdb-avthar-taxi-tutorial-9"", ""cloud"": ""timescale-aws-us-west-2"", ""project"": ""support-50d0"", ""service"": ""tsdb-avthar-taxi-tutorial"", ""instance"": ""tsdb-avthar-taxi-tutorial-support-50d0.a.timescaledb.io:9273"", ""service_type"": ""pg""}"
2,cpu_usage_guest,"{""cpu"": ""cpu0"", ""job"": ""prometheus"", ""host"": ""tsdb-avthar-taxi-tutorial-9"", ""cloud"": ""timescale-aws-us-west-2"", ""project"": ""support-50d0"", ""service"": ""tsdb-avthar-taxi-tutorial"", ""instance"": ""tsdb-avthar-taxi-tutorial-support-50d0.a.timescaledb.io:9273"", ""service_type"": ""pg""}"

After unescaping the doubled quotes in these two rows, it is easier to see how the labels express key-value pairs in JSON form.

{cpu: cpu-total, job: prometheus, host: tsdb-avthar-taxi-tutorial-9, cloud: timescale-aws-us-west-2, project: support-50d0, service: tsdb-avthar-taxi-tutorial, instance: tsdb-avthar-taxi-tutorial-support-50d0.a.timescaledb.io:9273, service_type: pg}
{cpu: cpu0, job: prometheus, host: tsdb-avthar-taxi-tutorial-9, cloud: timescale-aws-us-west-2, project: support-50d0, service: tsdb-avthar-taxi-tutorial, instance: tsdb-avthar-taxi-tutorial-support-50d0.a.timescaledb.io:9273, service_type: pg}

Since PostgreSQL 9.3, json has been a built-in data type, so a table column can be of type json just like varchar or int. PostgreSQL also provides jsonb, a binary representation of json. The difference is that json is fast to write but slow to read, while jsonb is slower to write but faster to read; the operations available on both are the same. The table below compares the length limits of several related types.

Data type   Length limit
char(n)     fixed length n, padded with spaces when shorter
varchar(n)  variable length, limited to n characters
text        variable length, no limit
jsonb       variable length, limited to 268435455 bytes
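Because labels is stored as jsonb, you can filter on individual label keys directly with PostgreSQL's JSON operators. For example, using the cpu and host keys shown above (a sketch; adjust the values to your own dataset):

-- Find the label rows for the total-CPU series of one host
SELECT id, metric_name
FROM metrics_labels
WHERE labels ->> 'host' = 'tsdb-avthar-taxi-tutorial-9'
  AND labels ->> 'cpu'  = 'cpu-total';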

Metric Names

For the naming conventions for metric names and labels, see the documentation METRIC AND LABEL NAMING.

A metric name...

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics. For example, having the capacity of various queues in one metric is good, while mixing the capacity of a queue with the current number of elements in the queue is not.

Labels

Use labels to differentiate the characteristics of the thing that is being measured:

Do not put the label names in the metric name, as this introduces redundancy and will cause confusion if the respective labels are aggregated away.

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Base units

Prometheus does not have any units hard coded. For better compatibility, base units should be used. The following lists some metrics families with their base unit. The list is not exhaustive.

Family            Base unit  Remark
Time              seconds
Temperature       celsius    celsius is preferred over kelvin for practical reasons. kelvin is acceptable as a base unit in special cases like color temperature or where temperature has to be absolute.
Length            meters
Bytes             bytes
Bits              bytes      To avoid confusion combining different metrics, always use bytes, even where bits appear more common.
Percent           ratio      Values are 0–1 (rather than 0–100). ratio is only used as a suffix for names like disk_usage_ratio. The usual metric name follows the pattern A_per_B.
Voltage           volts
Electric current  amperes
Energy            joules
Mass              grams      grams is preferred over kilograms to avoid issues with the kilo prefix.

2. Setting up Grafana

2.1 Install Grafana

Install Grafana by following the Grafana installation guide, then start the server.

2.2 Connect Grafana to your TimescaleDB instance

Next, you need to configure Grafana to connect to your TimescaleDB instance.
First login

  1. Open a web browser and go to http://localhost:3000/. Unless another port has been configured, 3000 is the default HTTP port that Grafana listens on.
  2. On the login page, the default username and password are both admin.
  3. Change the password as needed.

Start by selecting "Add data source", then choose the "PostgreSQL" option in the SQL group:

On the settings page, fill in the connection parameters for the prometheus database created in section 1.2.

3. Optimizing TimescaleDB for Prometheus metrics

There are several ways to optimize TimescaleDB to get the most out of storage, query time, and overall cost efficiency:

  1. Use compression to save storage space
  2. Use data retention policies to decide how much metric data to keep
  3. Use continuous aggregates to reduce how often you need to query the raw data

3.1 Compression

For this tutorial, compression saved about 70% of the storage space. With compression, you can keep raw Prometheus metrics around longer while using less disk.

Using compression is easy. First, decide the interval after which data should be compressed (for example, after 6 hours), then add an automatic compression policy on the metrics_values hypertable, since that table will grow as more samples are collected. To do this, run the following commands in psql:

--- Add compression policy on metrics_values -- compress old data after every 6 hours
ALTER TABLE metrics_values SET (
 timescaledb.compress,
 timescaledb.compress_segmentby = 'labels_id'
);

SELECT add_compress_chunks_policy('metrics_values', INTERVAL '6 hours');
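To check how much space compression actually saves (the roughly 70% mentioned above), TimescaleDB 1.x exposes compression statistics in the timescaledb_information schema. A hedged sketch (the view names changed in TimescaleDB 2.x):

-- Compare compressed vs. uncompressed sizes per hypertable (TimescaleDB 1.x)
SELECT * FROM timescaledb_information.compressed_hypertable_stats;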

3.2 Data retention

You can use data retention to decide how long to store long-term metrics.

Data retention policies are powerful because they let you keep downsampled metrics: you retain aggregates of the data for long-term analysis while discarding the underlying raw data that made up those aggregates, saving storage space.

For example, if you generate a lot of metric data every day, you might keep the raw data only for a limited period (say, 15 or 30 days) and keep aggregates of that data indefinitely. For a setup that keeps raw data for about 2 days, you can enable it as follows:

--Adding a data retention policy to drop chunks that only consist of data older than 2 days old
-- available on Timescale Cloud and in Timescale Community v 1.7+
SELECT add_drop_chunks_policy('metrics_values', INTERVAL '2 days', cascade_to_materializations=>FALSE);

To check that the policy was created successfully, and to see details about it, run:

SELECT * FROM timescaledb_information.drop_chunks_policies;
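If you want to see immediately what the policy will do, you can also call drop_chunks by hand. This is a sketch using the TimescaleDB 1.x signature (it changed in 2.0):

-- Manually drop chunks of metrics_values that contain only data older than 2 days
SELECT drop_chunks(INTERVAL '2 days', 'metrics_values', cascade_to_materializations => FALSE);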

3.3 Continuous aggregates

This tutorial uses continuous aggregates so that common aggregation queries are computed in the background, saving processing time when you need the information.

To use continuous aggregates together with a data retention policy, the aggregation interval should be less than or equal to the data retention interval. That is, if your data retention interval is DRI and your aggregation interval is AI, then AI <= DRI.

To create a five-minute (5m) continuous aggregate that rolls up the maximum, average, and minimum of the metrics, while the underlying data is dropped according to the data retention policy set above:

-- 5 minute rollups of metrics
-- refresh interval same as time bucket interval
CREATE VIEW metrics_5mins
WITH (timescaledb.continuous,
   timescaledb.ignore_invalidation_older_than='2d',
   timescaledb.refresh_lag = '-30m',
   timescaledb.refresh_interval = '5m')
AS
   SELECT time_bucket('5 minutes', time) as bucket,
       labels_id,
       avg(value) as avg,
       max(value) as max,
       min(value) as min
       FROM
           metrics_values
       GROUP BY bucket, labels_id;

In the WITH clause, we specify the continuous aggregate options: ignore_invalidation_older_than, refresh_lag, and refresh_interval.

To check that the aggregate was created successfully, and to see details about it, run:

SELECT view_name, refresh_lag, refresh_interval, max_interval_per_job, materialization_hypertable
FROM timescaledb_information.continuous_aggregates;
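The aggregate can be queried like any other relation. As a quick check (not part of the original tutorial), this lists the most recent five-minute buckets with the rolled-up columns defined above:

-- Most recent 5-minute buckets from the continuous aggregate
SELECT bucket, labels_id, "max", "avg", "min"
FROM metrics_5mins
ORDER BY bucket DESC, labels_id
LIMIT 10;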

To create an hourly rollup of the metrics, run:

--1 hour agg
CREATE VIEW metrics_hourly
WITH (timescaledb.continuous,
   timescaledb.ignore_invalidation_older_than='2d',
   timescaledb.refresh_lag = '-30m',
   timescaledb.refresh_interval = '1h')
AS
   SELECT time_bucket('1 hour', time) as bucket,
       labels_id,
       avg(value) as avg,
       max(value) as max,
       min(value) as min
       FROM
           metrics_values
       GROUP BY bucket, labels_id;

Note how the refresh interval changes with the aggregation interval set in the time_bucket part of the statement.

With these optimizations in place, we are ready to dive in and start using the data to answer common questions.

4. Visualizing Prometheus data in Grafana

First, let's find the ids in metrics_labels that correspond to the postgresql_pg_stat_activity_conn_count metric:

SELECT * 
FROM metrics_labels
WHERE metric_name LIKE 'postgresql_pg_stat_activity_conn_count';

The result will look like this:

id   |              metric_name               |      labels             
-----+----------------------------------------+------------------
 296 | postgresql_pg_stat_activity_conn_count | { labels here }
 297 | postgresql_pg_stat_activity_conn_count | { labels here }

Our Prometheus data comes from two different databases, a default database and a test database. We can see that the ids of the metric we want are 296 (corresponding to the default database) and 297 (corresponding to the test database). We can now fetch all values of the metric by running the following two SQL queries, one per database:

SELECT
  time,
  avg(value) AS "defaultdb connections"
FROM metrics_values
WHERE
  labels_id = 296
GROUP BY 1
ORDER BY 1;

SELECT
  time,
  avg(value) AS "testdb connections"
FROM metrics_values
WHERE
  labels_id = 297
GROUP BY 1
ORDER BY 1;

The result of the first query should look like this:

           time            | defaultdb connections 
----------------------------+-----------------------
 2020-03-16 04:11:27.936+00 |                     2
 2020-03-16 04:11:37.937+00 |                     2
 2020-03-16 04:11:47.937+00 |                     2
 ...

The contents of the metrics_labels table may differ depending on your Prometheus setup, so instead of querying by labels_id it is better to JOIN the two tables, like this:

SELECT
  time,
  avg(value) AS "defaultdb connections"
FROM metrics_values
JOIN metrics_labels ON metrics_values.labels_id = metrics_labels.id
WHERE
  metrics_labels.metric_name LIKE 'postgresql_pg_stat_activity_conn_count'
GROUP BY 1
ORDER BY 1;

Visualizing the query in Grafana
Make sure you are logged in to Grafana and that the data source connection works.

In the top-left corner, click + Create Dashboard.

Select Add new panel to open the Edit Panel view. The layout of this page varies somewhat between Grafana versions, but the content is essentially the same.

We will edit the query directly instead of using the Grafana query builder. In the panel view, click the "Edit SQL" button at the bottom.

Before writing the query, also set the panel's data source to the PostgreSQL data source connected earlier:

Enter the following SQL query:

SELECT
  $__timeGroupAlias("time", 1m),
  avg(value) AS "defaultdb connections"
FROM metrics_values
JOIN metrics_labels ON metrics_values.labels_id = metrics_labels.id
WHERE
  $__timeFilter("time") AND
  metrics_labels.metric_name LIKE 'postgresql_pg_stat_activity_conn_count'
GROUP BY 1
ORDER BY 1;

The resulting visualization is a time-series chart of the database connection count over time.

Due to the performance limits of the VM, we cannot show visualizations for all five metrics listed at the beginning of this tutorial, so only the first one, "How many active connections are there to the database?", is visualized here. For the remaining metrics, follow the tutorial: create the corresponding VIEWs in psql and then build the visualizations in Grafana; a sketch of such a VIEW follows below.
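For example, a reusable VIEW for the first question might look like the sketch below; the view name active_connections_1min is made up, and the other four questions follow the same pattern with their own metric_name filters.

-- Average connection count per minute, one series per labels_id
CREATE VIEW active_connections_1min AS
SELECT time_bucket('1 minute', time) AS bucket,
       metrics_values.labels_id,
       avg(value) AS avg_connections
FROM metrics_values
JOIN metrics_labels ON metrics_values.labels_id = metrics_labels.id
WHERE metrics_labels.metric_name = 'postgresql_pg_stat_activity_conn_count'
GROUP BY bucket, metrics_values.labels_id;

The Grafana panel query then only needs to select bucket AS "time" and avg_connections from this view inside a $__timeFilter(bucket) condition.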
