Prometheus Service Discovery: the Kubernetes node Role (Part 3)

2021-11-05 20:02:56 Author: Internet

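Preface

The previous article showed how to monitor Kubernetes nodes with Prometheus through static configuration (static_configs). That approach is convenient in a test environment with only a few nodes. In production, once the cluster grows to 50 or 60 nodes, maintaining the target list by hand becomes slow and tedious. This article therefore uses Prometheus service discovery to monitor a large number of nodes.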

Prometheus running in Kubernetes performs service discovery by talking to the Kubernetes API. The discovery roles covered here are node, service, pod, endpoints, and ingress.
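
As a rough illustration, a role is selected per scrape job through the `role` field of `kubernetes_sd_configs`. The job names below are hypothetical and are not part of this article's configuration; only the `node` role is used in the rest of the article.

```yaml
scrape_configs:
  # Hypothetical job names, one job per discovery role.
  - job_name: 'example-nodes'
    kubernetes_sd_configs:
    - role: node        # one target per cluster node
  - job_name: 'example-services'
    kubernetes_sd_configs:
    - role: service     # one target per port of each service
  - job_name: 'example-pods'
    kubernetes_sd_configs:
    - role: pod         # one target per declared container port of each pod
  - job_name: 'example-endpoints'
    kubernetes_sd_configs:
    - role: endpoints   # targets taken from the endpoints listed for each service
  - job_name: 'example-ingresses'
    kubernetes_sd_configs:
    - role: ingress     # one target per path of each ingress
```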

The following sections focus on system-level monitoring of the nodes in a k8s cluster. The environment is summarized below:
Kubernetes v1.19.13
Prometheus v2.30.3
node_exporter v1.2.2

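1. Deploy Prometheus

# See the Part 1 article for reference.

2. Modify the Prometheus configuration file

Edit the ConfigMap resource and add a new job.
With kubernetes_sd_configs set to role: node, Prometheus queries the Kubernetes API to discover every node in the current cluster. By default, the /metrics target discovered for each node points at the kubelet port (10250).
# For more options, see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/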

[root@k8s-master ~]# vim prometheus_configmap.yaml

# The source YAML files can be downloaded from GitHub: https://github.com/shaxiaozz/prometheus

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-configmap
  namespace: monitor
data:
  prometheus.yml: |
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets: ['127.0.0.1:9090']
      - job_name: 'service-k8s-nodes'
        kubernetes_sd_configs:
        - role: node

[root@pre01 prometheus]# kubectl apply -f prometheus_configmap.yaml  # update the ConfigMap
[root@pre01 prometheus]# curl -X POST http://10.244.219.114:9090/-/reload  # hot-reload the configuration
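
Note that the `/-/reload` endpoint only responds when Prometheus is started with the `--web.enable-lifecycle` flag. A minimal sketch of the relevant part of the Deployment's container spec follows. The container name, image tag, and config path are assumptions here, so compare them with the manifest from Part 1.

```yaml
# Fragment of a Deployment pod template (names and paths are assumed; other flags omitted).
containers:
- name: prometheus
  image: prom/prometheus:v2.30.3
  args:
  - --config.file=/etc/prometheus/prometheus.yml   # assumed mount path of the ConfigMap
  - --web.enable-lifecycle                          # enables POST /-/reload
```

An updated ConfigMap can take some time to appear in the mounted volume, so a reload issued immediately after `kubectl apply` may still read the old file.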

3. Check the Targets

http://node_ip:nodeport/targets

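On the Targets page, the service-k8s-nodes job has automatically discovered 3 nodes, but their state is currently DOWN. The reason is that the address Prometheus discovers for each node is the kubelet port (10250), which typically serves HTTPS and requires authentication, so a plain HTTP scrape fails. What we actually want to scrape is node_exporter, so the port has to change from 10250 to 9100.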

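How do we change it? This is where the replace action of relabel_configs comes in. Before Prometheus scrapes a target, relabeling can rewrite label values dynamically based on the target's metadata. It can also decide whether to scrape or ignore a target based on that metadata.
What is a label? The Targets page shows them: the entries listed for a target are the labels of that target instance.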

The value of the __address__ label has the form host:port, so a replace rule in relabel_configs can change the port in this label value to 9100.

# The source YAML files can be downloaded from GitHub: https://github.com/shaxiaozz/prometheus

  - job_name: 'service-k8s-nodes'
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace

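The relabel_configs block works roughly as follows: the regex is matched against the value of the source label (__address__). When that value ends in port 10250, the host part is captured as group 1. The replacement '${1}:9100' is then written to the target label (__address__), which keeps the host and changes the port to 9100. The action that performs this is replace.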

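In the words of the official documentation:
replace: Match regex against the concatenated source_labels. Then, set target_label to replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their value. If regex does not match, no replacement takes place.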
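
The same replace action can also build the address from a discovery meta label instead of rewriting the port. The sketch below is an alternative that is not from the original configuration: it assumes node_exporter listens on port 9100 on every node's internal IP, and it uses the `__meta_kubernetes_node_address_InternalIP` label that the node role attaches to each target. The job name is hypothetical.

```yaml
  - job_name: 'example-k8s-nodes-internal-ip'   # hypothetical job name
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    # Capture the whole InternalIP value as group 1 and append the node_exporter port.
    - source_labels: [__meta_kubernetes_node_address_InternalIP]
      regex: '(.+)'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
```

Because the regex is anchored at both ends, `(.+)` has to match the entire label value. A node without an InternalIP address would not match, and its `__address__` would stay unchanged.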

[root@pre01 prometheus]# kubectl apply -f prometheus_configmap.yaml
[root@pre01 prometheus]# curl -X POST http://10.244.219.114:9090/-/reload  # hot-reload the configuration

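The instances under the service-k8s-nodes job now all use port 9100, and their state is UP. This shows how powerful relabel_configs is: it can rewrite labels and filter targets. You can also exclude instances that match a given label value so that they are not monitored.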
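
As a sketch of the filtering case, the rule below drops any node carrying a particular Kubernetes label before it is scraped. The node label `monitoring=disabled` is a made-up example, not something from this article's cluster. The node role exposes each node label as `__meta_kubernetes_node_label_<labelname>`.

```yaml
    relabel_configs:
    # Hypothetical: skip nodes labeled monitoring=disabled.
    - source_labels: [__meta_kubernetes_node_label_monitoring]
      regex: 'disabled'
      action: drop
```

`action: keep` works the other way round and retains only the targets whose source label value matches the regex.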

For more on kubernetes_sd_config, see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
For more on relabel_configs, see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

Source: https://blog.csdn.net/weixin_42708432/article/details/121153744