Monitoring Domain SSL Certificates with Prometheus Blackbox and Setting Up AlertManager Alerts
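blackbox exporter is the black-box monitoring solution provided by the Prometheus community. It allows users to probe the network over HTTP, HTTPS, DNS, TCP, and ICMP, actively checking the state of hosts and services.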
I have written about installing Prometheus and Grafana many times before. If you have not installed them yet, the articles below cover it.
Docker version
Monitoring MySQL Databases with Prometheus
K8s version
Persisting Prometheus and Grafana on Ceph and Monitoring a k8s Cluster
blackbox exporter
Installing Blackbox exporter
    wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.1/blackbox_exporter-0.21.1.linux-amd64.tar.gz
    tar zxvf blackbox_exporter-0.21.1.linux-amd64.tar.gz
    mkdir -p /usr/local/exporter
    mv blackbox_exporter-0.21.1.linux-amd64 /usr/local/exporter/blackbox_exporter
    # Edit the configuration file
    cat >/usr/local/exporter/blackbox_exporter/blackbox.yml<
The startup check passed, so next we write the systemd unit file
    cat >/usr/lib/systemd/system/blackbox_exporter.service<
Start and test
    # Start
    [root@abcdocker system]# systemctl restart blackbox_exporter
    # Check status
    [root@abcdocker system]# systemctl status blackbox_exporter
    # Enable at boot
    [root@abcdocker system]# systemctl enable blackbox_exporter
The default port is 9115.
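The two `cat >` commands above write blackbox.yml and the systemd unit file. As a rough sketch of what such files might contain, the helper functions below generate them. The module list mirrors the modules discussed later in this article; the unit file body, the function names, and the assumption that the binary and blackbox.yml both live in /usr/local/exporter/blackbox_exporter are illustrative, not the author's original files.

```shell
# Sketch only: file contents are assumptions, not the article's original files.

# write_blackbox_config FILE
# Writes a blackbox.yml with the modules used later in this article.
write_blackbox_config() {
  cat >"$1" <<'EOF'
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  icmp:
    prober: icmp
EOF
}

# write_unit_file FILE [INSTALL_DIR]
# Writes a minimal systemd unit. INSTALL_DIR defaults to the install path
# used above and is assumed to hold both the binary and blackbox.yml.
write_unit_file() {
  local dir="${2:-/usr/local/exporter/blackbox_exporter}"
  cat >"$1" <<EOF
[Unit]
Description=Blackbox exporter
After=network.target

[Service]
ExecStart=${dir}/blackbox_exporter --config.file=${dir}/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, `write_blackbox_config /usr/local/exporter/blackbox_exporter/blackbox.yml` followed by `write_unit_file /usr/lib/systemd/system/blackbox_exporter.service` and `systemctl daemon-reload` would put both files in place before the `systemctl` commands above are run.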
Docker installation
Map port 9115 and mount the local directory /usr/local/exporter/blackbox_exporter. blackbox.yml lives in the mounted directory and can be modified as needed.
    docker run --rm -d -p 9115:9115 --name blackbox_exporter -v /usr/local/exporter/blackbox_exporter:/config prom/blackbox-exporter:master --config.file=/config/blackbox.yml
Check that the container started
    [root@prometheus blackbox_exporter]# docker ps|grep black
    8c5302d44971 prom/blackbox-exporter:master "/bin/blackbox_expor…" 52 seconds ago Up 51 seconds 0.0.0.0:9115->9115/tcp blackbox_exporter
Test the port
    [root@prometheus blackbox_exporter]# curl 127.0.0.1:9115/metrics
    # HELP blackbox_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which blackbox_exporter was built.
    # TYPE blackbox_exporter_build_info gauge
    blackbox_exporter_build_info{branch="master",goversion="go1.16.10",revision="70bff7941301753b125a40bcf6b3ed28935a9a94",version="0.19.0"} 1
    # HELP blackbox_exporter_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
    # TYPE blackbox_exporter_config_last_reload_success_timestamp_seconds gauge
    blackbox_exporter_config_last_reload_success_timestamp_seconds 1.6562274758327048e+09
    # HELP blackbox_exporter_config_last_reload_successful Blackbox exporter config loaded successfully.
    ...
    ...
    ...
Prometheus monitoring configuration
Configure the job in Prometheus by editing the Prometheus configuration file
    [root@prometheus ~]# cd /etc/prometheus/
    [root@prometheus prometheus]# ls
    alertmanager prometheus.yml prometheus.yml_bak_2022-06-20 rules
    [root@prometheus prometheus]# vim prometheus.yml
Add the following job_name entries
    - job_name: 'blackbox_http_2xx'
      metrics_path: /probe
      params:
        module: [http_2xx]  # probe with GET requests
      static_configs:
        - targets:
          - http://prometheus.io  # Target to probe with http.
          - https://i4t.com       # Target to probe with https.
          - https://ukx.cn
          - https://k.i4t.com
          - https://nas.frps.cn
          - https://esxi.frps.cn
          - https://rancher.frps.cn
          - https://jumpserver.frps.cn
          - https://frps.cn
          - https://imgkb.com
          - https://grafana.frps.cn
          - https://down.frps.cn
          - https://my.ukx.cn
          - https://linux.ukx.cn
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 10.0.24.13:9115  # blackbox exporter address and port
    - job_name: 'blackbox_tcp_connect'  # check whether certain ports are reachable
      scrape_interval: 30s
      metrics_path: /probe
      params:
        module: [tcp_connect]
      static_configs:
        - targets:
          - dsm.frps.cn:9091
          - dsm.frps.cn:1998
          - dsm.frps.cn:1999
          - apiserver.frps.cn:8443
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 10.0.24.13:9115  # host and port where blackbox-exporter runs
Restart Prometheus
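Using a 127.x loopback address for the blackbox exporter is not recommended (for example, when Prometheus runs in a container, 127.0.0.1 points at the container itself).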
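The relabeling in these jobs means that Prometheus scrapes the exporter's /probe endpoint, passing the module and the original target as query parameters. The sketch below builds that URL by hand and pulls a single value out of the response, which can help when checking a target before adding it to prometheus.yml. The function names `probe_url` and `metric_value` are hypothetical helpers, and the target is passed without URL encoding, which is assumed to be acceptable for simple targets like the ones above.

```shell
# probe_url EXPORTER MODULE TARGET
# Prints the URL Prometheus effectively requests after relabeling.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# metric_value NAME
# Reads exposition-format text on stdin and prints the value of the
# first unlabeled sample called NAME (comment lines never match).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

A manual check could then look like `curl -s "$(probe_url 10.0.24.13:9115 http_2xx https://i4t.com)" | metric_value probe_success`, where a printed `1` would indicate a successful probe.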
Prometheus Blackbox parameters explained
The parameters below are demo examples only.
1. ICMP test (host liveness)
Server liveness can be checked with ping (ICMP). Configure the icmp module in blackbox.yml:
    modules:
      icmp:
        prober: icmp
The Prometheus job is as follows
    - job_name: 'blackbox-ping'
      metrics_path: /probe
      params:
        module: [icmp]
      static_configs:
        - targets:
          - 172.16.106.208  # IP of the monitored host
          - 172.16.106.80
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: IP:9115  # host and port where blackbox-exporter runs
2. TCP test (monitoring whether host ports are alive)
Configure the tcp module in blackbox.yml:
    modules:
      tcp_connect:
        prober: tcp
Prometheus job
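    - job_name: 'blackbox-tcp'
      metrics_path: /probe
      params:
        module: [tcp_connect]
      static_configs:
        - targets:
          - 172.16.106.208:6443
          - 172.16.106.80:6443
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: IP:9115
3. HTTP check (monitoring website status)
The http prober is one of the most commonly used probers in black-box monitoring. It provides effective monitoring of a website or HTTP service, covering both its availability and user-experience factors such as response time. Besides alerting promptly when a service misbehaves, it also helps operations staff analyze and optimize the site experience.
Configure the http module in blackbox.yml: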
    modules:
      http_2xx:
        prober: http
        http:
          method: GET
      http_post_2xx:
        prober: http
        http:
          method: POST
Prometheus job
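    - job_name: 'blackbox-http'
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets:
          - https://i4t.com
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: IP:9115  # host and port where blackbox-exporter runs
The prober option selects the prober type, and the http option customizes how that prober behaves. Apart from the request method, no further http options are set here, so the prober runs with its defaults: it probes the target service with an HTTP GET and checks whether the returned status code is 2xx. If it is, the probe succeeds; otherwise it fails.
The collected data looks like this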
    # DNS lookup time, in seconds
    probe_dns_lookup_time_seconds 0.000199105
    # Duration of the whole probe from start to finish, in seconds (response time for this page)
    probe_duration_seconds 0.010889113
    # HELP probe_failed_due_to_regex Indicates if probe failed due to regex
    # TYPE probe_failed_due_to_regex gauge
    probe_failed_due_to_regex 0
    # Length of the HTTP response content
    probe_http_content_length -1
    # Time spent in each phase
    probe_http_duration_seconds{phase="connect"} 0.001083728     # connection time
    probe_http_duration_seconds{phase="processing"} 0.008365885  # request processing time
    probe_http_duration_seconds{phase="resolve"} 0.000199105     # DNS resolution time
    probe_http_duration_seconds{phase="tls"} 0                   # TLS handshake time
    probe_http_duration_seconds{phase="transfer"} 0.000446424    # transfer time
    # Number of redirects
    probe_http_redirects 0
    # Whether SSL was used for the final redirect
    probe_http_ssl 0
    # Returned status code
    probe_http_status_code 200
    # Length of the uncompressed response body
    probe_http_uncompressed_body_length 1766
    # HTTP protocol version
    probe_http_version 1.1
    # HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
    probe_ip_addr_hash 3.24030434e+09
    # IP protocol version in use
    probe_ip_protocol 4
    # Whether the probe succeeded
    probe_success 1
Grafana configuration
Recommended Grafana templates
13230 SSL certificate monitoring
13659 HTTP status monitoring
9965 Combined SSL, TCP, and HTTP monitoring dashboard
AlertManager
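The AlertManager alerting setup covers the following:
- Alert when an SSL certificate has fewer than 30 days left
- Alert when the HTTP status code is abnormal (the rules below treat anything outside 200-399 as a failure)
For installing AlertManager, see the article below; here I provide the rules directly.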
AlertManager WeChat Alert Configuration
Setting up the alert rules
    [root@prometheus rules]# cat /etc/prometheus/rules/blackbox_exporter.yaml
    groups:
    - name: Blackbox monitoring alerts
      rules:
      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: Probe slower than 1 second (instance {{ $labels.instance }})
          description: "VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: HTTP status code (instance {{ $labels.instance }})
          description: "HTTP status code is not 200-399\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: Domain certificate expiring soon (instance {{ $labels.instance }})
          description: "Domain certificate expires within 30 days\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: Domain certificate expiring soon (instance {{ $labels.instance }})
          description: "Domain certificate expires within 7 days\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateExpired
        expr: probe_ssl_earliest_cert_expiry - time() <= 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: Domain certificate has expired (instance {{ $labels.instance }})
          description: "Domain certificate has expired\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxProbeSlowHttp
        expr: avg_over_time(probe_http_duration_seconds[1m]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: Slow HTTP request (instance {{ $labels.instance }})
          description: "HTTP request took more than 10 seconds\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
Restart Prometheus
    docker restart prometheus_new
The rules are now loaded in Prometheus, and the alerts have been delivered to WeChat.
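The certificate rules above compare `probe_ssl_earliest_cert_expiry - time()` against 30 days, 7 days, and zero. As a minimal sketch of that arithmetic, the hypothetical helper `cert_severity` below takes an expiry timestamp, as returned by the exporter, and reports the most severe tier it falls into. Unlike the rules, which can fire the 30-day and 7-day alerts together and wait out the `for: 30m` period, this function returns a single label immediately.

```shell
# cert_severity EXPIRY_EPOCH [NOW_EPOCH]
# Mirrors the thresholds of the SSL rules above. awk does the arithmetic so
# that values in scientific notation (e.g. 1.7e+09) are handled as numbers.
cert_severity() {
  local now="${2:-$(date +%s)}"
  awk -v expiry="$1" -v now="$now" 'BEGIN {
    left = expiry - now                      # seconds until expiry
    if (left <= 0)               print "expired"
    else if (left < 86400 * 7)   print "critical"
    else if (left < 86400 * 30)  print "warning"
    else                         print "ok"
  }'
}
```

Fed with the `probe_ssl_earliest_cert_expiry` value from a manual probe, this gives a quick preview of which alert a domain would eventually trigger. Before restarting Prometheus, the rule file itself can also be validated with `promtool check rules /etc/prometheus/rules/blackbox_exporter.yaml`.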
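Related articles:
- Monitoring VMware_ESXI with Prometheus and Configuring AlertManager Alerts
- Persisting Prometheus and Grafana on Ceph and Monitoring a k8s Cluster
- Monitoring a Ceph Cluster with Prometheus and Setting Up AlertManager Alerts
- AlertManager WeChat Alert Configuration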