Creating Custom Alert Rules with PrometheusRules

Introduction

First of all, this article follows on from the previous one on deploying prometheus-operator with Helm. Once the deployment is done, we need to customize some of the configuration.

This article explains how to define custom alert rules and how to get Prometheus to pick them up.

Steps

  1. Add a PrometheusRule resource
  2. Verify it

Terminology

PrometheusRule is another custom resource that becomes available once prometheus-operator is installed. Let's look at the rules that ship by default:

[root@localhost]# kubectl get prometheusrules -n monitoring
NAME AGE
prometheus-operator-me-alertmanager.rules 2d23h
prometheus-operator-me-etcd 2d23h
prometheus-operator-me-general.rules 2d23h
prometheus-operator-me-k8s.rules 2d23h
prometheus-operator-me-kube-apiserver-availability.rules 2d23h
prometheus-operator-me-kube-apiserver-slos 2d23h
prometheus-operator-me-kube-apiserver.rules 2d23h
prometheus-operator-me-kube-prometheus-general.rules 2d23h
prometheus-operator-me-kube-prometheus-node-recording.rules 2d23h
prometheus-operator-me-kube-scheduler.rules 2d23h
prometheus-operator-me-kube-state-metrics 2d23h
prometheus-operator-me-kubelet.rules 2d23h
prometheus-operator-me-kubernetes-apps 2d23h
prometheus-operator-me-kubernetes-resources 2d23h
prometheus-operator-me-kubernetes-storage 2d23h
prometheus-operator-me-kubernetes-system 2d23h
prometheus-operator-me-kubernetes-system-apiserver 2d23h
prometheus-operator-me-kubernetes-system-controller-manager 2d23h
prometheus-operator-me-kubernetes-system-kubelet 2d23h
prometheus-operator-me-kubernetes-system-scheduler 2d23h
prometheus-operator-me-node-exporter 2d23h
prometheus-operator-me-node-exporter.rules 2d23h
prometheus-operator-me-node-network 2d23h
prometheus-operator-me-node.rules 2d23h
prometheus-operator-me-prometheus 2d23h
prometheus-operator-me-prometheus-operator 2d23h

Of course, you can also see these rules in the Prometheus web UI, where each of them corresponds to a set of rules.
[png1: screenshot of the Prometheus Rules page]
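If you haven't exposed the Prometheus UI yet, a quick way to reach it locally is a port-forward. This is a minimal sketch; the service name below is an assumption based on the chart's usual <release>-prometheus naming, so confirm it with `kubectl get svc -n monitoring` first.

# Assumed service name; list services in the namespace to confirm before running
kubectl port-forward svc/prometheus-operator-me-prometheus 9090:9090 -n monitoring
# Then open http://localhost:9090/rules in a browser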

Getting Started

① Add a PrometheusRule

Create a custom rules file:

[root@localhost]# cat demo1.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator
    release: eve-prometheus-operator
  name: testtalus-rules-1
  namespace: lb6
spec:
  groups:
  - name: testtalus.rules
    rules:
    - alert: processorNatGatewayMonitor_snat_to_hight_100
      expr: processorNatGatewayMonitor_snat > 100
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "nat gateway {{ $labels.natgatewayid }} SNAT connection count is too high"
        description: "nat gateway {{ $labels.natgatewayid }} has more than 100 SNAT connections (current value: {{ $value }})"

I won't go over every field here, since the official docs cover them at length. Briefly, groups.name is just a group name, and a group can hold many rules underneath it; in this example, processorNatGatewayMonitor_snat_to_hight_100 is simply one alert inside the testtalus.rules group (see the sketch right after this paragraph).
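To make the group/rule relationship concrete, here is a sketch of the same group with a second, stricter alert added. The second alert name and the 500 threshold are purely hypothetical, only to show that one group can carry multiple rules:

spec:
  groups:
  - name: testtalus.rules
    rules:
    # the original warning-level alert from demo1.yaml
    - alert: processorNatGatewayMonitor_snat_to_hight_100
      expr: processorNatGatewayMonitor_snat > 100
      for: 1m
      labels:
        severity: warning
    # hypothetical second rule in the same group, firing at a higher threshold
    - alert: processorNatGatewayMonitor_snat_to_hight_500
      expr: processorNatGatewayMonitor_snat > 500
      for: 1m
      labels:
        severity: critical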

Now apply demo1.yaml to create the resource:

[root@localhost]# kubectl apply -f demo1.yaml
prometheusrule.monitoring.coreos.com/testtalus-rules-1 created
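Once it's created, it's worth confirming that the object exists and that its labels actually match the selector on the Prometheus custom resource, because a PrometheusRule whose labels (or namespace) don't match is silently ignored. The namespace and resource names below are assumptions based on the output earlier in this article:

# Confirm the PrometheusRule object exists (namespace "lb6" from demo1.yaml)
kubectl get prometheusrules -n lb6
# Show which labels the Prometheus CR uses to pick up rules; the app/release
# labels in demo1.yaml need to match this selector
kubectl get prometheus -n monitoring -o jsonpath='{.items[0].spec.ruleSelector}'
# Also check ruleNamespaceSelector, since this rule lives in a different namespace
kubectl get prometheus -n monitoring -o jsonpath='{.items[0].spec.ruleNamespaceSelector}'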

If, on the other hand, the apply step fails with an error message like this:

[root@localhost]# kubectl apply -f demo1.yaml
Error from server (InternalError): error when creating "demo1.yaml": Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post https://prometheus-operator-me-operator.meitu-monitoring.svc:443/admission-prometheusrules/mutate?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

then you can find the answer here: link

My fix was to delete the validatingwebhookconfigurations.admissionregistration.k8s.io and MutatingWebhookConfiguration resources, then re-create the rules:

[root@localhost]# kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io
NAME CREATED AT
prometheus-operator-me-admission 2020-11-06T10:47:12Z
[root@localhost]# kubectl get MutatingWebhookConfiguration
NAME CREATED AT
prometheus-operator-me-admission 2020-11-06T10:47:12Z
pod-ready.config.common-webhooks.networking.gke.io 2020-02-25T13:52:06Z
[root@localhost]# kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io prometheus-operator-me-admission
validatingwebhookconfiguration.admissionregistration.k8s.io "prometheus-operator-me-admission" deleted
[root@localhost]# kubectl delete MutatingWebhookConfiguration prometheus-operator-me-admission
mutatingwebhookconfiguration.admissionregistration.k8s.io "prometheus-operator-me-admission" deleted
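Deleting the webhook configurations solved it for me. As an alternative, the chart should also let you disable the admission webhooks at install or upgrade time; the value names below are my assumption from the prometheus-operator chart and should be verified against your chart version's values.yaml:

# Assumed chart values; the release name and chart reference are placeholders
# for whatever you used in the previous article's helm install
helm upgrade eve-prometheus-operator stable/prometheus-operator \
  -n monitoring \
  --set prometheusOperator.admissionWebhooks.enabled=false \
  --set prometheusOperator.admissionWebhooks.patch.enabled=false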

② Verify

Open the Rules page in the Prometheus UI and you should now see your custom rule:
[png2: screenshot showing the custom rule on the Prometheus Rules page]
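If you'd rather check from the command line, the same information is available from Prometheus's HTTP API. Assuming the port-forward from earlier is still running on localhost:9090:

# List the loaded rule groups and filter for our custom group name
curl -s http://localhost:9090/api/v1/rules | grep testtalus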