viva
  • 主页
  • 开发
    • IDE环境
      • Vim
      • VSCode
        • code-server
        • theia
    • Golang手册
    • Ansible
    • SaltStack
      • salt-api
    • Python
      • Q&A
      • Hello,world!
      • 类型
        • 数字
        • 字符串
        • 列表
        • 元组
        • 集合
        • 字典
      • 语法
        • 流程控制
        • 循环
      • 工具
        • pip
        • pyenv
        • selenium
      • 示例
    • Shell
    • Vue
      • ant-design
      • vue-material
    • Java
  • 随笔
    • WSL2
    • Git
    • Markdown
      • mermaid
    • Linux
      • 时钟同步
      • 账号
      • 文件
      • SSH
      • systemd
      • TLS加密
        • Openssl
      • 存储
        • rsync
        • NFS
      • 路由
      • 日志采集
        • Journalctl
        • Fluentd
          • 根据字段匹配对日志进行结构化
        • Logstash
    • 虚拟化
      • Hyper-V
      • PXE
    • 命令行工具
      • zsh
    • Nexus3
    • 堡垒机
      • JumpServer
    • 测试工具
      • ioping
      • apache bench
      • dd
      • postman
    • 开源项目管理
      • Github
    • 软件包管理
    • 网络镜像源
    • 网关
      • Nginx
  • 存储
    • Ceph
      • 安装
      • 故障处理
    • Etcd
    • Mysql
  • 网络
    • Overlay网络
    • IPVS
    • 网络工具
      • tshark
      • tcpdump
    • 防火墙
      • nftables
      • firewalld
      • iptables
    • 域名解析
      • dig
      • nslookup
      • dnsmasq
    • 代理
      • Clash
      • Redsocks
      • Redsocks2
      • COW
      • Proxfier
  • 云原生
    • 容器
      • Docker
        • Dockerfile
        • docker-compose
      • Podman
      • 原理
        • Chroot
        • Namespace
        • Cgroup
    • 镜像仓库
      • Harbor
    • Kubernetes
      • 部署
        • Kind
        • Minikube
        • kubespray
      • CNI
        • Flannel
        • OVS
        • Calico
      • Operator
        • OperatorSDK
      • StorageClass
      • Q&A
    • Openshift
      • 集群部署
      • 快速使用
    • Prometheus
      • prometheus-operator
      • kube-prometheus
      • Thanos
        • 组件
          • Bucket
          • Check
          • Compact
          • Querier
          • Receiver
          • Ruler
          • Sidecar
          • Store
        • 参考资料
      • Cortex
        • 组件
          • Alertmanager
          • Config
          • Distributor
          • Ingester
          • Querier
          • Query Frontend
          • Ruler
        • 参考资料
      • Thanos与Cortex方案对比
      • 参考资料
      • Q&A
    • Skywalking
    • Rook
    • Helm
    • Istio
    • 应用部署
      • Nexus
      • RabbitMQ
    • 参考资料
  • DevOps
    • 开源DevOps平台
    • CICD
      • Jenkins
    • 参考资料
  • 机器学习
    • Kubeflow
      • 示例
由 GitBook 提供支持
在本页
  • 可用参数
  • 部分响应
  • 高可用
  • 告警对接
  • 查询对接
  1. 云原生
  2. Prometheus
  3. Thanos
  4. 组件

Ruler

rule组件用于评估prometheus的recording和alerting规则。其本身不会抓取metrics接口数据,而是通过query API从query组件定期地获取指标数据,如果配置了多个query地址,则会采用轮询方式获取。

其中recording规则评估生成地数据会以prometheus 2.0格式保存在本地,并且定期地扫描本地生成的TSDB数据块上传到对象存储桶中做为历史数据长期保存。同时也实现了StoreAPI可用于查询本地保存的数据。

与prometheus节点类似,每个rule节点都使用独立的存储,可以同时运行多个副本,而且需要为每个副本实例分配不同的标签以作区分,因为store组件在查询对象存储中的历史数据时是以该标签进行分组查询。

可用参数

usage: thanos rule [<flags>]

ruler evaluating Prometheus rules against given Query nodes, exposing Store API and storing old blocks in bucket

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --log.level=info           Log filtering level.
      --log.format=logfmt        Log format to use.
      --tracing.config-file=<file-path>
                                 Path to YAML file with tracing configuration. See format details: https://thanos.io/tracing.md/#configuration
      --tracing.config=<content>
                                 Alternative to 'tracing.config-file' flag (lower priority). Content of YAML file with tracing configuration. See format details:
                                 https://thanos.io/tracing.md/#configuration
      --http-address="0.0.0.0:10902"
                                 Listen host:port for HTTP endpoints.
      --http-grace-period=2m     Time to wait after an interrupt received for HTTP Server.
      --grpc-address="0.0.0.0:10901"
                                 Listen ip:port address for gRPC endpoints (StoreAPI). Make sure this address is routable from other components.
      --grpc-grace-period=2m     Time to wait after an interrupt received for GRPC Server.
      --grpc-server-tls-cert=""  TLS Certificate for gRPC server, leave blank to disable TLS
      --grpc-server-tls-key=""   TLS Key for the gRPC server, leave blank to disable TLS
      --grpc-server-tls-client-ca=""
                                 TLS CA to verify clients against. If no client CA is specified, there is no client verification on server side. (tls.NoClientCert)
      --label=<name>="<value>" ...
                                 Labels to be applied to all generated metrics (repeated). Similar to external labels for Prometheus, used to identify ruler and its blocks as unique source.
      --data-dir="data/"         data directory
      --rule-file=rules/ ...     Rule files that should be used by rule manager. Can be in glob format (repeated).
      --resend-delay=1m          Minimum amount of time to wait before resending an alert to Alertmanager.
      --eval-interval=30s        The default evaluation interval to use.
      --tsdb.block-duration=2h   Block duration for TSDB block.
      --tsdb.retention=48h       Block retention time on local disk.
      --alertmanagers.url=ALERTMANAGERS.URL ...
                                 Alertmanager replica URLs to push firing alerts. Ruler claims success if push to at least one alertmanager from discovered succeeds. The scheme should not be empty e.g
                                 `http` might be used. The scheme may be prefixed with 'dns+' or 'dnssrv+' to detect Alertmanager IPs through respective DNS lookups. The port defaults to 9093 or the
                                 SRV record's value. The URL path is used as a prefix for the regular Alertmanager API path.
      --alertmanagers.send-timeout=10s
                                 Timeout for sending alerts to Alertmanager
      --alertmanagers.config-file=<file-path>
                                 Path to YAML file that contains alerting configuration. See format details: https://thanos.io/components/rule.md/#configuration. If defined, it takes precedence over
                                 the '--alertmanagers.url' and '--alertmanagers.send-timeout' flags.
      --alertmanagers.config=<content>
                                 Alternative to 'alertmanagers.config-file' flag (lower priority). Content of YAML file that contains alerting configuration. See format details:
                                 https://thanos.io/components/rule.md/#configuration. If defined, it takes precedence over the '--alertmanagers.url' and '--alertmanagers.send-timeout' flags.
      --alertmanagers.sd-dns-interval=30s
                                 Interval between DNS resolutions of Alertmanager hosts.
      --alert.query-url=ALERT.QUERY-URL
                                 The external Thanos Query URL that would be set in all alerts 'Source' field
      --alert.label-drop=ALERT.LABEL-DROP ...
                                 Labels by name to drop before sending to alertmanager. This allows alert to be deduplicated on replica label (repeated). Similar Prometheus alert relabelling
      --web.route-prefix=""      Prefix for API and UI endpoints. This allows thanos UI to be served on a sub-path. This option is analogous to --web.route-prefix of Promethus.
      --web.external-prefix=""   Static prefix for all HTML links and redirect URLs in the UI query web interface. Actual endpoints are still served on / or the web.route-prefix. This allows thanos UI
                                 to be served behind a reverse proxy that strips a URL sub-path.
      --web.prefix-header=""     Name of HTTP request header used for dynamic prefixing of UI links and redirects. This option is ignored if web.external-prefix argument is set. Security risk: enable
                                 this option only if a reverse proxy in front of thanos is resetting the header. The --web.prefix-header=X-Forwarded-Prefix option can be useful, for example, if Thanos
                                 UI is served via Traefik reverse proxy with PathPrefixStrip option enabled, which sends the stripped prefix value in X-Forwarded-Prefix header. This allows thanos UI
                                 to be served on a sub-path.
      --objstore.config-file=<file-path>
                                 Path to YAML file that contains object store configuration. See format details: https://thanos.io/storage.md/#configuration
      --objstore.config=<content>
                                 Alternative to 'objstore.config-file' flag (lower priority). Content of YAML file that contains object store configuration. See format details:
                                 https://thanos.io/storage.md/#configuration
      --query=<query> ...        Addresses of statically configured query API servers (repeatable). The scheme may be prefixed with 'dns+' or 'dnssrv+' to detect query API servers through respective
                                 DNS lookups.
      --query.sd-files=<path> ...
                                 Path to file that contain addresses of query peers. The path can be a glob pattern (repeatable).
      --query.sd-interval=5m     Refresh interval to re-read file SD files. (used as a fallback)
      --query.sd-dns-interval=30s
                                 Interval between DNS resolutions.

规则文件的配置格式与prometheus相同,其中interval参数默认与启动参数--eval-interval一致,也可单独指定,每个规则评估任务会以该时间做为周期向query组件发送查询请求获取评估所需的指标数据。

部分响应

rule对原有的prometheus规则配置做了拓展,在每条规则配置上新加了字段partial_response_strategy用以指定向query发起查询时的部分响应策略。可选值有warn和abort(默认值)。

warn表示忽略所有的StoreAPI响应异常并返回结果,在控制台打印警告日志。

abort表示当有StoreAPI响应异常时直接终止查询并在控制台打印错误日志。

例子:

groups:
- name: "warn strategy"
  partial_response_strategy: "warn"
  rules:
  - alert: "some"
    expr: "up"
- name: "abort strategy"
  partial_response_strategy: "abort"
  rules:
  - alert: "some"
    expr: "up"
- name: "by default strategy is abort"
  rules:
  - alert: "some"
    expr: "up"

高可用

rule组件支持高可用部署,每个副本实例都会使用独立的存储空间和特定的外部标签,为了避免发送到alertmanager的告警产生重复告警,需要在启动时附加参数--alert.label-drop="{{ 外部标签键 }}"以消除告警通知中的不同标签。alertmanager在判断告警通知时判断标签是相同的会认定为是同一个告警,不会重复发送。

告警对接

可通过启动参数 --alertmanagers.config 或--alertmanagers.config-file指定对接alertmanager的配置,格式如下:

alertmanagers:
- http_config:
    basic_auth:
      username: ""
      password: ""
      password_file: ""
    bearer_token: ""
    bearer_token_file: ""
    proxy_url: ""
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
  static_configs: []
  file_sd_configs:
  - files: []
    refresh_interval: 0s
  scheme: http
  path_prefix: ""
  timeout: 10s
  api_version: v1

其中api_version支持v1和v2。可以配置多个alermanager接入地址,rule将所有接入地址认定为一个接收组,按配置顺序向其中的一个发起通知,只有在所有接入地址都报错的情况下才会认定为告警通知发送失败。

查询对接

可通过启动参数 --query.config and --query.config-file指定对接query的配置,格式如下:

- http_config:
    basic_auth:
      username: ""
      password: ""
      password_file: ""
    bearer_token: ""
    bearer_token_file: ""
    proxy_url: ""
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
  static_configs: []
  file_sd_configs:
  - files: []
    refresh_interval: 0s
  scheme: http
  path_prefix: ""

与告警对接相同,query的接入地址也可配置多个,rule按配置的顺序向其中的一个发起评估查询,只有所有query都响应失败的情况下才会认定为评估查询发送失败。

rule组件获取评估数据的路径 rule --> query --> sidecar --> prometheus 需要经过一整个查询链条,这也提升了发生故障的风险,且评估原本就可以在prometheus中进行,因此,在非必要的情况下更加推荐使用prometheus做alerting和recording评估。

上一页Receiver下一页Sidecar

最后更新于5年前