
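The rule component evaluates Prometheus recording and alerting rules. It does not scrape metrics endpoints itself; instead, it periodically fetches metric data from the query component through the query API. If multiple query addresses are configured, it picks among them in round-robin fashion.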
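
The round-robin selection described above can be pictured with a minimal Python sketch. It is an illustration only, not Thanos's implementation: the `RoundRobinQuerier` class and the example addresses are hypothetical, and no network request is made.

```python
from itertools import cycle


class RoundRobinQuerier:
    """Hands out query endpoints in rotation (hypothetical helper, not a Thanos API)."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one query address is required")
        self._addresses = cycle(addresses)

    def next_endpoint(self):
        # Each call returns the next address, wrapping around at the end.
        return next(self._addresses)


# Hypothetical addresses standing in for repeated --query values.
querier = RoundRobinQuerier(["query-0:10902", "query-1:10902"])
picked = [querier.next_endpoint() for _ in range(4)]
print(picked)
```

With two addresses, successive evaluations alternate between them, so no single query instance carries the whole evaluation load.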

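Data produced by recording rule evaluation is stored locally in the Prometheus 2.0 format. The component periodically scans the locally generated TSDB blocks and uploads them to an object storage bucket for long-term retention as historical data. It also implements the StoreAPI, so the locally stored data can be queried.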



usage: thanos rule [<flags>]

ruler evaluating Prometheus rules against given Query nodes, exposing Store API and storing old blocks in bucket

  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --log.level=info           Log filtering level.
      --log.format=logfmt        Log format to use.
      --tracing.config-file=<file-path>
                                 Path to YAML file with tracing configuration. See format details:
      --tracing.config=<content>
                                 Alternative to 'tracing.config-file' flag (lower priority). Content of YAML file with tracing configuration. See format details:
      --http-address="0.0.0.0:10902"
                                 Listen host:port for HTTP endpoints.
      --http-grace-period=2m     Time to wait after an interrupt received for HTTP Server.
      --grpc-address="0.0.0.0:10901"
                                 Listen ip:port address for gRPC endpoints (StoreAPI). Make sure this address is routable from other components.
      --grpc-grace-period=2m     Time to wait after an interrupt received for GRPC Server.
      --grpc-server-tls-cert=""  TLS Certificate for gRPC server, leave blank to disable TLS
      --grpc-server-tls-key=""   TLS Key for the gRPC server, leave blank to disable TLS
      --grpc-server-tls-client-ca=""
                                 TLS CA to verify clients against. If no client CA is specified, there is no client verification on server side. (tls.NoClientCert)
      --label=<name>="<value>" ...
                                 Labels to be applied to all generated metrics (repeated). Similar to external labels for Prometheus, used to identify ruler and its blocks as unique source.
      --data-dir="data/"         data directory
      --rule-file=rules/ ...     Rule files that should be used by rule manager. Can be in glob format (repeated).
      --resend-delay=1m          Minimum amount of time to wait before resending an alert to Alertmanager.
      --eval-interval=30s        The default evaluation interval to use.
      --tsdb.block-duration=2h   Block duration for TSDB block.
      --tsdb.retention=48h       Block retention time on local disk.
      --alertmanagers.url=ALERTMANAGERS.URL ...
                                 Alertmanager replica URLs to push firing alerts. Ruler claims success if push to at least one alertmanager from discovered succeeds. The scheme should not be empty e.g
                                 `http` might be used. The scheme may be prefixed with 'dns+' or 'dnssrv+' to detect Alertmanager IPs through respective DNS lookups. The port defaults to 9093 or the
                                 SRV record's value. The URL path is used as a prefix for the regular Alertmanager API path.
      --alertmanagers.send-timeout=10s
                                 Timeout for sending alerts to Alertmanager
      --alertmanagers.config-file=<file-path>
                                 Path to YAML file that contains alerting configuration. See format details: If defined, it takes precedence over
                                 the '--alertmanagers.url' and '--alertmanagers.send-timeout' flags.
      --alertmanagers.config=<content>
                                 Alternative to 'alertmanagers.config-file' flag (lower priority). Content of YAML file that contains alerting configuration. See format details:
                                 If defined, it takes precedence over the '--alertmanagers.url' and '--alertmanagers.send-timeout' flags.
      --alertmanagers.sd-dns-interval=30s
                                 Interval between DNS resolutions of Alertmanager hosts.
      --alert.query-url=ALERT.QUERY-URL
                                 The external Thanos Query URL that would be set in all alerts 'Source' field
      --alert.label-drop=ALERT.LABEL-DROP ...
                                 Labels by name to drop before sending to alertmanager. This allows alert to be deduplicated on replica label (repeated). Similar Prometheus alert relabelling
      --web.route-prefix=""      Prefix for API and UI endpoints. This allows thanos UI to be served on a sub-path. This option is analogous to --web.route-prefix of Prometheus.
      --web.external-prefix=""   Static prefix for all HTML links and redirect URLs in the UI query web interface. Actual endpoints are still served on / or the web.route-prefix. This allows thanos UI
                                 to be served behind a reverse proxy that strips a URL sub-path.
      --web.prefix-header=""     Name of HTTP request header used for dynamic prefixing of UI links and redirects. This option is ignored if web.external-prefix argument is set. Security risk: enable
                                 this option only if a reverse proxy in front of thanos is resetting the header. The --web.prefix-header=X-Forwarded-Prefix option can be useful, for example, if Thanos
                                 UI is served via Traefik reverse proxy with PathPrefixStrip option enabled, which sends the stripped prefix value in X-Forwarded-Prefix header. This allows thanos UI
                                 to be served on a sub-path.
      --objstore.config-file=<file-path>
                                 Path to YAML file that contains object store configuration. See format details:
      --objstore.config=<content>
                                 Alternative to 'objstore.config-file' flag (lower priority). Content of YAML file that contains object store configuration. See format details:
      --query=<query> ...        Addresses of statically configured query API servers (repeatable). The scheme may be prefixed with 'dns+' or 'dnssrv+' to detect query API servers through respective
                                 DNS lookups.
      --query.sd-files=<path> ...
                                 Path to file that contain addresses of query peers. The path can be a glob pattern (repeatable).
      --query.sd-interval=5m     Refresh interval to re-read file SD files. (used as a fallback)
      --query.sd-dns-interval=30s
                                 Interval between DNS resolutions.







groups:
- name: "warn strategy"
  partial_response_strategy: "warn"
  rules:
  - alert: "some"
    expr: "up"
- name: "abort strategy"
  partial_response_strategy: "abort"
  rules:
  - alert: "some"
    expr: "up"
- name: "by default strategy is abort"
  rules:
  - alert: "some"
    expr: "up"


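The rule component supports high-availability deployment. Each replica uses its own storage and a distinct external label. To keep these replicas from producing duplicate alerts in Alertmanager, start each one with --alert.label-drop="{{ external label key }}" so that the differing label is removed from the alerts it sends. Alertmanager treats alerts whose labels are identical as the same alert and does not send duplicate notifications.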
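
The effect of dropping the replica label can be sketched in a few lines of Python. This is only an illustration of the idea, not Thanos or Alertmanager code: the `drop_labels` function, the `rule_replica` label name, and the sample alerts are all hypothetical.

```python
def drop_labels(alert_labels, labels_to_drop):
    """Return a copy of the alert's labels without the dropped names."""
    return {k: v for k, v in alert_labels.items() if k not in labels_to_drop}


# Two ruler replicas fire the same alert; only the hypothetical
# external label "rule_replica" differs between them.
alert_a = {"alertname": "InstanceDown", "job": "node", "rule_replica": "a"}
alert_b = {"alertname": "InstanceDown", "job": "node", "rule_replica": "b"}

dropped = {"rule_replica"}
# A frozenset of label pairs serves as the alert's identity in this sketch.
unique_alerts = {
    frozenset(drop_labels(alert, dropped).items())
    for alert in (alert_a, alert_b)
}
print(len(unique_alerts))
```

Once the replica label is gone, both alerts carry the same label set and collapse into one entry, which is the condition under which Alertmanager treats them as a single alert.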


The Alertmanager connection can be configured with the --alertmanagers.config or --alertmanagers.config-file startup flag, in the following format:

alertmanagers:
- http_config:
    basic_auth:
      username: ""
      password: ""
      password_file: ""
    bearer_token: ""
    bearer_token_file: ""
    proxy_url: ""
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
  static_configs: []
  file_sd_configs:
  - files: []
    refresh_interval: 0s
  scheme: http
  path_prefix: ""
  timeout: 10s
  api_version: v1



The query connection can be configured with the --query.config or --query.config-file startup flag, in the following format:

- http_config:
    basic_auth:
      username: ""
      password: ""
      password_file: ""
    bearer_token: ""
    bearer_token_file: ""
    proxy_url: ""
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
  static_configs: []
  file_sd_configs:
  - files: []
    refresh_interval: 0s
  scheme: http
  path_prefix: ""


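To obtain the data it evaluates, the rule component goes through the whole query chain rule --> query --> sidecar --> prometheus, which raises the risk of failure. Since the same evaluation can be done in Prometheus itself, prefer Prometheus for alerting and recording rule evaluation unless the rule component is actually needed.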
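
A rough way to see why a longer chain is riskier is to multiply per-hop availabilities. The Python sketch below assumes that hops fail independently and uses a made-up availability figure of 99.9% per hop; neither assumption comes from measurements of any real deployment.

```python
from math import prod


def chain_availability(hop_availabilities):
    """Availability of a serial chain, assuming independent failures."""
    return prod(hop_availabilities)


# Hypothetical figure: every hop is up 99.9% of the time.
ruler_path = chain_availability([0.999, 0.999, 0.999])  # query, sidecar, prometheus
local_path = chain_availability([0.999])                # prometheus only
print(f"{ruler_path:.4f} vs {local_path:.4f}")
```

Under these assumptions, every extra hop lowers the chance that an evaluation has all of its data available, which is the trade-off behind preferring evaluation in Prometheus.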
