Chi Wai Chan e632c03547
Add alert rules for missing metrics
Metrics can be missing due to cache expiration or connectivity issues
between the exporter and the OpenStack APIs. See also issue [1].

[1]: https://github.com/canonical/openstack-exporter-operator/issues/130

Change-Id: I081b7d6f704953ef3360ab6bbed20dcb2c59ec9f
Signed-off-by: Chi Wai Chan <chiwai.chan@canonical.com>
2025-03-10 17:25:25 +08:00

groups:
  - name: OpenStackServices
    rules:
      - alert: OpenStackServicesDown
        expr: |
          sum by(service) (
            label_replace({__name__=~"openstack_(.+)_up"}, "service", "$1", "__name__", "openstack_(.+)_up")
          ) == 0
        for: 5m
        labels:
          severity: critical
          service: "{{ $labels.service }}"
        annotations:
          summary: OpenStack Services Down
          description: |
            The OpenStack service {{ $labels.service }} is down
  - name: OpenStackMetrics
    rules:
      - alert: OpenStackMetricsMissing
        expr: |
          absent_over_time({__name__=~"openstack_(.+)_up"}[5m])
        labels:
          severity: critical
        annotations:
          summary: OpenStack Metrics Missing
          description: |
            All OpenStack metrics have been missing for over 5 minutes. This could be due to a
            connectivity issue with the OpenStack APIs, or because the metrics cache has expired.
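
A rule like OpenStackServicesDown could be exercised with a `promtool test rules` unit test kept in a separate file. The following is only a minimal sketch: the rule file name `openstack_rules.yaml` and the sample series `openstack_nova_up` are assumptions made for illustration and are not taken from this repository.

```yaml
# Hypothetical unit test file, run with: promtool test rules <this file>
rule_files:
  - openstack_rules.yaml   # assumed name of the rules file above

evaluation_interval: 1m

tests:
  # Case 1: the service reports 0 for 10 minutes, so the alert should be
  # firing once the 5m "for" duration has elapsed.
  - interval: 1m
    input_series:
      - series: 'openstack_nova_up'
        values: '0x10'
    alert_rule_test:
      - eval_time: 10m
        alertname: OpenStackServicesDown
        exp_alerts:
          - exp_labels:
              severity: critical
              service: nova
            exp_annotations:
              summary: OpenStack Services Down
              description: "The OpenStack service nova is down\n"

  # Case 2: the service reports 1 throughout, so no alert is expected.
  - interval: 1m
    input_series:
      - series: 'openstack_nova_up'
        values: '1x10'
    alert_rule_test:
      - eval_time: 10m
        alertname: OpenStackServicesDown
        exp_alerts: []
```

The trailing `\n` in the expected description reflects the `|` block scalar used in the rule, which keeps the final newline.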