Monitoring the metal-stack

Overview

Monitoring Stack

Logging

Logs are being collected by Promtail and pushed to a Loki instance running in the control plane. Loki is deployed in monolithic mode and with storage type 'filesystem'. You can find all logging related configuration parameters for the control plane in the control plane's logging role.

In the partitions, Promtail is deployed inside a systemd-managed Docker container. Configuration parameters can be found in the partition's promtail role. Which hosts Promtail collects from can be configured via the prometheus_promtail_targets variable.

Monitoring

For monitoring we deploy the kube-prometheus-stack and a Thanos instance in the control plane. Metrics for the control plane are supplied by

  • metal-metrics-exporter
  • rethindb-exporter
  • event-exporter
  • gardener-metrics-exporter

To query and visualize logs, metrics and alerts we deploy several grafana dashboards to the control plane:

  • grafana-dashboard-alertmanager
  • grafana-dashboard-machine-capacity
  • grafana-dashboard-metal-api
  • grafana-dashboard-rethinkdb
  • grafana-dashboard-sonic-exporter

and also some gardener related dashboards:

  • grafana-dashboard-gardener-overview
  • grafana-dashboard-shoot-cluster
  • grafana-dashboard-shoot-customizations
  • grafana-dashboard-shoot-details
  • grafana-dashboard-shoot-states

The following ServiceMonitors are also deployed:

  • gardener-metrics-exporter
  • ipam-db
  • masterdata-api
  • masterdata-db
  • metal-api
  • metal-db
  • rethinkdb-exporter
  • metal-metrics-exporter

All monitoring related configuration parameters for the control plane can be found in the control plane's monitoring role.

Partition metrics are supplied by

  • node-exporter
  • blackbox-exporter
  • ipmi-exporter
  • sonic-exporter
  • metal-core
  • frr-exporter

and scraped by Prometheus. For each of these exporters, the target hosts can be defined by

  • prometheus_node_exporter_targets
  • prometheus_blackbox_exporter_targets
  • prometheus_frr_exporter_targets
  • prometheus_sonic_exporter_targets
  • prometheus_metal_core_targets
  • prometheus_frr_exporter_targets

Alerting

In addition to Grafana, alerts can optionally be sent to a Slack channel. For this to work, at least a valid monitoring_slack_api_url and a monitoring_slack_notification_channel must be specified. For further configuration parameters refer to the monitoring role. Alerting rules are defined in the rules directory of the partition's prometheus role.