Description
Ironic can expose hardware metrics for Prometheus, such as overall health or specific readings. IrSO needs to be able to configure Ironic to publish the metrics and to install ironic-prometheus-exporter (IPE) alongside Ironic.
The way IPE works now is by reading JSON files from a shared directory where Ironic puts them. There is a desire to rework this architecture (e.g. by merging IPE into Ironic proper), but so far this is how it works. Ironic Image already supports IPE, but we're getting reports that the support is incomplete.
Ideally, I'd prefer to see a new CRD for IPE. Since IPE is tightly bundled with Ironic (i.e. it will live in the same pod), that may not make a lot of practical sense though. So it's probably going to be a new configuration section in the Ironic CRD, for instance:
    spec:
      networking:
        prometheusExporterPort: 3306 # or whatever the good default is given host networking
      prometheusExporter:
        enabled: true
Over time, more options could be added. The ones I have in mind are:
    spec:
      prometheusExporter:
        enabled: true # enable the feature
        disableDefaultRules: false # do not create the default Prometheus rules
        defaultCollectionInterval: 30 # how often to poll BMCs by default
The sticking point is HA support. The way Prometheus is configured, it may not understand a service that is backed by several pods, each providing different information. It's possible that we'll need something (a new deployment of a new script) that aggregates information from all Ironic instances.