Metric for exceeding limits #2112
Is there a specific Karpenter metric I can use to monitor whether a NodePool's limits have been exceeded? I'd like to set up a Prometheus alerting rule (routed through Alertmanager) for it. If not, which error message in the Karpenter logs should I look out for?
I managed to test this myself and found the relevant message in the Karpenter logs:
`all available instance types exceed limits for nodepool:`
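The message is a fixed phrase, so detecting it comes down to a plain substring test, which is also what a CloudWatch Logs metric filter with a double-quoted phrase pattern does. Below is a minimal Python sketch. It assumes each log event arrives as a single string and that the text before the colon stays stable; the constant and function names are hypothetical.

```python
# Phrase from the Karpenter log message above, without the trailing colon
# and whatever follows it (assumption: only the prefix is stable).
LIMIT_PHRASE = "all available instance types exceed limits for nodepool"

# CloudWatch Logs filter patterns match a multi-word term exactly when it is
# wrapped in double quotes; this string is an illustrative pattern only.
FILTER_PATTERN = f'"{LIMIT_PHRASE}"'


def is_limit_exceeded(log_line: str) -> bool:
    """Hypothetical helper: True if the line contains the NodePool-limit phrase."""
    return LIMIT_PHRASE in log_line


def count_limit_events(log_lines: list[str]) -> int:
    """Count matching lines; a metric filter with value 1 emits one per match."""
    return sum(1 for line in log_lines if is_limit_exceeded(line))
```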
Our Karpenter pod logs are exported to CloudWatch using Fluent Bit, so we were able to add a metric filter for this message on the log group, with a corresponding alarm on the resulting metric.
Here's the Terraform code I used, in case it's helpful for anyone. The alarm sends notifications to an SNS topic, which raises them as alerts in our OpsGenie platform.
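On the alarm side, the logic reduces to summing the metric filter's matches per evaluation period and firing when that sum reaches a threshold (a CloudWatch alarm using the `Sum` statistic with `GreaterThanOrEqualToThreshold`). The following minimal Python sketch models that evaluation only; it is not the original Terraform. The 5-minute period, the threshold of 1, and the helper name are all hypothetical choices.

```python
from collections import Counter
from datetime import datetime, timezone

PERIOD_SECONDS = 300  # assumed evaluation period (5 minutes)
THRESHOLD = 1         # assumed: alert on any matching log event


def breaching_periods(event_times: list[datetime],
                      period: int = PERIOD_SECONDS,
                      threshold: int = THRESHOLD) -> list[datetime]:
    """Hypothetical helper: start times (UTC) of periods whose match count
    reaches the threshold. Expects timezone-aware datetimes."""
    # Bucket each matching event into a fixed-length period since the epoch.
    counts = Counter(int(t.timestamp()) // period for t in event_times)
    # Keep only the periods whose summed count meets the threshold.
    return [
        datetime.fromtimestamp(bucket * period, tz=timezone.utc)
        for bucket, n in sorted(counts.items())
        if n >= threshold
    ]
```

With a threshold of 1, every period containing at least one limit message would put the alarm into the ALARM state, which is what triggers the SNS notification.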