-
Notifications
You must be signed in to change notification settings - Fork 1
Dev #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Dev #228
Conversation
…aches-the-limit' into dev
…aches-the-limit' into dev
…om and falkordb rules
…ed-to-alerts add runbook links to alert descriptions in containeroom and falkordb …
WalkthroughA new ArgoCD ApplicationSet for deploying Robusta across clusters labeled as "app-plane" was introduced. VictoriaMetrics alert routing was updated to send alerts to Robusta. Prometheus alert rules were expanded with new memory usage alerts and runbook URLs were added to several existing alerts for improved troubleshooting references. Changes
Sequence Diagram(s)sequenceDiagram
participant ArgoCD
participant Cluster
participant Robusta
participant VictoriaMetrics
participant Prometheus
ArgoCD->>Cluster: Deploy Robusta via ApplicationSet (Helm)
VictoriaMetrics->>Robusta: Send alerts via webhook receiver
Prometheus->>VictoriaMetrics: Evaluate alert rules (including new memory RSS alerts)
Prometheus->>VictoriaMetrics: Evaluate alerts with runbook URLs
Suggested reviewers
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Tofu Plan Output - observability_stack_ctrl_plane_k8s
|
Tofu Plan Output - observability_stack_ctrl_plane_infra
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
argocd/app_plane/dev/robusta.yaml
(1 hunks)argocd/app_plane/dev/victoriametrics.yaml
(2 hunks)observability/rules/containermemoryrss.rules.yml
(1 hunks)observability/rules/containeroom.rules.yml
(1 hunks)observability/rules/falkordb.rules.yml
(4 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
observability/rules/containermemoryrss.rules.yml
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (3)
observability/rules/containeroom.rules.yml (1)
19-22
: Runbook URL addition looks goodThe extra
runbook_url
annotation follows the existing YAML style, the URL looks valid, and no other fields were touched.observability/rules/falkordb.rules.yml (1)
25-28
: Consistent runbook links – no issues spottedAll newly-added
runbook_url
annotations are syntactically correct, consistently quoted, and point to the expected repo path.Also applies to: 71-74, 108-110, 130-133
argocd/app_plane/dev/victoriametrics.yaml (1)
104-107
: Receiver stanza OK – double-check service DNSURL and
send_resolved: true
look correct. Just ensure the Service
robusta-runner.observability.svc.cluster.local
actually exists in every target cluster before the Alertmanager pod starts, otherwise alerts will be dropped.
- receiver: 'robusta' | ||
group_by: [ '...' ] | ||
group_wait: 1s | ||
group_interval: 1s | ||
matchers: | ||
- severity =~ ".*" | ||
repeat_interval: 4h | ||
continue: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group_by: ['...']
is probably invalid Alertmanager syntax
'...'
is not a documented placeholder.
• If you want to group by all labels, use an empty list []
.
• If you want to keep the default root grouping and add more labels, list them explicitly (e.g. ['alertname','namespace','pod']
).
Mis-configuration here will make Alertmanager fail to start.
- group_by: [ '...' ]
+ # group by all labels (same behaviour as root route)
+ group_by: []
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- receiver: 'robusta' | |
group_by: [ '...' ] | |
group_wait: 1s | |
group_interval: 1s | |
matchers: | |
- severity =~ ".*" | |
repeat_interval: 4h | |
continue: true | |
- receiver: 'robusta' | |
# group by all labels (same behaviour as root route) | |
group_by: [] | |
group_wait: 1s | |
group_interval: 1s | |
matchers: | |
- severity =~ ".*" | |
repeat_interval: 4h | |
continue: true |
🤖 Prompt for AI Agents
In argocd/app_plane/dev/victoriametrics.yaml around lines 88 to 95, the group_by
field is set to ['...'], which is invalid Alertmanager syntax. Replace ['...']
with either an empty list [] to group by all labels or explicitly list the
labels you want to group by, such as ['alertname', 'namespace', 'pod'], to fix
the configuration and prevent Alertmanager startup failure.
- alert: ContainerMemoryHighRSSCritical | ||
expr: | | ||
( | ||
sum by (namespace, pod, container) (container_memory_rss{container!=""}) | ||
/ | ||
sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"}) | ||
) * 100 > 90 | ||
for: 5m | ||
labels: | ||
severity: critical | ||
annotations: | ||
summary: "Container memory RSS usage is critical" | ||
description: "Container {{ $labels.container }} in pod {{ $labels.pod }} (namespace {{ $labels.namespace }}) is using {{ humanizePercentage $value }} of its memory limit (RSS)." | ||
runbook_url: "https://github.com/falkordb/runbooks/blob/main/alerts/ContainerMemoryHighRSSCritical.md" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Replicate the guard fix for the critical alert
Same divide-by-zero concern as the warning rule – apply the same filter.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
🤖 Prompt for AI Agents
In observability/rules/containermemoryrss.rules.yml around lines 36 to 49, the
critical alert expression lacks a guard against division by zero, similar to the
warning rule. To fix this, add a filter to the denominator metric
kube_pod_container_resource_limits{resource="memory"} to exclude zero values
before performing the division, ensuring no divide-by-zero errors occur.
- alert: ContainerMemoryHighRSSWarning | ||
expr: | | ||
( | ||
sum by (namespace, pod, container) (container_memory_rss{container!=""}) | ||
/ | ||
sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"}) | ||
) * 100 > 75 | ||
for: 5m | ||
labels: | ||
severity: warning | ||
annotations: | ||
summary: "Container memory RSS usage is high (warning)" | ||
description: "Container {{ $labels.container }} in pod {{ $labels.pod }} (namespace {{ $labels.namespace }}) is using {{ humanizePercentage $value }} of its memory limit (RSS)." | ||
runbook_url: "https://github.com/falkordb/runbooks/blob/main/alerts/ContainerMemoryHighRSSWarning.md" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible divide-by-zero & missing newline
- If a container has no memory limit, the denominator becomes 0 →
+Inf
, triggering noisy alerts.
Filter out zero limits or add a small guard clause. - YAML-lint flags “no new line at end of file”.
- sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
+ sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"} > 0)
Add a trailing newline to silence the linter.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- alert: ContainerMemoryHighRSSWarning | |
expr: | | |
( | |
sum by (namespace, pod, container) (container_memory_rss{container!=""}) | |
/ | |
sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"}) | |
) * 100 > 75 | |
for: 5m | |
labels: | |
severity: warning | |
annotations: | |
summary: "Container memory RSS usage is high (warning)" | |
description: "Container {{ $labels.container }} in pod {{ $labels.pod }} (namespace {{ $labels.namespace }}) is using {{ humanizePercentage $value }} of its memory limit (RSS)." | |
runbook_url: "https://github.com/falkordb/runbooks/blob/main/alerts/ContainerMemoryHighRSSWarning.md" | |
- alert: ContainerMemoryHighRSSWarning | |
expr: | | |
( | |
sum by (namespace, pod, container) (container_memory_rss{container!=""}) | |
- sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"}) | |
+ sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"} > 0) | |
) * 100 > 75 | |
for: 5m | |
labels: | |
severity: warning | |
annotations: | |
summary: "Container memory RSS usage is high (warning)" | |
description: "Container {{ $labels.container }} in pod {{ $labels.pod }} (namespace {{ $labels.namespace }}) is using {{ humanizePercentage $value }} of its memory limit (RSS)." | |
runbook_url: "https://github.com/falkordb/runbooks/blob/main/alerts/ContainerMemoryHighRSSWarning.md" |
🤖 Prompt for AI Agents
In observability/rules/containermemoryrss.rules.yml around lines 21 to 35, the
alert expression can cause divide-by-zero errors if a container has no memory
limit, resulting in noisy +Inf alerts. Fix this by filtering out containers with
zero memory limits in the denominator expression or adding a small guard clause
to avoid division by zero. Also, add a trailing newline at the end of the file
to satisfy YAML linting requirements.
enableHolmesGPT: true | ||
enabledManagedConfiguration: true | ||
kubewatch: | ||
nodeSelector: | ||
node_pool: observability | ||
holmes: | ||
nodeSelector: | ||
node_pool: observability | ||
toolsets: | ||
prometheus/metrics: | ||
enabled: true | ||
config: | ||
prometheus_url: http://vmsingle-vm-victoria-metrics-k8s-stack.observability.svc.cluster.local:8429 | ||
additionalEnvVars: | ||
- name: ROBUSTA_AI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential values-schema typos will break Helm render
Robusta chart expects:
• enableManagedConfiguration
(not enabledManagedConfiguration
)
• additional_env_vars
(snake-case) under holmes
, not additionalEnvVars
Wrong keys are silently ignored → features won’t be enabled.
-enabledManagedConfiguration: true
+enableManagedConfiguration: true
@@
- additionalEnvVars:
+ additional_env_vars:
Please verify the chart’s values.yaml
for exact casing.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
enableHolmesGPT: true | |
enabledManagedConfiguration: true | |
kubewatch: | |
nodeSelector: | |
node_pool: observability | |
holmes: | |
nodeSelector: | |
node_pool: observability | |
toolsets: | |
prometheus/metrics: | |
enabled: true | |
config: | |
prometheus_url: http://vmsingle-vm-victoria-metrics-k8s-stack.observability.svc.cluster.local:8429 | |
additionalEnvVars: | |
- name: ROBUSTA_AI | |
enableHolmesGPT: true | |
enableManagedConfiguration: true | |
kubewatch: | |
nodeSelector: | |
node_pool: observability | |
holmes: | |
nodeSelector: | |
node_pool: observability | |
toolsets: | |
prometheus/metrics: | |
enabled: true | |
config: | |
prometheus_url: http://vmsingle-vm-victoria-metrics-k8s-stack.observability.svc.cluster.local:8429 | |
additional_env_vars: | |
- name: ROBUSTA_AI |
🤖 Prompt for AI Agents
In argocd/app_plane/dev/robusta.yaml around lines 67 to 81, the keys
`enabledManagedConfiguration` and `additionalEnvVars` are incorrectly cased and
will be ignored by Helm. Rename `enabledManagedConfiguration` to
`enableManagedConfiguration` and change `additionalEnvVars` under `holmes` to
`additional_env_vars` using snake-case to match the expected values-schema and
ensure these features are enabled.
Summary by CodeRabbit
New Features
Improvements