Improve efficiency of handling GoCD alerts #370

Closed
@robrap

Description

It takes a while to understand what is wrong in GoCD. Following up on an alert requires connecting to the VPN, finding the failure log in GoCD, and then knowing how to search for the actual failure, which depends on where it failed. It would be great if the alerts had more context.

AC:

Timeboxed effort -- 1 day.

  • Some useful extract of the logs shows up in the Opsgenie alert (so that we can tell if it's a known/unknown issue, etc.)

Questions/Notes:

  • This work would only help in situations where you don't have to go on GoCD to re-run a stage anyhow (e.g. self-closing alerts).
  • We want to switch to ArgoCD & Kubernetes relatively soon; are there quick improvements to get more context in alerts, or are there improvements that would carry over?
  • Could we get the error details into the alert so that VPN access and a GoCD login aren’t required?
  • The Runbook has some notes that can be referenced (or added to) for searching to find errors in logs of various stages.
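
One possible shape for the log extract, as a rough sketch only: pull the lines around the first recognizable error markers out of a stage's console log and put them in the alert body. Everything here is an assumption for illustration. The function names, the patterns in `DEFAULT_PATTERNS`, and the context sizes are hypothetical (the real patterns would come from the Runbook notes), and the `message`/`description` keys and the 130-character cap reflect my understanding of the Opsgenie alert fields and should be checked against its docs. Fetching the log from GoCD and sending the alert are left out.

```python
import re

# Hypothetical error markers; real ones would be taken from the Runbook notes.
DEFAULT_PATTERNS = (
    r"\bERROR\b",
    r"\bFAILED\b",
    r"Traceback \(most recent call last\)",
)


def extract_failure_context(log_text, patterns=DEFAULT_PATTERNS,
                            before=2, after=5, max_chars=2000):
    """Return the log lines surrounding any line that matches a pattern.

    Falls back to the tail of the log when nothing matches, and keeps only
    the last `max_chars` characters so the excerpt fits in an alert body.
    """
    lines = log_text.splitlines()
    regex = re.compile("|".join(patterns))

    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            keep.update(range(start, end))

    if keep:
        selected = [lines[i] for i in sorted(keep)]
    else:
        # No known marker found: the end of the log is the best guess.
        selected = lines[-(before + after + 1):]

    return "\n".join(selected)[-max_chars:]


def build_alert_payload(pipeline, stage, log_text):
    """Build a dict shaped like an alert body (field names are assumptions)."""
    return {
        # Truncated defensively; assumed limit for the alert title.
        "message": f"GoCD failure: {pipeline}/{stage}"[:130],
        "description": extract_failure_context(log_text),
    }
```

Because the extraction only works on log text, the same helper could carry over to ArgoCD/Kubernetes logs later; only the patterns and the code that fetches the log would need to change.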

Projects

Status

Done - Long Term Storage