Adapter that translates Prometheus Alertmanager webhooks to the University of Washington AlertAPI (v2).
Runs as a small Quart web service. You can deploy it with systemd or as a container.
- Receives Alertmanager webhook payloads (POST /)
- Translates each alert to an AlertAPI v2 payload and forwards it to AlertAPI
- Optional watchdog endpoint (POST /watchdog) to manage AlertAPI keepalives
- Exposes health and Prometheus metrics endpoints
- Python 3.10+ (for local/systemd) or a container runtime
- Network egress to the AlertAPI endpoint
- These environment variables (required unless noted):
  - ALERTAPI_TOKEN: Bearer token for AlertAPI access
  - ALERTAPI_URL: Base URL for AlertAPI (no trailing path)
  - ALERT_ORGANIZATION: ServiceNow organization name
  - LOG_LEVEL: Optional. DEBUG, INFO, WARNING, ERROR, CRITICAL (default INFO)
  - PROMETHEUS_MULTIPROC_DIR: Required when running multiple workers (Dockerfile sets /tmp/metric-multi)
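The variables above could be read and validated at startup along these lines. This is a minimal sketch, not the service's actual code: `Settings` and `load_settings` are hypothetical names, and stripping a trailing slash from ALERTAPI_URL is an assumption about what "no trailing path" means in practice.

```python
import os
from dataclasses import dataclass

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
REQUIRED_VARS = ("ALERTAPI_TOKEN", "ALERTAPI_URL", "ALERT_ORGANIZATION")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    token: str
    url: str
    organization: str
    log_level: str = "INFO"


def load_settings(env=None):
    """Read the documented variables and fail fast if a required one is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise RuntimeError(f"unsupported LOG_LEVEL: {log_level}")
    return Settings(
        token=env["ALERTAPI_TOKEN"],
        # Assumption: a trailing slash is dropped so endpoint paths can be appended.
        url=env["ALERTAPI_URL"].rstrip("/"),
        organization=env["ALERT_ORGANIZATION"],
        log_level=log_level,
    )
```

Failing at startup rather than on the first webhook makes a missing token visible in the service log immediately.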
POST /
- Accepts a standard Alertmanager webhook JSON body.
- On success, forwards to AlertAPI /v2/alert and returns the upstream status (typically 202).
- Returns 406 if required labels/annotations are missing; 400 for malformed JSON; 5xx on upstream timeout/connect errors.
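For scripted checks, the status codes above can be read from a small client. The sketch below uses only the Python standard library; `post_webhook` and `explain_status` are hypothetical helper names, and the hint strings simply restate the behavior described above.

```python
import json
import urllib.error
import urllib.request


def post_webhook(url, payload, timeout=10.0):
    """POST a JSON body to the adapter and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        return err.code


def explain_status(status):
    """Map a status code from POST / to the documented meaning."""
    if 200 <= status < 300:
        return "forwarded to AlertAPI"
    if status == 400:
        return "request body is not valid JSON"
    if status == 406:
        return "required labels/annotations are missing"
    if status >= 500:
        return "upstream AlertAPI timeout or connect error"
    return "unexpected status"
```

For example, `explain_status(post_webhook("http://127.0.0.1:3080/", payload))` against a locally running instance, where `payload` is a parsed Alertmanager webhook body.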
POST /watchdog
- Accepts the same payload, forwards to AlertAPI /v2/keepalive.
- Label watchdog_timeout (minutes) can override the default of 5.
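The timeout override can be pictured as follows. This is a sketch under assumptions: `watchdog_timeout_minutes` is a hypothetical helper, and falling back to the default for a non-numeric or non-positive value is a guess at sensible behavior, not something the service is documented to do.

```python
DEFAULT_WATCHDOG_TIMEOUT_MINUTES = 5


def watchdog_timeout_minutes(labels):
    """Return the keepalive timeout in minutes, honoring the watchdog_timeout label."""
    raw = labels.get("watchdog_timeout")
    if raw is None:
        return DEFAULT_WATCHDOG_TIMEOUT_MINUTES
    try:
        # Alertmanager label values are strings, so convert before use.
        minutes = int(raw)
    except (TypeError, ValueError):
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_WATCHDOG_TIMEOUT_MINUTES
    return minutes if minutes > 0 else DEFAULT_WATCHDOG_TIMEOUT_MINUTES
```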
GET /healthz → 200
GET /metrics → Prometheus metrics (am2alertapi_responses_total{api_endpoint,status_code}).
The translator builds an AlertAPI payload per alert using:
- labels:
  - alertname → component.name (required)
  - focus → maps to urgency when status == "firing". Mapping: 1→1, 2→1, 3→2, 4→3
  - hostname | cluster | ci_name → sets ci.name (at least one strongly recommended)
  - ci_sysid → sets ci.sysid (optional)
  - kba → sets knowledge base article number (optional)
  - watchdog_timeout → minutes for keepalive (optional; watchdog endpoint)
- annotations:
  - summary → title (required)
  - description → message prefix (required)
  - generatorURL → appended to message as "source: …"
If required fields are missing, the service returns 406 with a helpful error message.
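The mapping above can be sketched in Python as follows. This illustrates the documented rules and is not the service's implementation: `translate` and `MissingField` are hypothetical names, the `organization` and `kba` payload keys are guesses, trying hostname, cluster, ci_name in that order is an assumption, and generatorURL is read from annotations because that is where the list above places it.

```python
FOCUS_TO_URGENCY = {1: 1, 2: 1, 3: 2, 4: 3}
CI_NAME_LABELS = ("hostname", "cluster", "ci_name")


class MissingField(ValueError):
    """A required label or annotation is absent or unusable (the service answers 406)."""


def translate(alert, organization):
    """Build one AlertAPI-style payload from one Alertmanager alert."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})

    missing = [f"labels.{k}" for k in ("alertname",) if k not in labels]
    missing += [f"annotations.{k}" for k in ("summary", "description") if k not in annotations]
    if missing:
        raise MissingField("missing required fields: " + ", ".join(missing))

    # description is the message prefix; the source URL is appended when present.
    message = annotations["description"]
    if "generatorURL" in annotations:
        message += "\nsource: " + annotations["generatorURL"]

    payload = {
        "organization": organization,  # hypothetical key name
        "component": {"name": labels["alertname"]},
        "title": annotations["summary"],
        "message": message,
    }

    # focus only drives urgency while the alert is firing.
    if alert.get("status") == "firing" and "focus" in labels:
        try:
            payload["urgency"] = FOCUS_TO_URGENCY[int(labels["focus"])]
        except (KeyError, ValueError) as exc:
            raise MissingField(f"unsupported focus value: {labels['focus']!r}") from exc

    ci = {}
    for key in CI_NAME_LABELS:  # assumption: the first label present wins
        if key in labels:
            ci["name"] = labels[key]
            break
    if "ci_sysid" in labels:
        ci["sysid"] = labels["ci_sysid"]
    if ci:
        payload["ci"] = ci
    if "kba" in labels:
        payload["kba"] = labels["kba"]  # hypothetical key name
    return payload
```

The sketch leaves urgency unset for resolved alerts, since the mapping above is only defined for status == "firing".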
See example payloads in tests/ for quick testing:
- tests/alertmanager-alert-firing.json
- tests/alertmanager-alert-ok.json
- tests/alertmanager-alert-watchdog.json
Use the helper script (creates a venv, installs deps, starts Hypercorn on :3080):
- Edit run_locally and set ALERTAPI_URL, ALERT_ORGANIZATION, and a valid ALERTAPI_TOKEN
- Then source it:
source ./run_locally
Now try the health check and a sample POST from tests/:
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:3080/healthz
curl -s -H "Content-Type: application/json" \
--data @tests/alertmanager-alert-firing.json \
-o /dev/null -w "%{http_code}\n" \
http://127.0.0.1:3080/
This repo includes a Dockerfile that runs the app with Hypercorn (2 workers) and Prometheus multiprocess metrics configured.
Build locally (example with podman):
podman build -t am2alertapi:dev .
Run locally (env-file example):
# create testing.env with required vars
# ALERT_ORGANIZATION=UW-IT
# ALERTAPI_URL=https://api.alerts-test.s.uw.edu
# ALERTAPI_TOKEN=YOURTOKEN
podman run --env-file=testing.env --network=host am2alertapi:dev
Google Cloud Build (examples):
gcloud --project <PROJECT> builds submit \
--tag gcr.io/<PROJECT>/am2alertapi:dev-$(date +"%Y%m%d%H%M") .
gcloud --project <PROJECT> builds submit \
--tag gcr.io/<PROJECT>/am2alertapi:rel-$(date +"%Y%m%d%H%M") .
A convenience script, run_container_locally, shows a ready-to-run podman run example against an artifact registry image.
There are example unit and environment files in systemd-deployment/. The Standalone-deploy script shows a minimal flow to clone, create a venv, install requirements, and install the unit.
Files:
- systemd-deployment/etc.systemd.system.am2alertapi.service
- systemd-deployment/etc.sysconfig.am2alertapi
Notes:
- The unit binds to 127.0.0.1:3080 by default and runs as nobody.
- Set ALERTAPI_TOKEN, ALERTAPI_URL, and ALERT_ORGANIZATION in /etc/sysconfig/am2alertapi.
- After copying files, run: systemctl daemon-reload && systemctl enable --now am2alertapi
- Logging uses the standard Python logging module. Control verbosity with LOG_LEVEL (default INFO).
- Prometheus metrics: GET /metrics includes a Counter named am2alertapi_responses_total, labeled by endpoint and status code. When running multiple workers, ensure PROMETHEUS_MULTIPROC_DIR is set (the container does this for you).
- 400 on POST: request body isn’t valid JSON.
- 406 on POST: missing required labels/annotations; check focus, alertname, summary, description, and generatorURL.
- 5xx on POST: upstream AlertAPI timeout or connect error; verify ALERTAPI_URL and network access.
- Metrics empty when using multiple workers: check PROMETHEUS_MULTIPROC_DIR is set and writable.
See LICENSE for details.