Description
What cloud platform(s) should this support?
Azure
What are some key tasks that should be performed?
Workspace availability check – Query Azure Resource Health for the Databricks workspace (and optionally the regional Databricks status page) to confirm the service is “Healthy” and no outages are open.
Cluster state audit – Use the Clusters API (GET /api/2.0/clusters/list) to ensure every production cluster is in an expected state (RUNNING, RESIZING, TERMINATED) and none are in ERROR or stuck in PENDING beyond a timeout.
Recent job-run success rate – Enumerate scheduled jobs with GET /api/2.1/jobs/list, pull the last N runs for each (GET /api/2.1/jobs/runs/list), and alert if any completed run’s result_state is not SUCCESS.
DBFS I/O sanity test – From a lightweight cluster, write "ok" to /tmp/healthcheck, read it back, and delete it; fail if any operation errors or latency exceeds a defined SLO.
SQL Warehouse readiness – Call GET /api/2.0/sql/warehouses, verify each required warehouse is RUNNING, then issue a SELECT 1 through its JDBC URL and measure response time.
Secret scope / Key Vault connectivity – List secret scopes (GET /api/2.0/secrets/scopes/list), attempt a get on a known test secret in each, and flag RESOURCE_DOES_NOT_EXIST or auth failures to detect Key Vault or permission issues.
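As a rough starting point, the cluster state audit and job-run check above might be sketched as below. This is only an illustration under stated assumptions, not a finished implementation: the helper names (`api_get`, `audit_clusters`, `failed_runs`), the 15-minute timeout, the set of transient states, and the `transient_since` bookkeeping (a map the health check itself would maintain of when each cluster was first seen in a transient state) are all hypothetical. The response fields used (`clusters[].state`, `runs[].state.result_state`) follow the Clusters 2.0 and Jobs 2.1 REST responses.

```python
import json
import urllib.request

# Assumption: states the issue treats as acceptable for a production cluster.
HEALTHY_STATES = {"RUNNING", "RESIZING", "TERMINATED"}
# Assumption: states that only count as a problem when they persist.
TRANSIENT_STATES = {"PENDING", "RESTARTING", "TERMINATING"}


def api_get(host, token, path):
    """GET a Databricks REST endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def audit_clusters(clusters, transient_since, now, timeout_s=900):
    """Return (cluster_id, state, reason) tuples for unhealthy clusters.

    clusters        -- the "clusters" array from GET /api/2.0/clusters/list
    transient_since -- hypothetical map of cluster_id -> epoch seconds when
                       the checker first saw the cluster in a transient state
    now             -- current time in epoch seconds
    """
    problems = []
    for c in clusters:
        cid = c.get("cluster_id")
        state = c.get("state", "UNKNOWN")
        if state in HEALTHY_STATES:
            continue
        if state in TRANSIENT_STATES:
            # Tolerate transient states until the timeout has elapsed.
            if now - transient_since.get(cid, now) <= timeout_s:
                continue
            problems.append((cid, state, "stuck beyond timeout"))
        else:
            # ERROR, UNKNOWN, or anything unrecognized is flagged at once.
            problems.append((cid, state, "unexpected state"))
    return problems


def failed_runs(runs):
    """Return run_ids of completed runs whose result_state is not SUCCESS.

    runs -- the "runs" array from GET /api/2.1/jobs/runs/list
    """
    bad = []
    for r in runs:
        result = r.get("state", {}).get("result_state")
        # Runs still in progress carry no result_state yet; skip them.
        if result is not None and result != "SUCCESS":
            bad.append(r.get("run_id"))
    return bad
```

To wire this up, one could call `api_get(host, token, "/api/2.0/clusters/list").get("clusters", [])` and pass the result to `audit_clusters`, and likewise feed the `runs` array from `/api/2.1/jobs/runs/list?job_id=<id>&limit=<N>` to `failed_runs`. Keeping the evaluation functions separate from the HTTP call means they can be unit-tested against recorded responses without a live workspace.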
Any other helpful context?
No response
Contact
None