[new-codebundle-request] - Azure Databricks Health #39

@stewartshea

What cloud platform(s) should this support?

Azure

What are some key tasks that should be performed?

Workspace availability check – Query Azure Resource Health for the Databricks workspace (and optionally the regional Databricks status page) to confirm the service is “Healthy” and no outages are open.
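
A minimal sketch of the Resource Health part of this check, assuming Resource Health reports on the workspace resource at all. The `api_version` default and the helper names are assumptions, and the code treats the `availabilityState` value `Available` as the "Healthy" condition described above. Fetching the URL with an Azure bearer token is left to the caller.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"


def availability_url(resource_id: str, api_version: str = "2020-05-01") -> str:
    """Build the 'current availability status' URL for an ARM resource ID.

    The api_version default is an assumption; use whichever version
    your subscription supports.
    """
    path = quote(resource_id.rstrip("/"), safe="/")
    return (
        f"{ARM_ENDPOINT}{path}"
        "/providers/Microsoft.ResourceHealth/availabilityStatuses/current"
        f"?api-version={api_version}"
    )


def is_available(payload: dict) -> bool:
    """Return True only when the reported availabilityState is 'Available'."""
    state = payload.get("properties", {}).get("availabilityState", "Unknown")
    return state == "Available"
```

Anything other than `Available` (for example a missing `properties` object) is treated as unhealthy, so an unexpected response shape fails the check instead of passing silently.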

Cluster state audit – Use the Clusters API (GET /api/2.0/clusters/list) to ensure every production cluster is in an expected state (RUNNING, RESIZING, or TERMINATED) and none is stuck in ERROR or PENDING beyond a timeout.
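
One way the timeout logic could look, as a sketch over the `clusters` array returned by the list call. The list response does not say how long a cluster has been in its current state, so this version assumes the caller keeps a small `first_seen` dictionary between polls; that dictionary, the function name, and the 15-minute default are all hypothetical. Any state outside the expected set (not only ERROR and PENDING) is tracked.

```python
EXPECTED_STATES = {"RUNNING", "RESIZING", "TERMINATED"}


def audit_clusters(clusters, first_seen, now, timeout_s=900):
    """Return (cluster_id, state, seconds_in_state) for clusters stuck too long.

    clusters   : list of cluster dicts with 'cluster_id' and 'state'
    first_seen : dict persisted by the caller between polls (mutated here)
    now        : current time in seconds, supplied by the caller
    """
    stuck = []
    for cluster in clusters:
        cid = cluster["cluster_id"]
        state = cluster.get("state", "UNKNOWN")
        if state in EXPECTED_STATES:
            first_seen.pop(cid, None)  # healthy again, forget history
            continue
        seen = first_seen.get(cid)
        if seen is None or seen[0] != state:
            seen = (state, now)  # first poll in which we saw this state
            first_seen[cid] = seen
        age = now - seen[1]
        if age >= timeout_s:
            stuck.append((cid, state, age))
    return stuck
```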

Recent job-run success rate – Enumerate scheduled jobs with GET /api/2.1/jobs/list, pull the last N runs for each (GET /api/2.1/jobs/runs/list), and alert if any completed run’s result_state ≠ SUCCESS.
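
A sketch of how the run results might be evaluated once the `runs` arrays have been fetched. It assumes each run carries a `state` object with an optional `result_state`, and it skips runs that have no result yet (still pending or running) rather than counting them as failures; that choice, and the function names, are assumptions.

```python
def failed_runs(runs):
    """Return (job_id, run_id, result_state) for every finished, non-SUCCESS run."""
    failed = []
    for run in runs:
        result = run.get("state", {}).get("result_state")
        if result is not None and result != "SUCCESS":
            failed.append((run.get("job_id"), run.get("run_id"), result))
    return failed


def success_rate(runs):
    """Fraction of finished runs that succeeded, or None if none have finished."""
    finished = [r for r in runs if r.get("state", {}).get("result_state")]
    if not finished:
        return None
    ok = sum(1 for r in finished if r["state"]["result_state"] == "SUCCESS")
    return ok / len(finished)
```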

DBFS I/O sanity test – From a lightweight cluster, write "ok" to /tmp/healthcheck, read it back, and delete it; fail if any operation errors or latency exceeds a defined SLO.
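
The round trip could be sketched as below. The file operations are passed in as callables so the same function works with `dbutils.fs.put`, `dbutils.fs.head`, and `dbutils.fs.rm` on a cluster, or with stand-ins elsewhere. The function name and the 5-second per-operation SLO are placeholders, not values from this request.

```python
import time


def dbfs_roundtrip(put, head, rm, path="/tmp/healthcheck", payload="ok", slo_s=5.0):
    """Write, read back, and delete a small file; return (ok, message, timings)."""
    timings = {}

    def timed(name, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        timings[name] = time.perf_counter() - start
        return result

    try:
        timed("write", put, path, payload, True)  # True = overwrite
        content = timed("read", head, path)
        timed("delete", rm, path)
    except Exception as exc:  # any failed operation fails the check
        return False, f"operation failed: {exc}", timings

    if content != payload:
        return False, f"read back {content!r}, expected {payload!r}", timings
    slow = [name for name, seconds in timings.items() if seconds > slo_s]
    if slow:
        return False, "SLO exceeded: " + ", ".join(slow), timings
    return True, "ok", timings
```

In a notebook this would be called as `dbfs_roundtrip(dbutils.fs.put, dbutils.fs.head, dbutils.fs.rm)`.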

SQL Warehouse readiness – Call GET /api/2.0/sql/warehouses, verify each required warehouse is RUNNING, then issue a SELECT 1 through its JDBC URL and measure response time.
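
A sketch of the state check only, assuming the list response contains a `warehouses` array whose entries have `name` and `state` fields. The `required` list of warehouse names is a hypothetical input; the `SELECT 1` latency probe is not covered here.

```python
def warehouses_not_ready(warehouses, required):
    """Map each required warehouse name to its problem state, if any.

    A required warehouse that is absent from the listing is reported
    as 'MISSING'; an empty result means every required one is RUNNING.
    """
    by_name = {w.get("name"): w for w in warehouses}
    problems = {}
    for name in required:
        warehouse = by_name.get(name)
        if warehouse is None:
            problems[name] = "MISSING"
        elif warehouse.get("state") != "RUNNING":
            problems[name] = warehouse.get("state", "UNKNOWN")
    return problems
```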

Secret scope / Key Vault connectivity – List secret scopes (GET /api/2.0/secrets/scopes/list), attempt a get on a known test secret in each, and flag RESOURCE_DOES_NOT_EXIST or auth failures to detect Key Vault or permission issues.
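
The flagging step might be sketched as a small classifier over the HTTP status and the JSON error body of each secret probe. It assumes errors arrive as a JSON object with an `error_code` field; the category names and the function name are hypothetical.

```python
def classify_secret_probe(status_code, body):
    """Classify one secret lookup as 'ok', 'missing', 'auth', or 'other'."""
    code = (body or {}).get("error_code", "")
    if status_code == 200:
        return "ok"
    if code == "RESOURCE_DOES_NOT_EXIST" or status_code == 404:
        return "missing"  # scope or key not found
    if code == "PERMISSION_DENIED" or status_code in (401, 403):
        return "auth"  # token, ACL, or Key Vault access problem
    return "other"
```

Keeping `missing` and `auth` separate helps tell a deleted test secret apart from a Key Vault or permission issue.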

Any other helpful context?

No response

Contact

None

Status

In Review