feat: add weekly cleanup workflow for stale CI schemas (#943)
Conversation
Add a dbt macro (elementary.drop_stale_ci_schemas) with per-adapter dispatch that lists all schemas matching CI prefixes, parses the embedded YYMMDD_HHMMSS timestamp, and drops schemas older than a configurable threshold (default 24h). Add a GitHub Actions workflow that runs weekly (Sunday 03:00 UTC) against each cloud warehouse target. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
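The schema-age logic described above can be sketched in plain Python (a hedged sketch only; the real implementation is a dbt Jinja macro, and the prefix list and function name here are illustrative):

```python
import re
from datetime import datetime, timedelta, timezone

# Matches e.g. "dbt_250114_031500_my_test" and captures "250114_031500"
TS_RE = re.compile(r"^(?:dbt_|py_)(\d{6}_\d{6})")

def is_stale(schema_name: str, now: datetime, max_age_hours: int = 24) -> bool:
    """Return True if the schema embeds a YYMMDD_HHMMSS timestamp older than the threshold."""
    m = TS_RE.match(schema_name)
    if not m:
        return False  # no CI prefix/timestamp -> never drop
    try:
        ts = datetime.strptime(m.group(1), "%y%m%d_%H%M%S").replace(tzinfo=timezone.utc)
    except ValueError:
        return False  # malformed timestamp -> skip rather than abort
    return now - ts > timedelta(hours=max_age_hours)

now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(is_stale("dbt_250114_031500_ci", now))  # older than 24h -> True
print(is_stale("dbt_250115_060000_ci", now))  # 6h old -> False
```

Non-matching names (no CI prefix, or a garbled timestamp) are treated as keep, which is the safe default for a destructive job.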
📝 Walkthrough

Adds a weekly/manual GitHub Actions workflow to drop stale CI schemas and introduces dbt macros and tests to list schemas, detect schema timestamps, check existence across adapters (including ClickHouse), and perform cleanup by age and prefix.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant CI as dbt runner (in workflow)
    participant PKG as local elementary package
    participant DB as Warehouse (Snowflake/BigQuery/Redshift/Databricks/Athena)
    GH->>CI: checkout repo & setup Python/dbt
    CI->>CI: install adapter packages & generate profiles
    CI->>PKG: symlink local package into test project
    CI->>CI: dbt deps install
    CI->>DB: dbt run-operation drop_stale_ci_schemas(prefixes, max_age_hours) [matrix warehouse]
    DB-->>CI: execute drops / return results
    CI-->>GH: action completes with operation output
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed (3 passed)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Creates test schemas with old/recent timestamps, runs the cleanup macro, and verifies old schemas are dropped while recent ones are kept. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 2
🧹 Nitpick comments (4)
macros/utils/ci_cleanup/drop_stale_ci_schemas.sql (2)
77-80: Consider quoting the schema name in the SQL statement.

While `schema_name` originates from `adapter.list_schemas()` and is filtered by prefix/timestamp patterns, using unquoted identifiers in raw SQL is risky if a schema name ever contains special characters.

🛡️ Suggested fix with proper quoting

```diff
 {% macro clickhouse__drop_ci_schema(database, schema_name) %}
-    {% do run_query("DROP DATABASE IF EXISTS " ~ schema_name) %}
+    {% do run_query("DROP DATABASE IF EXISTS `" ~ schema_name ~ "`") %}
     {% do adapter.commit() %}
 {% endmacro %}
```

ClickHouse uses backticks for identifier quoting. Alternatively, check if the ClickHouse adapter provides an identifier quoting utility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql` around lines 77 - 80, The DROP DATABASE statement in macro clickhouse__drop_ci_schema uses an unquoted schema_name; update the run_query call to quote the identifier (e.g., wrap schema_name in backticks) or use the adapter's identifier-quoting helper if available, so run_query("DROP DATABASE IF EXISTS " ~ quoted_schema_name) is executed and then call adapter.commit() as before; reference the macro clickhouse__drop_ci_schema, the schema_name variable, the run_query invocation, and adapter.commit() when making the change.
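As a plain-Python illustration of the quoting concern above (a hypothetical helper, not part of the package; rather than guess at ClickHouse's escaping rules for backticks inside identifiers, it simply rejects such names):

```python
def quote_clickhouse_identifier(name: str) -> str:
    """Wrap an identifier in backticks so that names with special characters
    stay a single identifier. Refuse embedded backticks instead of guessing
    the engine's escaping rules."""
    if "`" in name:
        raise ValueError(f"refusing to quote identifier containing a backtick: {name!r}")
    return f"`{name}`"

# Building the DROP statement from the review comment:
schema = "dbt_250114_031500_ci"
sql = "DROP DATABASE IF EXISTS " + quote_clickhouse_identifier(schema)
print(sql)  # DROP DATABASE IF EXISTS `dbt_250114_031500_ci`
```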
48-51: No error handling for individual schema drop failures.

If any `drop_ci_schema` call fails (e.g., due to permissions, locks, or concurrent operations), the entire cleanup aborts. For a maintenance job, you may want to continue cleaning other schemas and report failures at the end.

This is noted in the PR description as a known limitation. Consider whether continuing on errors would be preferable for this use case.
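Jinja has no native try/except, but the continue-and-report pattern this comment suggests looks like the following in plain Python (names such as `drop_fn` are illustrative, not from the package):

```python
def drop_stale_schemas(schemas, drop_fn, log=print):
    """Attempt every drop; collect failures instead of aborting the run
    on the first error, then report a summary at the end."""
    dropped, failed = 0, []
    for name in schemas:
        try:
            drop_fn(name)
            dropped += 1
        except Exception as exc:  # keep going; report at the end
            failed.append((name, str(exc)))
            log(f"  failed to drop {name}: {exc}")
    log(f"cleanup done: {dropped} dropped, {len(failed)} failed")
    return dropped, failed

def fake_drop(name):
    # simulate one schema that cannot be dropped
    if name == "dbt_locked":
        raise RuntimeError("schema is locked")

dropped, failed = drop_stale_schemas(["dbt_a", "dbt_locked", "dbt_b"], fake_drop)
```

The run completes with `dropped == 2` and one recorded failure instead of aborting on `dbt_locked`.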
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql` around lines 48 - 51, The loop currently calls elementary.drop_ci_schema directly so any failure aborts the whole run; change it to attempt each drop inside a guarded block (wrap the call to drop_ci_schema in a try/catch-equivalent Jinja construct), only increment ns.dropped on success, and on failure increment a new ns.failed (or ns.errors) counter and call log(...) with the schema_name and the captured error details so the job continues cleaning other schemas; at the end of the macro emit a summary using ns.dropped and ns.failed to report which drops succeeded and which failed.

.github/workflows/cleanup-stale-schemas.yml (2)
42-46: Consider pinning dbt-core version for reproducibility.

Installing `dbt-core` without a version constraint means the workflow may behave differently as new versions are released. For a maintenance workflow, this is lower risk than for CI tests, but pinning ensures predictable behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/cleanup-stale-schemas.yml around lines 42 - 46, The workflow step named "Install dbt" currently installs "dbt-core" without a version, causing non-reproducible installs; update the run command that installs "dbt-core" (the pip install line referencing "dbt-core" and the matrix.warehouse-type expansion) to pin a specific dbt-core version (e.g., "dbt-core==<desired_version>") and ensure any dynamically chosen adapter package strings remain compatible with that pinned version.
66-71: Workflow uses only `["dbt_"]` prefix while macro defaults to `['dbt_', 'py_']`.

The macro's default prefixes include both `dbt_` and `py_`, but the workflow explicitly passes only `["dbt_"]`. This may be intentional, but if `py_` prefixed schemas also need cleanup, consider including it:

```diff
-          --args '{prefixes: ["dbt_"], max_age_hours: ${{ inputs.max-age-hours || '24' }}}'
+          --args '{prefixes: ["dbt_", "py_"], max_age_hours: ${{ inputs.max-age-hours || '24' }}}'
```

Or document why only `dbt_` schemas are cleaned in this workflow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/cleanup-stale-schemas.yml around lines 66 - 71, The workflow step calling the macro elementary.drop_stale_ci_schemas passes --args with prefixes: ["dbt_"] while the macro defaults to ['dbt_', 'py_']; update the workflow run arguments to include both prefixes (e.g., prefixes: ["dbt_", "py_"]) so py_ schemas are also dropped, or alternatively add a brief comment in the workflow explaining why only "dbt_" is passed if that was intentional; target the run invocation that builds the --args payload for elementary.drop_stale_ci_schemas.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql`:
- Around line 45-46: The current call to
modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') can raise ValueError
for malformed dates; before calling strptime (inside the ns_valid.ok branch
where schema_ts is set), validate the numeric ranges of the parsed components
from ts_str (year/day/month/hour/minute/second substrings) — e.g., month 01-12,
day 01-31, hour 00-23, minute/second 00-59 — and only call
modules.datetime.datetime.strptime to create schema_ts when those checks pass;
if validation fails, skip that schema (do not call strptime) so malformed ts_str
values don’t abort the cleanup run.
- Line 23: The code sets now = modules.datetime.datetime.utcnow(), which uses a
deprecated naive UTC; change this to a timezone-aware UTC datetime by calling
modules.datetime.datetime.now(modules.datetime.timezone.utc) (or equivalent)
when assigning now in the macro, and remove any .replace(tzinfo=None) usage so
timezone info is preserved; also audit places that compare now to other
datetimes (e.g., functions referencing now) and update those comparisons to use
timezone-aware datetimes so no naive vs aware comparisons occur.
---
Nitpick comments:
In @.github/workflows/cleanup-stale-schemas.yml:
- Around line 42-46: The workflow step named "Install dbt" currently installs
"dbt-core" without a version, causing non-reproducible installs; update the run
command that installs "dbt-core" (the pip install line referencing "dbt-core"
and the matrix.warehouse-type expansion) to pin a specific dbt-core version
(e.g., "dbt-core==<desired_version>") and ensure any dynamically chosen adapter
package strings remain compatible with that pinned version.
- Around line 66-71: The workflow step calling the macro
elementary.drop_stale_ci_schemas passes --args with prefixes: ["dbt_"] while the
macro defaults to ['dbt_', 'py_']; update the workflow run arguments to include
both prefixes (e.g., prefixes: ["dbt_", "py_"]) so py_ schemas are also dropped,
or alternatively add a brief comment in the workflow explaining why only "dbt_"
is passed if that was intentional; target the run invocation that builds the
--args payload for elementary.drop_stale_ci_schemas.
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql`:
- Around line 77-80: The DROP DATABASE statement in macro
clickhouse__drop_ci_schema uses an unquoted schema_name; update the run_query
call to quote the identifier (e.g., wrap schema_name in backticks) or use the
adapter's identifier-quoting helper if available, so run_query("DROP DATABASE IF
EXISTS " ~ quoted_schema_name) is executed and then call adapter.commit() as
before; reference the macro clickhouse__drop_ci_schema, the schema_name
variable, the run_query invocation, and adapter.commit() when making the change.
- Around line 48-51: The loop currently calls elementary.drop_ci_schema directly
so any failure aborts the whole run; change it to attempt each drop inside a
guarded block (wrap the call to drop_ci_schema in a try/catch-equivalent Jinja
construct), only increment ns.dropped on success, and on failure increment a new
ns.failed (or ns.errors) counter and call log(...) with the schema_name and the
captured error details so the job continues cleaning other schemas; at the end
of the macro emit a summary using ns.dropped and ns.failed to report which drops
succeeded and which failed.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
.github/workflows/cleanup-stale-schemas.yml
macros/utils/ci_cleanup/drop_stale_ci_schemas.sql
```jinja
{% set database = elementary.target_database() %}
{% set all_schemas = adapter.list_schemas(database) %}
{% set now = modules.datetime.datetime.utcnow() %}
```
🧩 Analysis chain
🌐 Web query:
Python 3.12 datetime.utcnow() deprecation replacement
💡 Result:
In Python 3.12, datetime.datetime.utcnow() is deprecated; use a timezone-aware UTC datetime instead. (docs.python.org)
Replacement:

```python
from datetime import datetime, UTC
now_utc = datetime.now(UTC)
```

If you need compatibility with older Python versions:

```python
from datetime import datetime, timezone
now_utc = datetime.now(timezone.utc)
```

Related: utcfromtimestamp() replacement:

```python
from datetime import datetime, UTC
dt_utc = datetime.fromtimestamp(ts, tz=UTC)
```

Citations:
- https://docs.python.org/3/whatsnew/3.12.html
- https://docs.python.org/3.11/library/datetime.html
datetime.utcnow() is deprecated in Python 3.12+.
Use datetime.now(timezone.utc) instead. The suggested fix should avoid stripping the timezone information with .replace(tzinfo=None), as this defeats the purpose of using timezone-aware datetimes. If the code elsewhere compares with naive datetimes, those should be updated to be timezone-aware as well.
♻️ Suggested fix

```diff
-    {% set now = modules.datetime.datetime.utcnow() %}
+    {% set now = modules.datetime.datetime.now(modules.datetime.timezone.utc) %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql` at line 23, The code sets
now = modules.datetime.datetime.utcnow(), which uses a deprecated naive UTC;
change this to a timezone-aware UTC datetime by calling
modules.datetime.datetime.now(modules.datetime.timezone.utc) (or equivalent)
when assigning now in the macro, and remove any .replace(tzinfo=None) usage so
timezone info is preserved; also audit places that compare now to other
datetimes (e.g., functions referencing now) and update those comparisons to use
timezone-aware datetimes so no naive vs aware comparisons occur.
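The naive/aware mismatch behind this finding is easy to demonstrate; mixing the two in arithmetic raises at runtime, which is why the prompt asks to update every comparison site, not just the assignment of `now`:

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)   # recommended replacement for utcnow()
naive = datetime.now()               # naive, like the value utcnow() returns

try:
    _ = aware - naive                # arithmetic across naive/aware is an error
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("cannot mix naive and aware datetimes:", exc)

# Parse a CI timestamp as aware so it can be compared with `aware`:
schema_ts = datetime.strptime("250114_031500", "%y%m%d_%H%M%S").replace(tzinfo=timezone.utc)
age_hours = (aware - schema_ts).total_seconds() / 3600
```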
```jinja
{% if ns_valid.ok %}
    {% set schema_ts = modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') %}
```
strptime can raise ValueError for invalid date values.
The digit validation ensures characters are numeric but doesn't validate actual date ranges. For example, 260232_112345 (day 32) would pass digit validation but cause strptime to raise ValueError, aborting the entire cleanup run.
Consider wrapping in a try/except to skip malformed schemas gracefully:
🛡️ Suggested defensive handling

```diff
 {% if ns_valid.ok %}
-    {% set schema_ts = modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') %}
-    {% set age_seconds = (now - schema_ts).total_seconds() %}
-    {% if age_seconds > max_age_seconds %}
+    {% set schema_ts = none %}
+    {% try %}
+        {% set schema_ts = modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') %}
+    {% except %}
+        {{ log("  skip " ~ schema_name ~ " (invalid timestamp: " ~ ts_str ~ ")", info=true) }}
+    {% endtry %}
+    {% if schema_ts is not none %}
+        {% set age_seconds = (now - schema_ts).total_seconds() %}
+        {% if age_seconds > max_age_seconds %}
```

Note: Jinja2 doesn't have native try/except, so the `{% try %}` block above is not valid as written. You may need to validate month (01-12) and day (01-31) ranges explicitly, or accept the current behavior given CI schemas are auto-generated with valid timestamps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql` around lines 45 - 46, The
current call to modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') can
raise ValueError for malformed dates; before calling strptime (inside the
ns_valid.ok branch where schema_ts is set), validate the numeric ranges of the
parsed components from ts_str (year/day/month/hour/minute/second substrings) —
e.g., month 01-12, day 01-31, hour 00-23, minute/second 00-59 — and only call
modules.datetime.datetime.strptime to create schema_ts when those checks pass;
if validation fails, skip that schema (do not call strptime) so malformed ts_str
values don’t abort the cleanup run.
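In plain Python the graceful-skip version is a one-liner with try/except (a sketch; inside the Jinja macro an explicit range check is needed instead, since Jinja offers no exception handling):

```python
from datetime import datetime

def parse_ci_timestamp(ts_str):
    """Parse a YYMMDD_HHMMSS string, returning None for malformed values
    instead of letting ValueError abort the whole cleanup run."""
    try:
        return datetime.strptime(ts_str, "%y%m%d_%H%M%S")
    except ValueError:
        return None

print(parse_ci_timestamp("260115_112345"))  # -> 2026-01-15 11:23:45
print(parse_ci_timestamp("260232_112345"))  # day 32 -> None
```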
There was a problem hiding this comment.
🧹 Nitpick comments (1)
integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql (1)
39-43: Consider guarding the recent schema drop.

Line 43 unconditionally drops `recent_schema`, but if the cleanup logic unexpectedly dropped it (e.g., due to a bug or race condition), this could fail on adapters that don't support IF EXISTS semantics in their `drop_ci_schema` implementation.

However, looking at the main macro in `macros/utils/ci_cleanup/drop_stale_ci_schemas.sql`, the default implementation uses `dbt.drop_schema`, which typically handles non-existent schemas gracefully. This is likely fine.

🔧 Optional: Add existence check for consistency

```diff
     {% if old_exists_after %}
         {% do elementary.drop_ci_schema(database, old_schema) %}
     {% endif %}
-    {% do elementary.drop_ci_schema(database, recent_schema) %}
+    {% if recent_exists_after %}
+        {% do elementary.drop_ci_schema(database, recent_schema) %}
+    {% endif %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql` around lines 39 - 43, The test unconditionally calls elementary.drop_ci_schema(database, recent_schema) which can fail if the schema was already removed; wrap that call in the same existence guard used above (e.g., mirror the old_exists_after conditional or compute a recent_exists_after before calling) so you only call drop_ci_schema(database, recent_schema) when the recent schema actually exists; reference the existing symbols recent_schema, old_exists_after, old_schema and the macro/function drop_ci_schema to locate where to add the conditional guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql`:
- Around line 39-43: The test unconditionally calls
elementary.drop_ci_schema(database, recent_schema) which can fail if the schema
was already removed; wrap that call in the same existence guard used above
(e.g., mirror the old_exists_after conditional or compute a recent_exists_after
before calling) so you only call drop_ci_schema(database, recent_schema) when
the recent schema actually exists; reference the existing symbols recent_schema,
old_exists_after, old_schema and the macro/function drop_ci_schema to locate
where to add the conditional guard.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql
integration_tests/tests/test_ci_cleanup.py
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql`:
- Around line 70-73: The macro clickhouse__edr_create_schema currently
concatenates schema_name into SQL and ignores the database param; instead
construct a dbt Relation via api.Relation.create(database=database,
schema=schema_name) (or at minimum use adapter.quote_identifier on both database
and schema_name) and use that quoted identifier in the run_query call, then call
adapter.commit(); update clickhouse__edr_create_schema to build the relation
with api.Relation.create(database=..., schema=...) and use the adapter’s quoting
helpers (adapter.quote_identifier or relation rendering) for the CREATE DATABASE
IF NOT EXISTS statement before committing.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql
```jinja
{% macro clickhouse__edr_create_schema(database, schema_name) %}
    {% do run_query("CREATE DATABASE IF NOT EXISTS " ~ schema_name) %}
    {% do adapter.commit() %}
{% endmacro %}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, locate the file and read the specific lines
find . -name "test_drop_stale_ci_schemas.sql" -type f | head -1
```

Repository: elementary-data/dbt-data-reliability
Length of output: 149
🏁 Script executed:

```shell
# Check the actual content around lines 70-73
if [ -f "integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql" ]; then
  cat -n "integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql" | sed -n '65,80p'
fi
```

Repository: elementary-data/dbt-data-reliability
Length of output: 498
🏁 Script executed:

```shell
# Search for other ClickHouse macro implementations to understand patterns
rg "clickhouse__" --type sql -A 3 | head -100
```

Repository: elementary-data/dbt-data-reliability
Length of output: 7839
🏁 Script executed:

```shell
# Check for adapter.quote usage in ClickHouse macros
rg "adapter\.quote" --type sql -B 2 -A 2 | head -80
```

Repository: elementary-data/dbt-data-reliability
Length of output: 707
🏁 Script executed:

```shell
# Search for CREATE DATABASE usage in the codebase
rg "CREATE DATABASE" --type sql -B 2 -A 2
```

Repository: elementary-data/dbt-data-reliability
Length of output: 597
🏁 Script executed:

```shell
# Find the default implementation of edr_create_schema
rg "macro edr_create_schema" --type sql -A 5 | grep -v "clickhouse__" | head -20
```

Repository: elementary-data/dbt-data-reliability
Length of output: 807
🏁 Script executed:

```shell
# Check the database parameter usage in similar default macros
rg "edr_drop_schema|edr_create_schema" --type sql -B 1 -A 4 | grep -E "(macro|database)" | head -20
```

Repository: elementary-data/dbt-data-reliability
Length of output: 2536
Quote the identifier and align with the interface contract.
The schema_name parameter is directly concatenated into the SQL query without quoting. While this works in the test context with programmatically generated names, it's fragile if names contain special characters. More importantly, the database parameter is accepted but unused, inconsistent with the default implementation which properly uses it via api.Relation.create().
♻️ Proposed fix

```diff
 {% macro clickhouse__edr_create_schema(database, schema_name) %}
-    {% do run_query("CREATE DATABASE IF NOT EXISTS " ~ schema_name) %}
+    {% do run_query("CREATE DATABASE IF NOT EXISTS " ~ adapter.quote(schema_name)) %}
     {% do adapter.commit() %}
 {% endmacro %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql` around
lines 70 - 73, The macro clickhouse__edr_create_schema currently concatenates
schema_name into SQL and ignores the database param; instead construct a dbt
Relation via api.Relation.create(database=database, schema=schema_name) (or at
minimum use adapter.quote_identifier on both database and schema_name) and use
that quoted identifier in the run_query call, then call adapter.commit(); update
clickhouse__edr_create_schema to build the relation with
api.Relation.create(database=..., schema=...) and use the adapter’s quoting
helpers (adapter.quote_identifier or relation rendering) for the CREATE DATABASE
IF NOT EXISTS statement before committing.
…eanup-stale-ci-schemas
…ntary.schema_exists in test ClickHouse adapter doesn't expose adapter.list_schemas() or adapter.check_schema_exists() on its RuntimeDatabaseWrapper. Add a dispatched list_ci_schemas macro (SHOW DATABASES for ClickHouse, adapter.list_schemas for others) and use elementary.schema_exists() in the test helper. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
ClickHouse adapter doesn't support adapter.check_schema_exists() on RuntimeDatabaseWrapper. Add a ClickHouse-specific dispatch that uses SHOW DATABASES LIKE query instead. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 3
♻️ Duplicate comments (2)
macros/utils/ci_cleanup/drop_stale_ci_schemas.sql (1)
45-47: ⚠️ Potential issue | 🟠 Major

Guard timestamp parsing against invalid calendar values.
Line 46 can still raise on values like invalid day/month and abort the whole cleanup operation. This was already flagged in an earlier review and is still unresolved.
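The exception-free range check that a Jinja macro would need can be expressed in Python like this (a sketch; note it only filters out-of-range components, so calendar-impossible dates such as Feb 30 still pass and strptime remains the final arbiter):

```python
def ts_components_in_range(ts_str):
    """Range-check YYMMDD_HHMMSS components without try/except,
    mirroring what a Jinja macro (no exception handling) can do."""
    if len(ts_str) != 13 or ts_str[6] != "_":
        return False
    if not (ts_str[:6].isdigit() and ts_str[7:].isdigit()):
        return False
    mm, dd = int(ts_str[2:4]), int(ts_str[4:6])
    hh, mi, ss = int(ts_str[7:9]), int(ts_str[9:11]), int(ts_str[11:13])
    return (1 <= mm <= 12 and 1 <= dd <= 31
            and 0 <= hh <= 23 and 0 <= mi <= 59 and 0 <= ss <= 59)

print(ts_components_in_range("260115_112345"))  # True
print(ts_components_in_range("260232_112345"))  # day 32 -> False
```

This catches day-32 or month-13 style values, which are the realistic failure mode for auto-generated CI schema names.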
♻️ Proposed fix
♻️ Proposed fix

```diff
 {% if ns_valid.ok %}
-    {% set schema_ts = modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') %}
-    {% set age_seconds = (now - schema_ts).total_seconds() %}
-    {% if age_seconds > max_age_seconds %}
+    {% set yy = ts_str[0:2] | int %}
+    {% set mm = ts_str[2:4] | int %}
+    {% set dd = ts_str[4:6] | int %}
+    {% set hh = ts_str[7:9] | int %}
+    {% set mi = ts_str[9:11] | int %}
+    {% set ss = ts_str[11:13] | int %}
+    {% if 1 <= mm <= 12 and 1 <= dd <= 31 and 0 <= hh <= 23 and 0 <= mi <= 59 and 0 <= ss <= 59 %}
+        {% set schema_ts = modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') %}
+        {% set age_seconds = (now - schema_ts).total_seconds() %}
+        {% if age_seconds > max_age_seconds %}
+            {{ log("  DROP " ~ schema_name ~ " (age: " ~ (age_seconds / 3600) | round(1) ~ " h)", info=true) }}
+            {% do elementary.drop_ci_schema(database, schema_name) %}
+            {% set ns.dropped = ns.dropped + 1 %}
+        {% else %}
+            {{ log("  keep " ~ schema_name ~ " (age: " ~ (age_seconds / 3600) | round(1) ~ " h)", info=true) }}
+        {% endif %}
     {% else %}
-        {{ log("  keep " ~ schema_name ~ " (age: " ~ (age_seconds / 3600) | round(1) ~ " h)", info=true) }}
+        {{ log("  skip " ~ schema_name ~ " (invalid timestamp: " ~ ts_str ~ ")", info=true) }}
     {% endif %}
 {% endif %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql` around lines 45 - 47, Guard the timestamp parse by wrapping the modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') call (the schema_ts computation) in a try/except so invalid calendar values (bad day/month, etc.) do not raise and abort the cleanup; on exception, log or mark the schema as invalid and continue (i.e., skip computing age_seconds and avoid using schema_ts) so the rest of the loop proceeds safely when ns_valid.ok is true.

integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql (1)
70-73: ⚠️ Potential issue | 🟠 Major

Quote `schema_name` in the ClickHouse CREATE DATABASE statement.

This is the same unresolved issue raised earlier: direct concatenation is fragile, and the `database` arg remains unused in this implementation path.

♻️ Proposed fix

```diff
 {% macro clickhouse__edr_create_schema(database, schema_name) %}
-    {% do run_query("CREATE DATABASE IF NOT EXISTS " ~ schema_name) %}
+    {% do run_query("CREATE DATABASE IF NOT EXISTS " ~ adapter.quote(schema_name)) %}
     {% do adapter.commit() %}
 {% endmacro %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql` around lines 70 - 73, The macro clickhouse__edr_create_schema currently concatenates schema_name directly and ignores the database arg; change it to build a fully qualified name using the database argument when present and pass that through the SQL identifier-quoting helper instead of raw concatenation (e.g., compute full_name = database ? database ~ "." ~ schema_name : schema_name, then call run_query("CREATE DATABASE IF NOT EXISTS " ~ adapter.quote(full_name)) ), and remove the direct concatenation so schema_name is properly quoted and database is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql`:
- Line 11: Replace the deprecated call modules.datetime.datetime.utcnow() used
to set the now variable with a timezone-aware call: use
modules.datetime.datetime.now(modules.datetime.timezone.utc) so the now variable
is UTC-aware; update the expression where now is set (the line containing
modules.datetime.datetime.utcnow()) to call .now(...) with
modules.datetime.timezone.utc to avoid the deprecation and runtime warnings.
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql`:
- Around line 99-101: The macro clickhouse__drop_ci_schema currently injects
schema_name directly into the SQL passed to run_query; change it to quote/escape
the identifier before building the DROP DATABASE statement (e.g., wrap the
schema name in backticks or use the adapter's identifier-quoting helper) and
ensure any existing quote characters in schema_name are escaped to avoid SQL
injection or syntax errors; update the run_query call that builds "DROP DATABASE
IF EXISTS ..." to use the quoted/escaped schema identifier instead of raw
schema_name.
- Around line 16-25: The macro drop_stale_ci_schemas can receive max_age_hours
as a string which breaks the seconds calculation; coerce max_age_hours to an
integer before computing max_age_seconds so the math is valid (e.g., use Jinja's
int filter or an explicit int cast) and update the line that sets
max_age_seconds to multiply the integer value by 3600; keep the change inside
the drop_stale_ci_schemas macro and reference the max_age_hours and
max_age_seconds symbols.
---
Duplicate comments:
In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql`:
- Around line 70-73: The macro clickhouse__edr_create_schema currently
concatenates schema_name directly and ignores the database arg; change it to
build a fully qualified name using the database argument when present and pass
that through the SQL identifier-quoting helper instead of raw concatenation
(e.g., compute full_name = database ? database ~ "." ~ schema_name :
schema_name, then call run_query("CREATE DATABASE IF NOT EXISTS " ~
adapter.quote(full_name)) ), and remove the direct concatenation so schema_name
is properly quoted and database is used.
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql`:
- Around line 45-47: Guard the timestamp parse by wrapping the
modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') call (the schema_ts
computation) in a try/except so invalid calendar values (bad day/month, etc.) do
not raise and abort the cleanup; on exception, log or mark the schema as invalid
and continue (i.e., skip computing age_seconds and avoid using schema_ts) so the
rest of the loop proceeds safely when ns_valid.ok is true.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql
macros/utils/ci_cleanup/drop_stale_ci_schemas.sql
```jinja
{% macro test_drop_stale_ci_schemas() %}
    {% set database = elementary.target_database() %}
    {% set now = modules.datetime.datetime.utcnow() %}
```
🧩 Analysis chain
🏁 Script executed:

```shell
find . -name "test_drop_stale_ci_schemas.sql" -type f
```

Repository: elementary-data/dbt-data-reliability
Length of output: 149
🌐 Web query:
Python 3.12 datetime.utcnow() deprecation removal
💡 Result:

- `datetime.datetime.utcnow()` (and `utcfromtimestamp()`) were deprecated in Python 3.12 and now emit a `DeprecationWarning` saying they're "scheduled for removal in a future version." The official replacement is timezone-aware UTC datetimes, e.g. `datetime.datetime.now(tz=datetime.UTC)`. [1]
- As of the Python 3.15.0a0 docs, `utcnow()` is still present (still marked "Deprecated since version 3.12"), which indicates it has not been removed yet, and no specific removal version is stated in the docs. [2]
- The deprecation notice is still listed in Python 3.14 deprecations as well. [3]

Recommended migration:

```python
from datetime import datetime, UTC
dt = datetime.now(UTC)  # instead of datetime.utcnow()
```

Sources: [1] [2] [3]
🏁 Script executed:

```shell
cat -n ./integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql | head -30
```

Repository: elementary-data/dbt-data-reliability
Length of output: 1691
🏁 Script executed:

```shell
# Check if there are other datetime usages in the codebase to understand patterns
rg "modules\.datetime" integration_tests/dbt_project/macros/ -A 2 -B 2
# Also check dbt version or requirements
fd requirements -type f -name "*.txt" -o -name "*.yml" -o -name "setup.py" | head -5
```

Repository: elementary-data/dbt-data-reliability
Length of output: 818
🌐 Web query:
dbt Jinja modules.datetime.timezone Python version compatibility
💡 Result:
modules.datetime.timezone is part of Python’s standard library datetime module, and it’s available in all modern Python 3 versions that dbt supports—so it’s not a dbt/Python-version incompatibility issue in practice.
- In dbt's Jinja context, the `modules` object exposes Python stdlib modules, and `modules.datetime` is commonly used for date/time logic in macros (e.g., `modules.datetime.datetime.now()`, `modules.datetime.timedelta(...)`). That same object should also expose `modules.datetime.timezone` (e.g., `modules.datetime.timezone.utc`). [3]
- Current dbt-core releases (e.g., dbt-core 1.11.6, released Feb 17, 2026) require Python >= 3.10 and declare support for Python 3.10–3.13, so `datetime.timezone` will be present. [1]
- dbt-core 1.11 explicitly dropped Python 3.9, reinforcing that you're in "modern Python" territory where `datetime.timezone` exists. [2]
If you’re seeing an error like “timezone is undefined” or “has no attribute timezone”, it’s more likely due to how the Jinja context is being executed (e.g., a non-dbt Jinja renderer, older tooling that doesn’t provide modules, or a linter/templater mismatch), not Python’s datetime module.
Sources: [1] [2] [3]
Use timezone-aware UTC instead of utcnow() in the test macro.
The datetime.utcnow() method is deprecated in Python 3.12+ and will be removed in a future version. Replace it with datetime.now(timezone.utc) to maintain forward compatibility and avoid runtime warnings.
♻️ Proposed fix
- {% set now = modules.datetime.datetime.utcnow() %}
+ {% set now = modules.datetime.datetime.now(modules.datetime.timezone.utc) %}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% set now = modules.datetime.datetime.utcnow() %} | |
| {% set now = modules.datetime.datetime.now(modules.datetime.timezone.utc) %} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql` at line
11, Replace the deprecated call modules.datetime.datetime.utcnow() used to set
the now variable with a timezone-aware call: use
modules.datetime.datetime.now(modules.datetime.timezone.utc) so the now variable
is UTC-aware; update the expression where now is set (the line containing
modules.datetime.datetime.utcnow()) to call .now(...) with
modules.datetime.timezone.utc to avoid the deprecation and runtime warnings.
| {% macro clickhouse__drop_ci_schema(database, schema_name) %} | ||
| {% do run_query("DROP DATABASE IF EXISTS " ~ schema_name) %} | ||
| {% do adapter.commit() %} |
Quote the ClickHouse database identifier before executing DROP DATABASE.
Line 100 concatenates schema_name directly into SQL, which is brittle for non-simple identifiers.
♻️ Proposed fix
{% macro clickhouse__drop_ci_schema(database, schema_name) %}
- {% do run_query("DROP DATABASE IF EXISTS " ~ schema_name) %}
+ {% do run_query("DROP DATABASE IF EXISTS " ~ adapter.quote(schema_name)) %}
{% do adapter.commit() %}
{% endmacro %}
📝 Committable suggestion
| {% macro clickhouse__drop_ci_schema(database, schema_name) %} | |
| {% do run_query("DROP DATABASE IF EXISTS " ~ schema_name) %} | |
| {% do adapter.commit() %} | |
| {% macro clickhouse__drop_ci_schema(database, schema_name) %} | |
| {% do run_query("DROP DATABASE IF EXISTS " ~ adapter.quote(schema_name)) %} | |
| {% do adapter.commit() %} | |
| {% endmacro %} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/utils/ci_cleanup/drop_stale_ci_schemas.sql` around lines 99 - 101, The
macro clickhouse__drop_ci_schema currently injects schema_name directly into the
SQL passed to run_query; change it to quote/escape the identifier before
building the DROP DATABASE statement (e.g., wrap the schema name in backticks or
use the adapter's identifier-quoting helper) and ensure any existing quote
characters in schema_name are escaped to avoid SQL injection or syntax errors;
update the run_query call that builds "DROP DATABASE IF EXISTS ..." to use the
quoted/escaped schema identifier instead of raw schema_name.
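A minimal Python sketch of the identifier quoting this fix asks for. The helper name is hypothetical, and it assumes MySQL-style backtick quoting (which ClickHouse also uses), where a literal backtick inside a quoted identifier is escaped by doubling it:

```python
def quote_backtick_identifier(name: str) -> str:
    # Wrap the identifier in backticks; escape embedded backticks by doubling.
    return "`" + name.replace("`", "``") + "`"

sql = "DROP DATABASE IF EXISTS " + quote_backtick_identifier("dbt_250101_000000_branch_ab12")
# → DROP DATABASE IF EXISTS `dbt_250101_000000_branch_ab12`
```

Within dbt itself, `adapter.quote(...)` (as in the suggested fix above) is preferable because it delegates to the adapter's quoting policy instead of hard-coding a quote character.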
| from dbt_project import DbtProject | ||
| # Dremio does not support DROP SCHEMA reliably; skip it. |
I see that Dremio has "DROP FOLDER", should we perhaps use that?
Good point! I looked into it — Dremio's DROP FOLDER is specifically for Arctic/Nessie catalog folders, not general schemas/spaces.
Looking at the existing codebase, clear_env.sql doesn't have a Dremio-specific dispatch — it falls through to the default dbt.drop_schema, which the dbt-dremio adapter handles internally. So the cleanup macro's default dbt.drop_schema path should already work for Dremio.
The skip was a conservative choice since I wasn't sure if Dremio could create arbitrary schemas via dbt.create_schema (which the test needs). I can try removing the skip and let CI tell us if it works — would you like me to do that?
Done — removed the Dremio skip in fe9ff60. CI will tell us if it works. Also fixed two other issues in the same push:
- Double-encoding bug (test macro was returning tojson(results) but log_macro_results already calls tojson)
- ClickHouse schema_exists now uses exact-match (system.databases WHERE name = ...) instead of SHOW DATABASES LIKE which treats _ as wildcard
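The underscore pitfall is easy to reproduce outside SQL: in a LIKE pattern, `_` matches any single character and `%` any run of characters. A small Python helper (hypothetical, not from the PR) that mirrors LIKE semantics shows the false positive:

```python
import re

def sql_like_to_regex(pattern: str) -> str:
    # SQL LIKE semantics: '%' matches any run of characters,
    # '_' matches exactly one character; everything else is literal.
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "^" + "".join(out) + "$"

rx = sql_like_to_regex("dbt_ci")
assert re.match(rx, "dbt_ci")        # intended hit
assert re.match(rx, "dbtXci")        # unintended hit: '_' matched 'X'
assert not re.match(rx, "dbt_ci_x")  # pattern is fixed-length, so no match
```

An exact equality check (`WHERE name = ...`) sidesteps the wildcard semantics entirely, which is why the macro was changed.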
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/utils/cross_db_utils/schema_exists.sql`:
- Around line 33-34: Replace the LIKE-based SHOW DATABASES check with an
exact-match query against ClickHouse's system.databases: instead of calling
run_query("SHOW DATABASES LIKE '" ~ schema ~ "'") use run_query with a SELECT
from system.databases where name = schema (e.g., "SELECT name FROM
system.databases WHERE name = '" ~ schema ~ "'") so the return(result | length >
0) remains valid; update the occurrence that constructs the SQL string around
the schema variable to ensure proper quoting/escaping and keep the rest of the
logic (the run_query call and the return(result | length > 0)) unchanged.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
macros/utils/cross_db_utils/schema_exists.sql
…tch for ClickHouse schema_exists Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
- Replace deprecated utcnow() with now(timezone.utc) in both macros
- Coerce max_age_hours to int before computation
- Guard strptime with date component range validation
- Quote ClickHouse identifiers with backticks in CREATE/DROP DATABASE
- Add comment explaining why workflow only uses dbt_ prefix
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…dbt Jinja context Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
adapter.list_schemas() and adapter.check_schema_exists() are not available on RuntimeDatabaseWrapper in run-operation context. Replace with:
- information_schema.schemata for default list/exists
- system.databases for ClickHouse
- New ci_schema_exists dispatched macro for test verification
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…mentations BigQuery requires project-qualified INFORMATION_SCHEMA.SCHEMATA queries (`project`.INFORMATION_SCHEMA.SCHEMATA) unlike standard databases. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…h standard schema ops Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…CI schemas Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
| {% macro drop_stale_ci_schemas(prefixes=None, max_age_hours=24) %} | ||
| {% if prefixes is none %} | ||
| {% set prefixes = ['dbt_', 'py_'] %} |
let's make prefixes a mandatory input, and fail if it is not specified
Done — prefixes now defaults to none and raises a CompilationError with a clear message if not provided or empty.
| {% if schema_lower.startswith(prefix.lower()) %} | ||
| {% set remainder = schema_lower[prefix | length :] %} | ||
| {# Timestamp format: YYMMDD_HHMMSS (13 chars) followed by _ #} | ||
| {% if remainder | length >= 14 and remainder[6:7] == '_' and remainder[13:14] == '_' %} |
- could we use a regex with groups to match the branch name parts? it's a bit cleaner no?
`modules.re` is available in dbt.
- let's make a `parse_ci_branch_name` macro and use it here to simplify the main macro.
Done — replaced the manual character-by-character validation with modules.re.match('^(\\d{6})_(\\d{6})_.+', remainder) and extracted it into a parse_ci_schema_name macro. The main loop is now much cleaner — just calls parse_ci_schema_name(schema_name, prefixes) which returns a datetime or none.
Named it parse_ci_schema_name (instead of parse_ci_branch_name) since it takes a schema name as input and returns the parsed timestamp. Let me know if you'd prefer a different name.
| @@ -0,0 +1,156 @@ | |||
| {# | |||
| drop_stale_ci_schemas – clean up timestamped CI schemas. | |||
Could this macro be in the integration tests project?
Done — moved the entire macro to integration_tests/dbt_project/macros/drop_stale_ci_schemas.sql. The old macros/utils/ci_cleanup/ directory is deleted. Workflow now calls drop_stale_ci_schemas without the elementary. prefix.
| {# ── Per-adapter schema listing ─────────────────────────────────────── #} | ||
| {% macro list_ci_schemas(database) %} |
Maybe this can be in its own file? (also in the integration tests project)
and also ci_schema_exists.
Done — list_ci_schemas and ci_schema_exists (with all adapter dispatches) are now in their own file: integration_tests/dbt_project/macros/ci_schema_helpers.sql. All dispatch calls use 'elementary_tests' as the search package.
| {# adapter.list_schemas() is not available in run-operation context | ||
| (RuntimeDatabaseWrapper does not expose it). Use information_schema instead. #} | ||
| {% macro default__list_ci_schemas(database) %} | ||
| {% set results = run_query("SELECT schema_name FROM information_schema.schemata") %} |
As a safety, can we just list schemas in the CI db? (the one in the profiles)
Done — added WHERE lower(catalog_name) = lower('<database>') filter to the default list_ci_schemas implementation. BigQuery was already scoped via project-qualified INFORMATION_SCHEMA. ClickHouse lists all databases (no catalog concept).
| {% endmacro %} | ||
| {% macro default__ci_schema_exists(database, schema_name) %} | ||
| {% set result = run_query("SELECT schema_name FROM information_schema.schemata WHERE lower(schema_name) = lower('" ~ schema_name ~ "')") %} |
likewise, only consider schemas in the CI database.
Done — added WHERE lower(catalog_name) = lower('<database>') AND ... filter to the default ci_schema_exists implementation as well.
Thanks for the thorough review! Addressing all comments now:
Note: moving the macro out of the main package means the
… mandatory prefixes Addresses review comments:
- Move drop_stale_ci_schemas and helpers from main package to integration_tests
- Split list_ci_schemas and ci_schema_exists into ci_schema_helpers.sql
- Make prefixes a mandatory parameter (error if not provided)
- Scope information_schema queries to CI database via catalog_name filter
- Replace manual char validation with modules.re regex in parse_ci_schema_name
- Update dispatch package from 'elementary' to 'elementary_tests'
- Update workflow to call macros without elementary. prefix
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
| {# ── CI schema name parser ─────────────────────────────────────────── #} | ||
| {% macro parse_ci_schema_name(schema_name, prefixes) %} |
Maybe it should be parse_timestamp_from_ci_schema_name
Done — renamed to parse_timestamp_from_ci_schema_name.
| {% if match %} | ||
| {% set ts_str = match.group(1) ~ '_' ~ match.group(2) %} | ||
| {# Validate date component ranges before calling strptime #} | ||
| {% set mm = ts_str[2:4] | int %} |
as we're anyway parsing all the different parts, maybe:
- The groups should be each individual component (e.g. `set mm = match.group(...)`).
- For readability, can we use named groups?
- Since we already parse all the parts, we can just initialize the datetime object directly, e.g. datetime(2000 + yy, mm, ....)
Done — now uses named groups ((?P<yy>\\d{2})(?P<mm>\\d{2}) etc.), extracts each component via match.group('yy'), and constructs the datetime directly with datetime(2000 + yy, mm, dd, HH, MI, SS) instead of strptime.
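As a rough Python analogue of the macro described here (the real implementation is dbt Jinja; the function name and exact pattern below are reconstructed from this thread, not copied from the code):

```python
import re
from datetime import datetime

# Named groups for each component of the YYMMDD_HHMMSS timestamp,
# followed by '_' and the rest of the schema name (branch, hash, ...).
TS_RE = re.compile(
    r"^(?P<yy>\d{2})(?P<mm>\d{2})(?P<dd>\d{2})_"
    r"(?P<HH>\d{2})(?P<MI>\d{2})(?P<SS>\d{2})_.+"
)

def parse_timestamp_from_ci_schema_name(schema_name, prefixes):
    # Returns the embedded timestamp, or None if the name does not look
    # like a CI schema (wrong prefix, wrong pattern, or invalid date).
    for prefix in prefixes:
        if schema_name.lower().startswith(prefix.lower()):
            match = TS_RE.match(schema_name[len(prefix):])
            if match:
                g = {k: int(v) for k, v in match.groupdict().items()}
                try:
                    return datetime(2000 + g["yy"], g["mm"], g["dd"],
                                    g["HH"], g["MI"], g["SS"])
                except ValueError:  # e.g. month 13: constructor rejects it
                    return None
    return None

parse_timestamp_from_ci_schema_name("dbt_200101_000000_citest_00000000", ["dbt_", "py_"])
# → datetime(2020, 1, 1, 0, 0)
```

Constructing the datetime directly means range validation (month 13, day 32, ...) comes for free from the constructor, replacing the separate guard that strptime needed.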
| {% endmacro %} | ||
| {# ── Per-adapter schema drop ─────────────────────────────────────────── #} |
Maybe drop_ci_schema should also be in the schema helpers?
Done — yes, moved drop_ci_schema to schema_utils/drop_ci_schema.sql alongside the other helpers.
| #} | ||
| {# ── Per-adapter schema listing (scoped to CI database) ────────────── #} | ||
I think it may be good to create a folder called schema_utils with each one of these macros in a separate file.
Done — created schema_utils/ folder with each macro in its own file:
- schema_utils/list_ci_schemas.sql
- schema_utils/ci_schema_exists.sql
- schema_utils/drop_ci_schema.sql
- schema_utils/parse_timestamp_from_ci_schema_name.sql
…e parse macro Addresses review comments:
- Create schema_utils/ folder with each helper in its own file: list_ci_schemas, ci_schema_exists, drop_ci_schema, parse_timestamp_from_ci_schema_name
- Rename parse_ci_schema_name to parse_timestamp_from_ci_schema_name
- Use named regex groups: (?P<yy>), (?P<mm>), etc.
- Construct datetime directly via datetime(2000+yy, mm, dd, HH, MI, SS) instead of strptime
- Move drop_ci_schema from main macro file to schema_utils/
- Delete old ci_schema_helpers.sql (replaced by individual files)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…I-specific) Addresses review comment:
- schema_utils/ contains generic helpers: list_schemas_sql, schema_exists_sql, drop_schema_sql (renamed from ci_ prefixed versions)
- ci_schemas_cleanup/ contains CI-specific macros: drop_stale_ci_schemas, parse_timestamp_from_ci_schema_name, test_drop_stale_ci_schemas
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…nternals
dbt-databricks defines list_schemas_sql and check_schema_exists_sql as
internal helper macros. Our identically-named macros were overriding them,
causing dbt run to send [] (Python list) as SQL instead of SHOW SCHEMAS.
Renamed: list_schemas_sql -> edr_list_schemas
schema_exists_sql -> edr_schema_exists
drop_schema_sql -> edr_drop_schema
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
🧹 Nitpick comments (4)
integration_tests/dbt_project/macros/schema_utils/schema_exists.sql (3)
13-16: Schema name should be escaped to prevent SQL injection. Both
`database` and `schema_name` are directly concatenated into SQL. The `schema_name` parameter originates from `edr_list_schemas` results, which could theoretically contain quotes in edge cases.
♻️ Suggested improvement
{% macro default__edr_schema_exists(database, schema_name) %}
-   {% set result = run_query("SELECT schema_name FROM information_schema.schemata WHERE lower(catalog_name) = lower('" ~ database ~ "') AND lower(schema_name) = lower('" ~ schema_name ~ "')") %}
+   {% set safe_db = database | replace("'", "''") %}
+   {% set safe_schema = schema_name | replace("'", "''") %}
+   {% set result = run_query("SELECT schema_name FROM information_schema.schemata WHERE lower(catalog_name) = lower('" ~ safe_db ~ "') AND lower(schema_name) = lower('" ~ safe_schema ~ "')") %}
    {% do return(result | length > 0) %}
{% endmacro %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/dbt_project/macros/schema_utils/schema_exists.sql` around lines 13 - 16, The macro default__edr_schema_exists concatenates user-controlled database and schema_name into SQL, risking injection; update it to safely quote/escape both identifiers before embedding in run_query (use the dbt adapter quoting helpers such as adapter.quote or adapter.quote_identifier or otherwise escape single quotes via replace) so the SQL string passed to run_query only contains properly quoted identifiers; locate the macro default__edr_schema_exists and replace direct concatenation of database and schema_name with adapter-based quoting (or explicit escaping) when building the SELECT against information_schema.schemata.
23-26: Same escaping improvement applies to ClickHouse variant.
♻️ Suggested improvement
{% macro clickhouse__edr_schema_exists(database, schema_name) %}
-   {% set result = run_query("SELECT 1 FROM system.databases WHERE name = '" ~ schema_name ~ "' LIMIT 1") %}
+   {% set safe_schema = schema_name | replace("'", "''") %}
+   {% set result = run_query("SELECT 1 FROM system.databases WHERE name = '" ~ safe_schema ~ "' LIMIT 1") %}
    {% do return(result | length > 0) %}
{% endmacro %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/dbt_project/macros/schema_utils/schema_exists.sql` around lines 23 - 26, The ClickHouse macro clickhouse__edr_schema_exists builds a query by concatenating schema_name into a single-quoted string (run_query("SELECT 1 FROM system.databases WHERE name = '" ~ schema_name ~ "' LIMIT 1")), which can break for names with quotes or special chars; update the macro to safely quote/escape the schema identifier before injecting it into run_query (e.g., use the adapter quoting helper such as adapter.quote_identifier(schema_name) or an equivalent safe-escaping function) and keep the rest of the logic (run_query and the length check) unchanged.
18-21: Same escaping improvement applies to BigQuery variant.
♻️ Suggested improvement
{% macro bigquery__edr_schema_exists(database, schema_name) %}
-   {% set result = run_query("SELECT schema_name FROM `" ~ database ~ "`.INFORMATION_SCHEMA.SCHEMATA WHERE lower(schema_name) = lower('" ~ schema_name ~ "')") %}
+   {% set safe_schema = schema_name | replace("'", "''") %}
+   {% set result = run_query("SELECT schema_name FROM `" ~ database ~ "`.INFORMATION_SCHEMA.SCHEMATA WHERE lower(schema_name) = lower('" ~ safe_schema ~ "')") %}
    {% do return(result | length > 0) %}
{% endmacro %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/dbt_project/macros/schema_utils/schema_exists.sql` around lines 18 - 21, The BigQuery macro bigquery__edr_schema_exists builds a SQL string with unescaped database and schema_name which can break queries or allow injection; fix it by escaping inputs before constructing the query: in bigquery__edr_schema_exists apply safe escaping to schema_name (e.g., schema_name | replace("'", "''")) and to database (e.g., database | replace("`", "``") or otherwise ensure it's a trusted identifier) when interpolating into the run_query call so the generated SELECT uses a safely quoted string literal for schema_name and a safely quoted identifier for database.integration_tests/dbt_project/macros/schema_utils/list_schemas.sql (1)
13-20: Consider using parameterized queries or proper quoting for the database name. The
`database` parameter is directly concatenated into SQL (line 14). While in this context the value comes from `elementary.target_database()` (controlled configuration), this pattern is fragile if a database name ever contains a single quote. For defense-in-depth, consider using dbt's quoting utilities:
♻️ Suggested improvement
{% macro default__edr_list_schemas(database) %}
-   {% set results = run_query("SELECT schema_name FROM information_schema.schemata WHERE lower(catalog_name) = lower('" ~ database ~ "')") %}
+   {% set safe_db = database | replace("'", "''") %}
+   {% set results = run_query("SELECT schema_name FROM information_schema.schemata WHERE lower(catalog_name) = lower('" ~ safe_db ~ "')") %}
    {% set schemas = [] %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/dbt_project/macros/schema_utils/list_schemas.sql` around lines 13 - 20, The macro default__edr_list_schemas currently concatenates the database string into SQL via run_query, which can break if the database name contains quotes; update the run_query call to use dbt's quoting helper (e.g., replace the raw concatenation with adapter.quote(database) or adapter.quote_identifier(database)) so the database identifier is properly quoted/escaped before being inserted into the SQL string; keep the rest of the macro (results loop, schemas list, return) intact and reference default__edr_list_schemas and run_query when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@integration_tests/dbt_project/macros/schema_utils/list_schemas.sql`:
- Around line 13-20: The macro default__edr_list_schemas currently concatenates
the database string into SQL via run_query, which can break if the database name
contains quotes; update the run_query call to use dbt's quoting helper (e.g.,
replace the raw concatenation with adapter.quote(database) or
adapter.quote_identifier(database)) so the database identifier is properly
quoted/escaped before being inserted into the SQL string; keep the rest of the
macro (results loop, schemas list, return) intact and reference
default__edr_list_schemas and run_query when making the change.
In `@integration_tests/dbt_project/macros/schema_utils/schema_exists.sql`:
- Around line 13-16: The macro default__edr_schema_exists concatenates
user-controlled database and schema_name into SQL, risking injection; update it
to safely quote/escape both identifiers before embedding in run_query (use the
dbt adapter quoting helpers such as adapter.quote or adapter.quote_identifier or
otherwise escape single quotes via replace) so the SQL string passed to
run_query only contains properly quoted identifiers; locate the macro
default__edr_schema_exists and replace direct concatenation of database and
schema_name with adapter-based quoting (or explicit escaping) when building the
SELECT against information_schema.schemata.
- Around line 23-26: The ClickHouse macro clickhouse__edr_schema_exists builds a
query by concatenating schema_name into a single-quoted string
(run_query("SELECT 1 FROM system.databases WHERE name = '" ~ schema_name ~ "'
LIMIT 1")), which can break for names with quotes or special chars; update the
macro to safely quote/escape the schema identifier before injecting it into
run_query (e.g., use the adapter quoting helper such as
adapter.quote_identifier(schema_name) or an equivalent safe-escaping function)
and keep the rest of the logic (run_query and the length check) unchanged.
- Around line 18-21: The BigQuery macro bigquery__edr_schema_exists builds a SQL
string with unescaped database and schema_name which can break queries or allow
injection; fix it by escaping inputs before constructing the query: in
bigquery__edr_schema_exists apply safe escaping to schema_name (e.g.,
schema_name | replace("'", "''")) and to database (e.g., database | replace("`",
"``") or otherwise ensure it's a trusted identifier) when interpolating into the
run_query call so the generated SELECT uses a safely quoted string literal for
schema_name and a safely quoted identifier for database.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
.github/workflows/cleanup-stale-schemas.ymlintegration_tests/dbt_project/macros/ci_schemas_cleanup/drop_stale_ci_schemas.sqlintegration_tests/dbt_project/macros/ci_schemas_cleanup/parse_timestamp_from_ci_schema_name.sqlintegration_tests/dbt_project/macros/ci_schemas_cleanup/test_drop_stale_ci_schemas.sqlintegration_tests/dbt_project/macros/schema_utils/drop_schema.sqlintegration_tests/dbt_project/macros/schema_utils/list_schemas.sqlintegration_tests/dbt_project/macros/schema_utils/schema_exists.sqlintegration_tests/tests/test_ci_cleanup.pymacros/utils/cross_db_utils/schema_exists.sql
🚧 Files skipped from review as they are similar to previous changes (3)
- macros/utils/cross_db_utils/schema_exists.sql
- .github/workflows/cleanup-stale-schemas.yml
- integration_tests/tests/test_ci_cleanup.py
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/dbt_project/macros/schema_utils/schema_exists.sql`:
- Around line 20-23: The SQL currently double-backticks the identifier instead
of using BigQuery's required backslash escape; in the macro
bigquery__edr_schema_exists update the escaping call used in the run_query
construction so backticks in identifiers are replaced with "\`"
(backslash-escaped) rather than "``" and apply the same escaping approach to the
database and schema_name usage within the run_query string (refer to the
replace(...) expression inside the run_query and mirror the fix used in
list_schemas.sql for consistency).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
integration_tests/dbt_project/macros/ci_schemas_cleanup/drop_stale_ci_schemas.sqlintegration_tests/dbt_project/macros/schema_utils/list_schemas.sqlintegration_tests/dbt_project/macros/schema_utils/schema_exists.sql
🚧 Files skipped from review as they are similar to previous changes (2)
- integration_tests/dbt_project/macros/schema_utils/list_schemas.sql
- integration_tests/dbt_project/macros/ci_schemas_cleanup/drop_stale_ci_schemas.sql
…bit suggestion) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
feat: add weekly cleanup workflow for stale CI schemas
Summary
Adds dbt macros and a weekly GitHub Actions workflow to automatically clean up timestamped CI schemas older than a configurable threshold (default: 24 hours). All cleanup macros live in the integration test project (not the production package).
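The core of the cleanup reduces to a timestamp-age comparison. A hypothetical Python sketch of that decision (the actual macro implements it in dbt Jinja with naive UTC datetimes; the function name here is illustrative):

```python
from datetime import datetime, timedelta

def is_stale(schema_ts: datetime, now: datetime, max_age_hours: int) -> bool:
    # A schema is stale when its embedded timestamp is older than the
    # threshold; both datetimes are naive UTC, so subtraction is safe.
    return now - schema_ts > timedelta(hours=int(max_age_hours))

now = datetime(2026, 3, 1, 12, 0, 0)
assert is_stale(datetime(2020, 1, 1), now, 24)          # years old: drop
assert not is_stale(now - timedelta(hours=2), now, 24)  # fresh CI schema: keep
```

Coercing `max_age_hours` to int mirrors the macro, since run-operation arguments arrive as strings.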
Macro organisation
- integration_tests/dbt_project/macros/schema_utils/: edr_list_schemas, edr_schema_exists
- integration_tests/dbt_project/macros/ci_schemas_cleanup/: drop_stale_ci_schemas, parse_timestamp_from_ci_schema_name, test_drop_stale_ci_schemas (+ edr_create_schema test helper)
- integration_tests/dbt_project/macros/clear_env.sql (pre-existing): edr_drop_schema — reused by the cleanup macro

drop_stale_ci_schemas(prefixes, max_age_hours=24):
- prefixes is mandatory — the macro raises a compiler error if omitted or empty.
- Uses edr_list_schemas to enumerate schemas via SQL queries (adapter-specific: information_schema.schemata filtered by catalog_name for most adapters, project-qualified INFORMATION_SCHEMA.SCHEMATA for BigQuery, SHOW DATABASES for ClickHouse). Direct SQL is used because adapter.list_schemas() is not available in run-operation context.
- Parses the YYMMDD_HHMMSS timestamp from schema names matching <prefix><YYMMDD_HHMMSS>_<branch>_<hash> using regex with named groups ((?P<yy>\d{2})...), then constructs a datetime directly via datetime(2000 + yy, mm, dd, HH, MI, SS) (no strptime call).
- Drops schemas older than max_age_hours via the pre-existing edr_drop_schema from clear_env.sql (default uses dbt.drop_schema; ClickHouse override uses DROP DATABASE IF EXISTS).
- A companion daily cleanup workflow lives in the elementary repo (companion PR: feat: add daily cleanup workflow for stale CI schemas elementary#2128).
integration_tests/tests/test_ci_cleanup.py):dbt_200101_000000_citest_00000000= year 2020) and a recent schema, runs cleanup with an 8760-hour threshold (1 year), asserts the old schema was dropped and the recent one was kept.edr_schema_existsfor verification.DremioRelationpath model is incompatible with standardapi.Relation.create).Workflow (
.github/workflows/cleanup-stale-schemas.yml):workflow_dispatchwith optionalmax-age-hoursinput).generate_profiles.py, installs deps, and invokes the macro withdbt run-operationfrom the integration test project working directory.Updates since last revision
Databricks naming conflict
During CI investigation, discovered that the original macro names (
list_schemas_sql,schema_exists_sql,drop_schema_sql) were shadowing dbt-databricks internal macros with identical names. This caused Databricks tests to fail withPARSE_SYNTAX_ERRORwhere a Python[]list was sent as SQL instead of a SQL query string.Fix: Renamed all macros to use the
edr_prefix (edr_list_schemas,edr_schema_exists,edr_drop_schema) to avoid conflicts. Theedr_drop_schemamacro already existed inclear_env.sqlfor test environment cleanup, so the duplicatedrop_schema.sqlfile was removed and the cleanup logic now reuses the existing implementation.SQL escaping improvements
Applied CodeRabbit suggestions for defense-in-depth SQL escaping:
- database | replace("'", "''"), schema_name | replace("'", "''") for string literals
- database | replace("`", "``") for backtick-quoted identifiers
While database/schema names come from trusted dbt configuration, these changes harden the macros against edge cases.
Review & Testing Checklist for Human
- Reuse of edr_drop_schema from clear_env.sql: This macro was originally written for test environment cleanup. Verify it behaves correctly for the weekly cleanup workflow. Note that the default implementation uses dbt.drop_schema(), which on some adapters (e.g. Databricks) does not use DROP SCHEMA IF EXISTS. If a schema is concurrently dropped between listing and dropping, the macro will error. Consider whether a single failure + retry is acceptable.
- Backtick escaping: The Jinja template uses database | replace to escape backticks. Verify Jinja doesn't double-escape the backslash (CI passed, but worth a sanity check).
- elementary_tests namespace: The schema_utils/ helpers use adapter.dispatch('...', 'elementary_tests'). Verify the companion elementary PR (feat: add daily cleanup workflow for stale CI schemas elementary#2128) can call these macros or has its own copies.
- Dry run: consider a manual run with max_age_hours: 999999 (to only log "keep" messages) before the first scheduled run.
- _elementary suffixed schemas (e.g., dbt_260228_..._elementary) are correctly matched and cleaned up.
- utcnow() is used (deprecated in Python 3.12+) because modules.datetime.timezone is not available in dbt's Jinja context. Both utcnow() and the constructed datetime are naive, so comparisons are safe.
- Dremio: DremioRelation requires a different path structure that's incompatible with standard api.Relation.create(database=..., schema=...) calls.
- clickhouse__schema_exists was added to macros/utils/cross_db_utils/schema_exists.sql.
During CI development, the integration test initially used
max_age_hours=1and failed on Databricks withSCHEMA_NOT_FOUND. The test's cleanup macro listed ALLdbt_*schemas in the database, including leftover schemas from previous CI runs (e.g.,dbt_260228_120808_..._gw4created ~1.5 hours earlier). When the macro tried to drop these, they had already been cleaned up by their respective CI runs, causing the error.The fix: increased the test threshold to
max_age_hours=8760(1 year) so only the artificially old test schema (year 2020 = 26 years old) exceeds the threshold, and real CI schemas (at most a few hours old) are safely ignored. This same race condition could occur in production if two cleanup workflows run concurrently, but the weekly schedule makes this unlikely.Link to Devin run: https://app.devin.ai/sessions/7f7e889a382c4a8e90d83e9e2636eef3
Requested by: @haritamar
Summary by CodeRabbit
New Features
Chores
Tests