add option to partition large models #939

rrbarbosa wants to merge 1 commit into elementary-data:master from
Conversation
👋 @rrbarbosa
📝 Walkthrough

This change introduces partitioning support for dbt artifact tables by adding configuration options to enable partitioning of the run results and invocations tables by creation timestamp in BigQuery, with integration tests to verify the feature works correctly.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
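Per the walkthrough, the feature is toggled through new configuration vars. A minimal sketch of what enabling it might look like in `dbt_project.yml`, assuming the var name `partition_run_results` from the review comments; the exact scoping/location is an assumption, not confirmed by the PR:

```yaml
# dbt_project.yml (sketch) - var name taken from the review comments;
# whether it must be nested under a package-specific scope is an assumption.
vars:
  partition_run_results: true
```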
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
🧹 Nitpick comments (3)
integration_tests/tests/test_dbt_artifacts/test_artifacts.py (2)
177-179: Consider using read_table for consistency with other tests.

While TEST_MODEL is a hardcoded constant (so the static analysis SQL injection warning is a false positive), using read_table would be more consistent with the pattern used in test_dbt_invocations_partitioned and other tests in this file.

♻️ Suggested refactor for consistency

```diff
-    results = dbt_project.run_query(
-        """select * from {{ ref("dbt_run_results") }} where name='%s'""" % TEST_MODEL
-    )
-    assert len(results) >= 1
+    dbt_project.read_table(
+        "dbt_run_results", where=f"name = '{TEST_MODEL}'", raise_if_empty=True
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_dbt_artifacts/test_artifacts.py` around lines 177 - 179, Replace the raw SQL call to dbt_project.run_query that embeds TEST_MODEL with the consistent helper method read_table used elsewhere (e.g., in test_dbt_invocations_partitioned): call dbt_project.read_table or the test file's read_table helper to query dbt_run_results filtered by TEST_MODEL instead of using string interpolation; update the line using dbt_project.run_query(...) to use read_table with the same filter so the test follows the established pattern and avoids the apparent SQL-injection style interpolation.
170-191: Consider verifying partitioning was actually applied.

The tests verify that data is readable after enabling partitioning, which is a good smoke test. For more confidence, you could add an assertion that the table is actually partitioned by querying BigQuery's INFORMATION_SCHEMA.PARTITIONS or TABLE_OPTIONS.

Example verification query

```python
# After the run, verify the table is partitioned:
partition_info = dbt_project.run_query(
    """
    SELECT option_value
    FROM `{{ ref("dbt_run_results").database }}`.`{{ ref("dbt_run_results").schema }}`.INFORMATION_SCHEMA.TABLE_OPTIONS
    WHERE table_name = 'dbt_run_results'
      AND option_name = 'partition_expiration_days'
    """
)
# Or check PARTITIONS table for partition existence
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_dbt_artifacts/test_artifacts.py` around lines 170 - 191, Add an explicit assertion that the BigQuery tables are actually partitioned after enabling partition_run_results in the tests test_run_results_partitioned and test_dbt_invocations_partitioned: after calling dbt_project.dbt_runner.run (and after dbt_project.read_table where appropriate), run a query via dbt_project.run_query against BigQuery's INFORMATION_SCHEMA.TABLE_OPTIONS or the PARTITIONS view for the dbt_run_results table (use the referenced table name via {{ ref("dbt_run_results") }} or TEST_MODEL) and assert the expected partition option or presence of partitions (e.g., option_name like 'partition_expiration_days' or non-empty PARTITION rows) to ensure partitioning was applied.

macros/edr/system/system_utils/get_config_var.sql (1)
85-98: Partitioning is silently ignored on non-BigQuery adapters.

When partition_run_results is enabled on adapters other than BigQuery, run_results_partition_by remains none, so partitioning won't actually occur. This could confuse users who expect the feature to work. Consider either:

- Adding default partition specs for other adapters that support partitioning (Snowflake, Databricks)
- Logging a warning when partition_run_results=true but run_results_partition_by is none
- Documenting that this feature currently only works on BigQuery
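The warning option above could be sketched as a Jinja guard. This is a sketch only: the variable names follow the review comment, and where exactly such a guard would live in get_config_var.sql is an assumption; `exceptions.warn` is dbt's built-in warning hook:

```jinja
{# Sketch: warn when partitioning is requested but no adapter-specific
   partition spec was resolved. Variable names follow the review comment;
   the placement inside get_config_var.sql is an assumption. #}
{% if partition_run_results and run_results_partition_by is none %}
  {% do exceptions.warn(
    "partition_run_results is enabled, but no partition spec exists for this adapter; partitioning will be skipped."
  ) %}
{% endif %}
```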
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/system/system_utils/get_config_var.sql` around lines 85 - 98, The current macro bigquery__get_default_config only sets run_results_partition_by for BigQuery, which means partition_run_results can be true but ignored for other adapters; update the default handling so that when partition_run_results is true and run_results_partition_by is none you either (a) set sensible defaults for other partitioning-capable adapters (e.g., Snowflake/Databricks) by adding adapter-specific branches that populate run_results_partition_by, or (b) emit a clear warning/log when partition_run_results=true but run_results_partition_by remains none to inform users; locate the logic around default__get_default_config, bigquery__get_default_config and the keys 'partition_run_results'/'run_results_partition_by' to apply the chosen fix.
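The partition verification suggested above can be made concrete with a small helper that inspects rows returned from BigQuery's INFORMATION_SCHEMA.PARTITIONS view. Everything here (the function name and the assumed row shape) is illustrative and not part of the PR:

```python
# Hypothetical helper (names are illustrative, not from the PR): given rows
# returned from BigQuery's INFORMATION_SCHEMA.PARTITIONS for one table,
# assert that at least one real day partition exists.
def assert_day_partitioned(partition_rows):
    partition_ids = {row["partition_id"] for row in partition_rows}
    # BigQuery reports pseudo-partitions like "__NULL__" and
    # "__UNPARTITIONED__"; ignore those when checking.
    real_partitions = {p for p in partition_ids if p and not p.startswith("__")}
    assert real_partitions, "no day partitions found - partitioning not applied"
    return sorted(real_partitions)
```

In a test, the rows would come from a query like the one sketched in the comment above, e.g. `assert_day_partitioned(dbt_project.run_query(...))`.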
ℹ️ Review info

Configuration used: defaults | Review profile: CHILL | Plan: Pro
Disabled knowledge base sources: Linear integration
📒 Files selected for processing (4)

- integration_tests/tests/test_dbt_artifacts/test_artifacts.py
- macros/edr/system/system_utils/get_config_var.sql
- models/edr/dbt_artifacts/dbt_invocations.sql
- models/edr/dbt_artifacts/dbt_run_results.sql
about the bot comments:
Hi @rrbarbosa - thanks for your contribution! Also, not a must, but is it possible to verify in the test that the table is really partitioned?
The dbt_invocations and dbt_run_results tables grow without any size check. Our org has several dbt projects, so processing them for downstream use cases becomes very costly.

This PR adds support for partitioning the data by day, reducing processing costs dramatically.

I've only added support for BigQuery, as that's the adapter we use.
Similar to what was requested here: elementary-data/elementary#1715
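Partitioning by day reduces cost because BigQuery prunes partitions when a query filters on the partition column. A hedged illustration; the project/dataset path and the timestamp column name are assumptions, not taken from the PR:

```sql
-- Illustration only: table path and 'created_at' column are assumptions.
-- With daily partitioning, this scans roughly one day of data
-- rather than the whole table.
SELECT *
FROM `my_project.elementary.dbt_run_results`
WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
```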
Summary by CodeRabbit

- New Features
- Tests