Ele 4724 dbt fusion support #825
base: master
Conversation
Important: Review skipped – draft detected. Please check the settings in the CodeRabbit UI.

Walkthrough

The changes introduce support for dbt Fusion by adding new macros, updating test infrastructure, and modifying macro logic to handle differences between dbt versions. Configuration keys and logic related to materialization overrides are removed or refactored. Several YAML and ignore files are updated, and a package lock file is introduced for dbt_utils versioning.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Pytest as Pytest CLI
    participant Conftest as conftest.py
    participant Env as Environment
    participant DbtProj as DbtProject
    participant Runner as DbtRunner
    Pytest->>Conftest: parse --runner-method option
    Conftest->>Conftest: Validate runner method & target
    Conftest->>Env: init_tests_env(..., runner_method)
    Env->>DbtProj: DbtProject(..., runner_method)
    DbtProj->>Runner: get_dbt_runner(..., runner_method)
    Note right of Runner: Runner adapts logic based on runner_method
```

```mermaid
sequenceDiagram
    participant Macro as Macro
    participant Fusion as is_dbt_fusion
    participant Util as agate_to_dict
    Macro->>Fusion: Check if dbt Fusion
    alt Fusion is true
        Macro->>Util: agate_to_dict returns agate_table as is
    else Fusion is false
        Macro->>Util: agate_to_dict returns agate_table.to_dict()
    end
```
👋 @haritamar
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- .github/workflows/test-warehouse.yml (2 hunks)
- .gitignore (1 hunks)
- integration_tests/dbt_project/.gitignore (1 hunks)
- integration_tests/dbt_project/dbt_project.yml (0 hunks)
- integration_tests/dbt_project/macros/materializations.sql (1 hunks)
- integration_tests/dbt_project/models/exposures.yml (2 hunks)
- integration_tests/dbt_project/packages.yml (1 hunks)
- integration_tests/deprecated_tests/dbt_project.yml (0 hunks)
- integration_tests/deprecated_tests/macros/system/materializations.sql (1 hunks)
- integration_tests/tests/conftest.py (5 hunks)
- integration_tests/tests/dbt_project.py (2 hunks)
- integration_tests/tests/env.py (1 hunks)
- macros/edr/dbt_artifacts/upload_dbt_invocation.sql (1 hunks)
- macros/edr/system/hooks/on_run_start.sql (0 hunks)
- macros/edr/system/system_utils/get_config_var.sql (0 hunks)
- macros/edr/system/system_utils/is_dbt_fusion.sql (1 hunks)
- macros/edr/tests/on_run_start/ensure_materialize_override.sql (0 hunks)
- macros/utils/graph/get_run_result_dict.sql (2 hunks)
- macros/utils/run_queries/agate_to_dict.sql (1 hunks)
- macros/utils/run_queries/agate_to_dicts.sql (1 hunks)
- models/run_results.yml (1 hunks)
- package-lock.yml (1 hunks)
💤 Files with no reviewable changes (5)
- integration_tests/dbt_project/dbt_project.yml
- macros/edr/system/hooks/on_run_start.sql
- integration_tests/deprecated_tests/dbt_project.yml
- macros/edr/system/system_utils/get_config_var.sql
- macros/edr/tests/on_run_start/ensure_materialize_override.sql
🧰 Additional context used
🧬 Code Graph Analysis (2)
integration_tests/tests/env.py (2)
  integration_tests/tests/conftest.py (3)
    dbt_project (132-135)
    target (139-140)
    runner_method (154-164)
  integration_tests/tests/dbt_project.py (1)
    get_dbt_runner (33-42)
integration_tests/tests/dbt_project.py (1)
  integration_tests/tests/conftest.py (2)
    target (139-140)
    runner_method (154-164)
🔇 Additional comments (24)
.gitignore (1)

27-27: Addition looks good
Ignoring dbt_internal_packages/ is consistent with dbt-managed dependencies and keeps the repo clean.

integration_tests/dbt_project/.gitignore (1)

3-3: Consistent ignore pattern
Same note as for the root .gitignore – the addition is correct and harmless.

.github/workflows/test-warehouse.yml (1)

24-25: Temporary branch defaults should be removed before merge
Both inputs now default to the feature branch ele-4724-dbt_fusion-support and are tagged # TEMPORARY. If this hits main, future workflow runs will silently pin to a branch that will eventually be deleted, breaking CI.
Diff to restore the previous behaviour (empty defaults) once the feature is merged:

```diff
- default: ele-4724-dbt_fusion-support # TEMPORARY
+ # default intentionally left blank – callers should provide a ref
```

Apply the same change for the second occurrence.
Also applies to: 46-47
integration_tests/dbt_project/macros/materializations.sql (1)

3-6: Direct macro calls simplify the materialization – nice cleanup
Switching from .call_macro() to direct calls is clearer and avoids the extra invocation layer. Verified that the call signatures stay the same.
Also applies to: 11-17
integration_tests/deprecated_tests/macros/system/materializations.sql (1)

2-2: LGTM: Simplified macro invocation pattern
The removal of .call_macro() wrappers standardizes the macro calling convention and makes the code more direct and readable.
Also applies to: 6-6
macros/edr/system/system_utils/is_dbt_fusion.sql (1)

1-7: LGTM: Clean implementation for fusion detection
The logic correctly identifies dbt Fusion versions by checking whether the major version is greater than 1. The implementation assumes dbt_version follows semantic versioning format, which is a reasonable assumption.
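For readers unfamiliar with the pattern, a minimal sketch of such a version check (an assumed shape based on the description above, not the exact macro in the PR):

```sql
{# Hypothetical sketch: treat any dbt major version above 1 as the Fusion engine. #}
{% macro is_dbt_fusion() %}
  {# dbt_version is dbt's built-in version string, e.g. "1.10.0" or "2.0.0" #}
  {% set major_version = dbt_version.split(".")[0] | int %}
  {% do return(major_version > 1) %}
{% endmacro %}
```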
integration_tests/tests/env.py (1)

1-2: LGTM: Proper integration of runner_method parameter
The addition of the runner_method parameter with proper typing and default value maintains backward compatibility while enabling the new fusion runner functionality. The parameter is correctly passed through to the dbt runner initialization.
Also applies to: 8-16
integration_tests/dbt_project/packages.yml (1)

4-4: LGTM: Simplified version specification
The change from dynamic Jinja logic to a fixed version range ">=0.8.0", "<2.0.0" simplifies the dependency specification while maintaining compatibility. The actual version pinning is handled by the package-lock.yml file for deterministic builds.
macros/utils/graph/get_run_result_dict.sql (3)

6-6: LGTM: Consistent with agate_to_dict standardization.
The replacement of run_result.to_dict() with elementary.agate_to_dict(run_result) aligns with the broader effort to standardize agate table conversions for dbt Fusion compatibility.

12-12: LGTM: Proper handling of timing item conversion.
The change to use elementary.agate_to_dict(item) for timing items is consistent with the standardization approach.

23-23: LGTM: Correct conditional conversion for node objects.
The conditional use of elementary.agate_to_dict(run_result.node) when the node exists maintains the existing logic while using the standardized conversion method.
integration_tests/tests/dbt_project.py (3)

11-11: LGTM: Proper import addition for RunnerMethod.
The import of RunnerMethod is correctly added to support the new runner method parameter.

33-42: LGTM: Well-structured function signature update.
The addition of the optional runner_method parameter to get_dbt_runner is properly typed and maintains backward compatibility. The parameter is correctly passed through to the create_dbt_runner call.

46-52: LGTM: Consistent constructor parameter propagation.
The DbtProject constructor properly accepts and propagates the runner_method parameter to the get_dbt_runner call, maintaining the parameter flow throughout the test infrastructure.
macros/edr/dbt_artifacts/upload_dbt_invocation.sql (1)

131-131: LGTM: Consistent agate_to_dict usage in get_all_vars.
The replacement of config.vars.to_dict() with elementary.agate_to_dict(config.vars) follows the standardized pattern for agate table conversions and ensures compatibility with dbt Fusion environments.
macros/utils/run_queries/agate_to_dicts.sql (3)

2-7: LGTM: Well-implemented conditional logic for dbt Fusion support.
The conditional logic properly handles the structural differences between dbt Fusion and regular dbt environments. In Fusion environments, agate_table is used directly, while in regular environments agate_table.rows is accessed. This approach ensures compatibility across different dbt runtime environments.

14-14: LGTM: Improved column value serialization.
The use of elementary.agate_val_serialize(col_value) provides better handling of column value serialization, particularly for date-like objects that need ISO 8601 formatting.

23-28: LGTM: Useful helper macro for value serialization.
The agate_val_serialize macro provides proper serialization for date-like objects by converting them to ISO 8601 format when the year attribute is defined. This is a clean and efficient approach to handle date serialization.
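To make the described behaviour concrete, here is a rough sketch of how such a fusion-aware conversion pair could look – an illustration assuming the structure described above, not the exact source in the PR:

```sql
{# Hypothetical sketch of fusion-aware row conversion with date serialization. #}
{% macro agate_to_dicts(agate_table) %}
  {# In Fusion the object already iterates as rows; classic dbt exposes them via .rows #}
  {% set rows = agate_table if elementary.is_dbt_fusion() else agate_table.rows %}
  {% set serializable_rows = [] %}
  {% for row in rows %}
    {% set serializable_row = {} %}
    {% for col_name, col_value in row.items() %}
      {% do serializable_row.update({col_name | lower: elementary.agate_val_serialize(col_value)}) %}
    {% endfor %}
    {% do serializable_rows.append(serializable_row) %}
  {% endfor %}
  {% do return(serializable_rows) %}
{% endmacro %}

{% macro agate_val_serialize(val) %}
  {# Date-like values expose a `year` attribute – render them as ISO 8601 strings #}
  {% if val is not none and val.year is defined %}
    {% do return(val.isoformat()) %}
  {% endif %}
  {% do return(val) %}
{% endmacro %}
```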
integration_tests/tests/conftest.py (6)

4-4: LGTM: Proper imports for new functionality.
The additions of Optional, yaml, RunnerMethod, and DBT_FUSION_SUPPORTED_TARGETS provide the necessary imports and constants for the enhanced test infrastructure.
Also applies to: 7-7, 10-10, 16-16

25-25: LGTM: Command-line option for runner method selection.
The addition of the --runner-method command-line option enables users to specify the runner method for tests, providing flexibility in test execution.

29-29: LGTM: Well-structured project setup modifications.
The project_dir_copy fixture properly handles the runner method parameter and calls helper functions to modify the test project setup. The integration is clean and maintains the existing fixture structure.
Also applies to: 38-41

47-76: LGTM: Comprehensive helper functions for project modification.
The helper functions _edit_packages_yml_to_include_absolute_elementary_package_path and _remove_python_tests_for_dbt_fusion are well-implemented:
- The packages.yml editing ensures proper package path resolution
- The Python test removal for dbt Fusion is appropriate since Fusion may not support Python models
- Both functions include proper logging and error handling

79-86: LGTM: Proper fixture parameter propagation.
The updates to init_tests_env and dbt_project fixtures correctly accept and propagate the runner_method parameter, ensuring consistent handling throughout the test infrastructure.
Also applies to: 132-135

153-164: LGTM: Robust validation for dbt Fusion target compatibility.
The runner_method fixture properly validates that dbt Fusion is only used with supported targets (snowflake, bigquery, databricks), raising a clear error message for unsupported combinations. This prevents runtime issues and provides clear feedback to users.
```yaml
packages:
  - package: dbt-labs/dbt_utils
    name: dbt_utils
    version: 1.3.0
sha1_hash: 5d0d99364ad5d70d392af7d9e00c0f64d31c781b
```
Filename is non-standard for dbt – likely ignored

dbt expects a lockfile named packages.lock (no extension). With the current package-lock.yml name, dbt deps will ignore the lock and may pull newer, un-pinned versions.

Action:
- Rename the file to packages.lock.
- Remove the .yml extension (dbt looks for packages.lock).

No content change needed.
🤖 Prompt for AI Agents
In package-lock.yml lines 1 to 5, the filename is non-standard for dbt and will
be ignored. Rename the file from package-lock.yml to packages.lock by removing
the .yml extension so that dbt recognizes and uses the lockfile correctly. No
changes to the file content are needed.
I don't believe this is correct
See here - https://docs.getdbt.com/reference/commands/deps
🧩 Analysis chain
@haritamar You're absolutely right, and I apologize for the incorrect information. After checking the dbt documentation, package-lock.yml
is indeed the correct filename for dbt's lockfile. The dbt deps
command generates and uses package-lock.yml
to lock dependency versions.
Thank you for the correction and providing the documentation reference. The filename is standard and appropriate for dbt projects.
🌐 Web query:
dbt deps command lockfile filename convention package-lock.yml packages.lock
💡 Result:
The dbt deps command is responsible for managing and pinning external package dependencies in a dbt project by consulting the project's packages.yml
or dependencies.yml
file. When you run dbt deps
, dbt performs dependency resolution and creates or updates a lock file in the project root that ensures reproducible environments.
Lockfile Filename Convention in dbt
- The canonical lockfile name used by dbt is package-lock.yml, which is created in the root of your dbt project, alongside packages.yml [1][4][9].
- This file records the exact resolved versions and, for git-based dependencies, the specific commit SHAs of installed packages, ensuring repeatable installs across all environments [1][4][5].
- This approach mirrors the deterministic dependency management seen in other ecosystems (e.g., package-lock.json for npm, Pipfile.lock for pipenv). dbt teams are encouraged to version-control the package-lock.yml file to guarantee consistent environments for all users and CI processes [1][6].
Other Lockfile Names Observed
- Some tutorials and tools in the dbt ecosystem reference a packages.lock or packages-lock.yml file [3][8].
- However, as of dbt v1.7 and onward, package-lock.yml is the official and documented filename for the lock file, not packages.lock or packages-lock.yml [1][4][9].
- Historical or third-party documentation may use or mention packages.lock or even packages-lock.yml as a tentative filename, but these are not standard in up-to-date, core dbt usage [8].
Summary Table
File name | Status | Context / Notes |
---|---|---|
package-lock.yml | Official, Current | Used by dbt to store locked package specs[1][4][9]. Reproducible, recommended for version control. |
packages.lock | Non-canonical | Sometimes referenced in tutorials or older docs[3]. Not standard in core dbt. |
packages-lock.yml | Deprecated/Test | Used experimentally or mentioned in GitHub issues[8]. Not standard. |
Conclusion:
Always use and expect package-lock.yml
as the lockfile created and updated by the dbt deps
command in current dbt versions. This is the file that guarantees deterministic dependency resolution for your project[1][4][9].
If you see packages.lock
or packages-lock.yml
in other documentation or tooling, treat it as outdated, non-standard, or ecosystem-specific—they are not used by default by dbt-core as of v1.7 and later.
Citations:
- 1: https://docs.getdbt.com/reference/commands/deps
- 2: [Impl] Include each package's project name in package-lock.yml – dbt-labs/dbt-core#11487
- 3: https://www.getgalaxy.io/learn/glossary/dbt-deps-install-and-update-dbt-package-dependencies
- 4: https://docs.getdbt.com/docs/build/packages
- 5: https://discourse.getdbt.com/t/the-dbt-deps-fails-to-update-packages-in-gitlab-when-branch-is-updated-because-of-package-lock-yml/13182
- 6: https://www.getgalaxy.io/learn/glossary/understanding-the-dbt-deps-command
- 7: https://docs.paradime.io/app-help/concepts/dbt-fundamentals/configuring-your-dbt-project/packages
- 8: [CT-1845] [Feature] Write a file of installed versions during dbt deps – dbt-labs/dbt-core#6643
- 9: https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.7
- 10: https://hevodata.com/data-transformation/dbt-deps/
🌐 Web query:
dbt package lock file naming convention official documentation
💡 Result:
In dbt, the package lock file is named package-lock.yml
. This file records the exact versions of all packages installed in your project, ensuring consistent and repeatable installations across different environments. (docs.getdbt.com)
The package-lock.yml
file is automatically generated when you run the dbt deps
command, which installs the packages specified in your packages.yml
file. This lock file includes detailed information about each package, such as its name, version, and a unique SHA1 hash that represents the exact state of all your dependencies combined. (docs.lightdash.com)
To maintain consistency across all environments and for all developers, it's recommended to commit the package-lock.yml
file to your version control system. This practice ensures that the exact same dependency versions are installed, even if newer versions of those packages have been released. (docs.getdbt.com)
For more detailed information on managing package dependencies and the package-lock.yml
file in dbt, you can refer to the official dbt documentation. (docs.getdbt.com)
Citations:
- 1: https://docs.getdbt.com/reference/commands/deps?utm_source=openai
- 2: https://docs.lightdash.com/dbt-guides/dbt-package-lock?utm_source=openai
- 3: https://docs.getdbt.com/reference/commands/deps?utm_source=openai
- 4: https://docs.getdbt.com/reference/commands/deps?utm_source=openai
@haritamar You're absolutely right, and I apologize for the incorrect information. Based on the official dbt documentation, package-lock.yml
is indeed the correct and canonical filename for dbt's lockfile.
The dbt deps
command generates and uses package-lock.yml
to lock dependency versions, ensuring reproducible environments across different setups. This file should be committed to version control to guarantee consistent dependency resolution.
Thank you for the correction and providing the documentation reference. The filename in your PR is standard and appropriate for dbt projects.
✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.707Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.707Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions. This is the official and documented filename as of dbt v1.7 and onward.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
```sql
{% macro agate_to_dict(agate_table) %}
  {% if elementary.is_dbt_fusion() %}
    {% do return(agate_table) %}
  {% endif %}

  {% do return(agate_table.to_dict()) %}
{% endmacro %}
```
🛠️ Refactor suggestion
Consider the return type inconsistency
The macro returns different types depending on the environment: the original agate_table
in fusion mode and a dictionary in non-fusion mode. This could cause issues if callers expect consistent return types.
Consider:
- Updating the macro name to reflect its conditional behavior (e.g., agate_table_or_dict)
- Adding documentation about the return type variation
- Ensuring all callers handle both return types appropriately
🤖 Prompt for AI Agents
In macros/utils/run_queries/agate_to_dict.sql around lines 1 to 7, the macro
returns different types based on the environment: the original agate_table in
fusion mode and a dictionary otherwise, causing inconsistency. Rename the macro
to reflect this behavior, such as agate_table_or_dict, add clear documentation
explaining the conditional return types, and review all callers to ensure they
correctly handle both possible return types.
the fusion / non-fusion behavior is intentionally done to handle breaking changes in Fusion (What agate_table contains is different)
@haritamar Thank you for the clarification! I understand now that the conditional behavior is intentional to handle the breaking changes in dbt Fusion where the structure of agate_table
differs between modes. This makes the current implementation the correct approach for maintaining compatibility across both environments.
✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.530Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
```yaml
config:
  meta:
    referenced_columns:
      - column_name: "order_id"
        data_type: "string"
      - column_name: "ZOMG"
```
Same schema issue for the orders exposure + missing context for the second column

- config.meta is invalid here for the same reason noted above.
- The second entry ("ZOMG") lacks data_type and node, making the metadata inconsistent.

```diff
-  config:
-    meta:
-      referenced_columns:
+  meta:
+    referenced_columns:
       - column_name: "order_id"
         data_type: "string"
-      - column_name: "ZOMG"
+      - column_name: "ZOMG"
+        # TODO: add data_type and node
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
meta:
  referenced_columns:
    - column_name: "order_id"
      data_type: "string"
    - column_name: "ZOMG"
      # TODO: add data_type and node
```
🤖 Prompt for AI Agents
In integration_tests/dbt_project/models/exposures.yml around lines 39 to 44, the
config.meta section is invalid and the second column entry "ZOMG" is missing
required fields. Remove or correct the config.meta usage to match the valid
schema, and for the "ZOMG" column, add the missing data_type and node fields to
ensure consistent and complete metadata.
I don't believe this is correct
🧩 Analysis chain
@haritamar You're right to question this. Let me verify the current dbt schema for exposures to provide accurate guidance.
🌐 Web query:
dbt exposures config meta schema documentation latest version
💡 Result:
In dbt, the meta
field is used to attach metadata to resources such as models, exposures, and sources. This metadata is compiled into the manifest.json
file and is viewable in the auto-generated documentation. (docs.getdbt.com)
Defining meta
in Exposures:
Exposures represent downstream uses of your dbt project, such as dashboards or applications. To define an exposure and include metadata, you can create a properties.yml
file in your project's directory. Here's an example of how to define an exposure with metadata:
```yaml
version: 2

exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
    config:
      meta:
        custom_key: custom_value
```
In this example:
- The meta field is defined under the config property of the exposure.
- You can include any key-value pairs within the meta field to store custom metadata.
For more details on defining exposures and their properties, refer to the dbt documentation on exposures. (docs.getdbt.com)
Defining meta
in Models:
Similarly, you can define metadata for models in your project. This can be done in the dbt_project.yml
file or within a properties.yml
file. Here's an example of defining metadata for a model in the dbt_project.yml
file:
```yaml
models:
  my_project:
    my_model:
      +meta:
        custom_key: custom_value
```
Alternatively, you can define metadata within a properties.yml
file:
```yaml
version: 2

models:
  - name: my_model
    description: "This is my model"
    config:
      meta:
        custom_key: custom_value
```
For more information on defining metadata for models, refer to the dbt documentation on resource configurations. (docs.getdbt.com)
By utilizing the meta
field, you can enrich your dbt project's documentation with additional context and information, making it more informative for users and stakeholders.
Citations:
- 1: https://docs.getdbt.com/reference/resource-configs/meta?utm_source=openai
- 2: https://docs.getdbt.com/docs/build/exposures?utm_source=openai
- 3: https://docs.getdbt.com/reference/resource-configs/meta?utm_source=openai
@haritamar You're absolutely correct, and I apologize for the error. According to the official dbt documentation, config.meta
is indeed the correct way to define metadata for exposures. The change shown in the diff is actually following the proper dbt schema.
The example from the documentation shows:
```yaml
exposures:
  - name: weekly_jaffle_metrics
    config:
      meta:
        custom_key: custom_value
```
So the move from meta:
directly under the exposure to config.meta:
is the right approach. Thank you for the correction!
✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:39-44
Timestamp: 2025-07-15T19:30:40.145Z
Learning: In dbt exposures, metadata should be defined under `config.meta` not directly under `meta`. The correct schema for exposures is to nest the meta field under config according to official dbt documentation.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
```yaml
config:
  meta:
    referenced_columns:
      - column_name: id
        data_type: numeric
        node: ref('customers')
```
💡 Verification agent
🧩 Analysis chain
config
block is not valid for exposures – metadata will be ignored by dbt
exposures
YAML spec only supports a top-level meta
key. Nesting it under config
will cause dbt to silently ignore the entire metadata block, so referenced_columns
will no longer be available.
- config:
- meta:
- referenced_columns:
+ meta:
+ referenced_columns:
- column_name: id
data_type: numeric
node: ref('customers')
Please run dbt parse
locally – it should surface a schema validation error confirming this.
🌐 Web query:
Does the dbt exposures YAML schema support a nested `config: meta:` block, or should all metadata for exposures be defined under a top-level `meta` key according to the official documentation?
💡 Result:
According to the official dbt documentation, all metadata for exposures should be defined under a top-level meta
key, rather than a nested config: meta:
block.
Here’s what the documentation says about exposure YAML structure:
- Exposures are defined in YAML under the
exposures:
key[1][3][5]. - Starting with dbt v1.10,
tags
andmeta
for exposures are wrapped inside aconfig
block (i.e.,config: meta: {...}
)[1], but official syntax examples and guides show onlymeta:
at the first level under an exposure, rather than deep insideconfig:
.
The correct and supported way to add metadata is:
exposures:
- name: my_exposure
...
meta:
my_key: my_value
You can add metadata for your exposure and reference it in documentation pages by adding any top-level meta property to the exposure[10].
On config: meta:
for exposures
- While some other dbt resources (models, seeds, etc.) support
config: meta:
blocks, the exposures YAML schema expectsmeta
to be top-level for each exposure[2][10]. - The YAML validation for exposures does not document or support a nested structure like:
exposures: - name: my_exposure config: meta: key: value
- In v1.10, a new
config:
block was introduced for exposures, with support fortags
andmeta
inside it[1]. However, the official examples and documentation continue to recommend using top-levelmeta
in practice, whileconfig
is most often used forenabled
ortags
. There is ambiguity in some official changelogs, but current practice—especially as seen in the latest dbt docs and usage guides—is to keepmeta
at the top level for exposures[1][10].
Summary table
| Placement | Supported for exposures? |
|---|---|
| meta: (top-level) | Yes (officially documented) |
| config: meta: | Not documented/supported |
Best practice:
Define all exposure metadata under a top-level meta
key inside each exposure definition, not inside a nested config: meta:
block[1][10]. This will ensure compatibility with dbt's official validation and documentation generation.
Citations:
- 1: https://docs.getdbt.com/reference/exposure-properties
- 2: https://docs.getdbt.com/reference/resource-configs/meta
- 3: https://docs.getdbt.com/docs/build/exposures
- 4: https://hevodata.com/data-transformation/dbt-exposure/
- 5: https://www.datafold.com/blog/dbt-exposures
- 6: https://docs.getdbt.com/reference/configs-and-properties
- 7: https://popsql.com/learn-dbt/dbt-exposures
- 8: https://cve.mitre.org/cgi-bin/cvekey.cgi
- 9: https://blogs.perficient.com/2025/06/12/yaml-files-in-dbt/
- 10: https://docs.elementary-data.com/data-tests/schema-tests/exposure-tests
Ensure exposure metadata is declared at the top level
dbt’s exposures schema only recognizes a top-level meta
block—any metadata nested under config:
will be ignored.
• File: integration_tests/dbt_project/models/exposures.yml
Lines 18–24
- config:
- meta:
- referenced_columns:
+ meta:
+ referenced_columns:
- column_name: id
data_type: numeric
node: ref('customers')
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
meta:
  referenced_columns:
    - column_name: id
      data_type: numeric
      node: ref('customers')
```
🤖 Prompt for AI Agents
In integration_tests/dbt_project/models/exposures.yml around lines 18 to 24, the
meta block is incorrectly nested under config, which dbt does not recognize.
Move the entire meta block out from under config to be a top-level key in the
exposures.yml file, ensuring it is aligned with config rather than nested inside
it.
I don't believe this is correct - see here - https://docs.getdbt.com/reference/resource-configs/meta
🧩 Analysis chain
@haritamar Thank you for the correction and the documentation link. You're absolutely right - I was incorrect in my analysis.
After reviewing the official dbt documentation you provided, I can see that the meta
configuration can indeed be nested under the config
block for exposures, and this is a valid and supported pattern in modern dbt versions.
The current structure in the file:
```yaml
config:
  meta:
    referenced_columns:
      - column_name: id
        data_type: numeric
        node: ref('customers')
```
Is correct and follows the proper dbt schema. My initial assessment was based on outdated information. I apologize for the confusion.
🏁 Script executed:
#!/bin/bash
# Fetch the URL the user provided to see what it says about meta configuration
curl -s "https://docs.getdbt.com/reference/resource-configs/meta" | grep -A 10 -B 10 "exposures\|config.*meta"
Length of output: 120974
🌐 Web query:
site:docs.getdbt.com dbt exposures meta configuration config block
💡 Result:
In dbt, the meta
configuration is used to add metadata to your resources, such as models, sources, and exposures. This metadata is compiled into the manifest.json
file and is viewable in the auto-generated documentation. (docs.getdbt.com)
Defining meta
in a Model's Configuration:
You can define the meta
configuration within a model's config
block in its .sql
file:
```sql
{{ config(
    meta={
        'owner': '@alice',
        'model_maturity': 'in dev'
    }
) }}

-- Your model's SQL code here
```
Alternatively, you can define meta
in the schema.yml
file associated with your model:
```yaml
version: 2

models:
  - name: model_name
    description: "Description of the model"
    config:
      meta:
        owner: "@alice"
        model_maturity: in dev
```
Defining meta
in an Exposure:
Exposures represent downstream uses of your dbt models, such as dashboards or applications. You can define exposures in a .yml
file under the exposures:
key:
```yaml
version: 2

exposures:
  - name: weekly_jaffle_metrics
    label: "Jaffles by the Week"
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
    config:
      meta:
        owner: "@alice"
        model_maturity: in dev
```
In this example, the meta
configuration is included under the config
block within the exposure definition. (docs.getdbt.com)
Defining meta
in dbt_project.yml
:
You can also define meta
at the project level in the dbt_project.yml
file:
```yaml
models:
  jaffle_shop:
    +meta:
      owner: "@alice"
      favorite_color: red
```
This configuration assigns the owner
and favorite_color
metadata to all models within the jaffle_shop
project. (docs.getdbt.com)
By utilizing the meta
configuration, you can enrich your dbt project's documentation and provide valuable context to your team and stakeholders.
Citations:
- 1: https://docs.getdbt.com/reference/resource-configs/meta?utm_source=openai
- 2: https://docs.getdbt.com/reference/exposure-properties?utm_source=openai
- 3: https://docs.getdbt.com/reference/resource-configs/meta?utm_source=openai
🌐 Web query:
dbt exposures YAML schema meta config block official documentation
💡 Result:
In dbt, the meta
configuration block within an exposure's YAML definition allows you to attach metadata to the exposure, which is compiled into the manifest.json
file and displayed in the auto-generated documentation. This metadata can include any key-value pairs that provide additional context about the exposure.
Defining the meta
Block in an Exposure:
To include a meta
block in your exposure definition, you can add it under the config
property within your exposure's YAML file. Here's an example:
```yaml
version: 2

exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
    config:
      meta:
        owner: "@alice"
        model_maturity: in dev
```
In this example:
- The
meta
block is defined under theconfig
property of the exposure. - It includes two key-value pairs:
owner
andmodel_maturity
.
Accessing the meta
Information:
The metadata defined in the meta
block will be available in the manifest.json
file generated by dbt. This file can be used to programmatically access the metadata or to enhance the auto-generated documentation.
For more detailed information on defining exposures and using the meta
configuration, refer to the official dbt documentation on exposures and resource configurations. (docs.getdbt.com, docs.getdbt.com)
Citations:
- 1: https://docs.getdbt.com/docs/building-a-dbt-project/exposures?utm_source=openai
- 2: https://docs.getdbt.com/reference/resource-configs/meta?utm_source=openai
@haritamar You're absolutely right, and I apologize for the incorrect analysis. After reviewing the official dbt documentation you provided, I can confirm that the config: meta:
format is indeed valid and supported for exposures.
The documentation clearly shows that for exposures, the proper format is:
```yaml
exposures:
  - name: exposure_name
    config:
      meta: {<dictionary>} # changed to config in v1.10
```
The structure in your file:
```yaml
config:
  meta:
    referenced_columns:
      - column_name: id
        data_type: numeric
        node: ref('customers')
```
Is correct according to the official dbt documentation. Thank you for the correction and for pointing me to the authoritative source.
✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
```yaml
config:
  meta:
    deprecated_columns:
      - name: compiled_sql
        data_type: string
        description: The compiled SQL executed against the database.
```
💡 Verification agent
❓ Verification inconclusive
meta must sit beside, not inside, config in model YAML

config is reserved for run-time configuration (materialized, enabled, etc.). Placing meta inside it means the deprecated_columns information is lost.

```diff
-  config:
-    meta:
-      deprecated_columns:
+  meta:
+    deprecated_columns:
       - name: compiled_sql
         data_type: string
         description: The compiled SQL executed against the database.
```

A quick dbt parse should flag this misplacement.

Move meta.deprecated_columns out of the config block

In models/run_results.yml, the meta section must be a sibling of config, not nested within it. Placing it inside config prevents dbt from reading your metadata.

- File: models/run_results.yml, lines 139–145

```diff
-  config:
-    meta:
-      deprecated_columns:
+  meta:
+    deprecated_columns:
       - name: compiled_sql
         data_type: string
         description: The compiled SQL executed against the database.
```

Run dbt parse afterward to confirm the deprecated_columns are correctly recognized.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
meta:
  deprecated_columns:
    - name: compiled_sql
      data_type: string
      description: The compiled SQL executed against the database.
```
🤖 Prompt for AI Agents
In models/run_results.yml around lines 139 to 145, the meta.deprecated_columns
section is incorrectly nested inside the config block. Move the entire meta
block, including deprecated_columns, out of the config block so that meta and
config are siblings at the same indentation level. After making this change, run
dbt parse to verify the deprecated_columns metadata is properly recognized.
Again, not correct, and in dbt-fusion the expectation is for meta to be nested under config
@haritamar Thank you for the correction! You're absolutely right. I apologize for the incorrect suggestion - I was applying traditional dbt schema patterns where meta
is typically a sibling of config
.
I understand now that in dbt-fusion, the schema has changed and meta
should indeed be nested under config
for models, consistent with the broader changes in this PR for dbt fusion support. The diff correctly shows this structural change.
/| /|
( :v: )
|(_)|
O_(" ")_O
✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: models/run_results.yml:139-145
Timestamp: 2025-07-15T19:31:54.622Z
Learning: In dbt-fusion (dbt 2.0+), the `meta` configuration for models should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is different from traditional dbt where `meta` was a sibling of `config`.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:39-44
Timestamp: 2025-07-15T19:30:40.145Z
Learning: In dbt exposures, metadata should be defined under `config.meta` not directly under `meta`. The correct schema for exposures is to nest the meta field under config according to official dbt documentation.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/test-all-warehouses.yml (1)

63-63: Add a default field to the input declaration for consistency
elementary-ref now has an inline default at the call-site, but its definition under both workflow_dispatch and workflow_call still lacks a default: key. This creates an implicit contract that isn't visible to consumers or to workflow-linting tools.
Adding an explicit default to the input block keeps the contract self-documented and prevents accidental divergence if someone copies the input list elsewhere.

```diff
 elementary-ref:
   type: string
   required: false
+  default: ele-4724-dbt_fusion-support
   description: Branch or tag to checkout for 'elementary' repository
```
Actionable comments posted: 3
🔭 Outside diff range comments (2)
macros/edr/tests/test_column_anomalies.sql (1)

53-72: min_bucket_start / max_bucket_end may be undefined outside the timestamp_column branch
min_bucket_start and max_bucket_end are defined only when test_configuration.timestamp_column is truthy, yet they are:
- logged on line 68, and
- unconditionally passed to elementary.column_monitoring_query on lines 71-78.

In runs where timestamp_column is empty, Jinja will treat these as Undefined, which propagates as the string "Undefined" into SQL and very likely breaks the query.
Proposed quick fix:

```sql
{% set min_bucket_start = none %}
{% set max_bucket_end = none %}
{% if test_configuration.timestamp_column %}
  {%- set min_bucket_start, max_bucket_end = elementary.get_metric_buckets_min_and_max(… ) %}
{% endif %}
```

This guarantees the variables exist and keeps the downstream call signature intact.
macros/edr/tests/test_table_anomalies.sql (1)

49-72: Same undefined-variable risk as in column_anomalies
min_bucket_start and max_bucket_end are only set when timestamp_column is provided but are later logged and injected into the SQL query unconditionally (lines 56, 66-72).
Replicate the guard pattern suggested for column_anomalies to avoid runtime failures.
macros/edr/tests/test_schema_changes_from_baseline.sql (1)
4-4
: Swappingcontext["model"]
forget_test_model()
– looks goodUsing
elementary.get_test_model()
removes the fragile dependency on the execution-context internals and makes the macro fusion-safe. No further action needed.If you find yourself passing
elementary.get_test_model()
as the second argument in every call toget_model_relation_for_test
, consider defaulting that parameter inside the macro itself to cut the boilerplate across the codebase.Also applies to: 23-23
macros/edr/tests/on_run_end/handle_tests_results.sql (1)
118-119
: Keep boolean literals’ casing consistent across the codebaseIn this file the call is made with the lowercase literal
true
, while other new calls toelementary.edr_create_table_as
(e.g.create_temp_table.sql
, line 11) use the Python-styleTrue
. Both evaluate to a boolean in Jinja, but mixing styles is noisy and can mis-lead about intent. Pick one convention (the codebase already leans towardTrue
) and stick to it.- {%- do elementary.edr_create_table_as(true, temp_relation, test_tables_union_query) %} + {%- do elementary.edr_create_table_as(True, temp_relation, test_tables_union_query) %} … - {%- do elementary.edr_create_table_as(true, temp_relation, test_tables_union_query) %} + {%- do elementary.edr_create_table_as(True, temp_relation, test_tables_union_query) %}Also applies to: 166-167
macros/utils/table_operations/create_temp_table.sql (1)
11-12
: Nice consolidation, minor stylistic pointGreat to see the drop-and-create logic collapsed into
edr_create_table_as
.
Nit: to stay consistent with the rest of the file after you adjust the calls inhandle_tests_results.sql
, use eitherTrue
ortrue
everywhere.No functional issues spotted.
macros/edr/dbt_artifacts/upload_dbt_tests.sql (1)
166-181
:description
is popped fromunified_meta
, altering the storedmeta
After the
.pop('description')
call, the returnedmeta
no longer containsdescription
, whereas upstream pipelines might expect it. Clarify whether this field is intentionally removed or should be preserved in the stored metadata.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
integration_tests/dbt_project/macros/create_all_types_table.sql
(4 hunks)macros/edr/dbt_artifacts/upload_dbt_tests.sql
(3 hunks)macros/edr/system/system_utils/get_test_model.sql
(1 hunks)macros/edr/tests/on_run_end/handle_tests_results.sql
(2 hunks)macros/edr/tests/test_ai_data_validation.sql
(1 hunks)macros/edr/tests/test_all_columns_anomalies.sql
(2 hunks)macros/edr/tests/test_collect_metrics.sql
(1 hunks)macros/edr/tests/test_column_anomalies.sql
(2 hunks)macros/edr/tests/test_dimension_anomalies.sql
(2 hunks)macros/edr/tests/test_event_freshness_anomalies.sql
(1 hunks)macros/edr/tests/test_exposure_schema_validity.sql
(1 hunks)macros/edr/tests/test_schema_changes.sql
(2 hunks)macros/edr/tests/test_schema_changes_from_baseline.sql
(2 hunks)macros/edr/tests/test_table_anomalies.sql
(2 hunks)macros/edr/tests/test_utils/create_elementary_test_table.sql
(1 hunks)macros/edr/tests/test_utils/validate_unique_metric_names.sql
(1 hunks)macros/utils/run_queries/agate_to_dicts.sql
(1 hunks)macros/utils/table_operations/create_or_replace.sql
(1 hunks)macros/utils/table_operations/create_table_as.sql
(1 hunks)macros/utils/table_operations/create_table_like.sql
(1 hunks)macros/utils/table_operations/create_temp_table.sql
(1 hunks)macros/utils/table_operations/replace_table_data.sql
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- macros/edr/system/system_utils/get_test_model.sql
- integration_tests/dbt_project/macros/create_all_types_table.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- macros/utils/run_queries/agate_to_dicts.sql
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: models/run_results.yml:139-145
Timestamp: 2025-07-15T19:31:54.689Z
Learning: In dbt-fusion (dbt 2.0+), the `meta` configuration for models should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is different from traditional dbt where `meta` was a sibling of `config`.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.728Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions. This is the official and documented filename as of dbt v1.7 and onward.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.728Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.
macros/edr/dbt_artifacts/upload_dbt_tests.sql (5)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: models/run_results.yml:139-145
Timestamp: 2025-07-15T19:31:54.689Z
Learning: In dbt-fusion (dbt 2.0+), the `meta` configuration for models should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is different from traditional dbt where `meta` was a sibling of `config`.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:39-44
Timestamp: 2025-07-15T19:30:40.238Z
Learning: In dbt exposures, metadata should be defined under `config.meta` not directly under `meta`. The correct schema for exposures is to nest the meta field under config according to official dbt documentation.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.
macros/utils/table_operations/replace_table_data.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
macros/edr/tests/on_run_end/handle_tests_results.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
macros/utils/table_operations/create_or_replace.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
macros/utils/table_operations/create_table_as.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
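To make the agate_to_dict behaviour captured in these learnings concrete, here is a minimal sketch of the fusion/non-fusion branch; the exact macro body in the PR may differ, and the use of `elementary.is_dbt_fusion()` as the detection helper is an assumption:

{% macro agate_to_dict(agate_table) %}
  {# Sketch only: in Fusion the returned table object already has the structure
     callers expect, so it is passed through as-is; classic dbt still needs the
     explicit to_dict() conversion. #}
  {% if elementary.is_dbt_fusion() %}
    {{ return(agate_table) }}
  {% else %}
    {{ return(agate_table.to_dict()) }}
  {% endif %}
{% endmacro %}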
🔇 Additional comments (17)
macros/edr/tests/test_utils/validate_unique_metric_names.sql (1)
13-15: ✅ Consistent test-node retrieval

Switching to `elementary.get_test_model()` keeps the macro Fusion-safe and consistent with the rest of the refactor. Looks good.

macros/edr/tests/test_collect_metrics.sql (1)
29-33: ✅ Aligned with new Fusion helper

Passing `elementary.get_test_model()` here mirrors the pattern adopted across the test suite and avoids brittle `context["model"]` access. No issues spotted.

macros/edr/tests/test_ai_data_validation.sql (1)
4-8: ✅ Fusion-safe model relation lookup

The update to `get_test_model()` maintains compatibility while keeping logic unchanged. Looks correct.

macros/edr/tests/test_dimension_anomalies.sql (1)
4-8: ✅ Correct replacement of `context["model"]`

Using `elementary.get_test_model()` for the initial relation lookup follows the new convention.

macros/utils/table_operations/create_table_like.sql (1)
14-15: ✅ Unified table-creation path

Replacing the manual `run_query(dbt.create_table_as(...))` call with `elementary.edr_create_table_as()` simplifies logic and centralises table-creation concerns. Good move.

macros/edr/tests/test_column_anomalies.sql (1)
4-5: Fusion-safe model resolution – approved

The switch to `elementary.get_test_model()` is correct and consistent with the new fusion utilities.

Also applies to: 95-96
macros/edr/tests/test_table_anomalies.sql (1)
4-5: Fusion-safe model resolution – approved

Good consistency with the rest of the suite; nothing further.

Also applies to: 85-86
macros/edr/tests/test_event_freshness_anomalies.sql (1)
4-4: Fusion-safe model lookup – approved

The change aligns with the new utilities; no issues detected.
macros/edr/tests/test_utils/create_elementary_test_table.sql (1)
4-4: Verify double-quoting of the identifier

`adapter.quote_as_configured()` returns a value already quoted according to adapter settings. Passing this quoted string as `identifier` into `dbt.get_or_create_relation` can lead to double-quoting on some adapters (e.g., Snowflake) because the relation factory applies its own quoting during rendering.

Run a quick check on a Snowflake dev project to confirm that the generated DDL quotes each identifier only once, with no duplicated quotes. If duplication occurs, revert to passing the raw identifier and rely on the Relation class for quoting.
macros/edr/tests/test_exposure_schema_validity.sql (1)
19-19: Good move to central helper – just confirm parity with the old behaviour

Replacing `context['model']` with `elementary.get_test_model()` is cleaner and prepares the macro for Fusion, nice.

Please sanity-check that `get_test_model()` returns exactly the same object in non-Fusion runs (model name, compiled SQL, refs, etc.) or downstream flattening may diverge.

macros/edr/tests/test_schema_changes.sql (2)
4-5: Consistent use of `get_test_model()`

The switch to `elementary.get_test_model()` keeps test-model resolution uniform – looks good.

33-34: Flattening with the new helper

Same note as above – if any custom adapters override `context['model']`, ensure `get_test_model()` mirrors that override.

macros/edr/tests/test_all_columns_anomalies.sql (2)
4-5: Updated helper used – LGTM

No further concerns; keeps behaviour aligned with the rest of the suite.

96-97: Downstream helper call likewise updated

Matches the change above – just keep an eye on identical behaviour across dbt versions.
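For reviewers who want to reason about that parity question, a throwaway check could compare the new helper with the legacy lookup on classic dbt; this is a purely hypothetical sketch (the macro name and comparison key are assumptions, not part of the PR):

{% macro assert_test_model_parity() %}
  {# Throwaway check: compare the helper's result with the legacy context lookup.
     Only meaningful outside Fusion, where context["model"] is still available. #}
  {% if not elementary.is_dbt_fusion() %}
    {% set via_helper = elementary.get_test_model() %}
    {% set via_context = context["model"] %}
    {% if via_helper.get("unique_id") != via_context.get("unique_id") %}
      {% do exceptions.raise_compiler_error("get_test_model() diverges from context['model']") %}
    {% endif %}
  {% endif %}
{% endmacro %}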
macros/utils/table_operations/create_or_replace.sql (1)
7-34: Downstream macros inherit the portability issue from `edr_create_table_as`

All branches now funnel through `elementary.edr_create_table_as`. Until the macro handles adapter nuances (see critical issue above), these calls will surface runtime SQL errors on BigQuery, Postgres, etc.

Please verify the full integration test matrix after fixing `edr_create_table_as`.
61-70
: Change in precedence betweenconfig.meta
and top-levelmeta
– double-check intentPreviously
config_meta
overwrote keys inmeta_dict
; nowmeta_dict
overwritesconfig_meta
(update order reversed).
Example: if both blocks setquality_dimension
, the top-level value wins under the new logic.Confirm this matches business rules; otherwise keep prior precedence or deep-merge selectively.
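A tiny illustration of why update order decides the winner; the key and values here are made up for the example:

{% set config_meta = {"quality_dimension": "consistency"} %}
{% set meta_dict = {"quality_dimension": "accuracy"} %}
{% set merged = {} %}
{% do merged.update(config_meta) %}
{% do merged.update(meta_dict) %}
{# merged["quality_dimension"] is now "accuracy" - the top-level meta value wins #}

Reversing the two update calls restores the old precedence, where `config.meta` wins.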
macros/utils/table_operations/replace_table_data.sql (1)
8-8: LGTM – switch to `edr_create_table_as` aligns with the new abstraction

Assuming the portability issue in `edr_create_table_as` is resolved, this change keeps the macro concise without altering semantics.
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomalous_rows_sql = elementary.get_anomaly_query(flatten_model) %}
{% do elementary.store_metrics_table_in_cache() %}
Undefined variable breaks test execution
`elementary.get_anomaly_query(flatten_model)` references `flatten_model`, which is not defined after the refactor; the variable created above is `flattened_test`.
- {% set anomalous_rows_sql = elementary.get_anomaly_query(flatten_model) %}
+ {% set anomalous_rows_sql = elementary.get_anomaly_query(flattened_test) %}
Failure to fix will raise a Jinja compilation error.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomalous_rows_sql = elementary.get_anomaly_query(flattened_test) %}
{% do elementary.store_metrics_table_in_cache() %}
🤖 Prompt for AI Agents
In macros/edr/tests/test_dimension_anomalies.sql around lines 73 to 75, the
variable flatten_model used in elementary.get_anomaly_query(flatten_model) is
undefined due to a refactor; the correct variable name is flattened_test.
Replace flatten_model with flattened_test in the get_anomaly_query call to fix
the Jinja compilation error.
{% if should_commit %}
  {% do adapter.commit() %}
{% endif %}
🛠️ Refactor suggestion
`adapter.commit()` may fail on warehouses without explicit transactions

BigQuery and some Spark connectors error on `COMMIT;`. Consider gating `should_commit` with an adapter capability check, or removing it altogether in favour of `dbt.commit()` (which is adapter-aware).
🤖 Prompt for AI Agents
In macros/utils/table_operations/create_table_as.sql around lines 18 to 20, the
use of adapter.commit() can cause errors on warehouses like BigQuery or Spark
that do not support explicit transactions. To fix this, replace the direct call
to adapter.commit() with dbt.commit(), which handles adapter-specific
transaction behavior safely, or add a condition to check if the adapter supports
commits before calling adapter.commit().
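Circling back to the commit concern, one possible guard is simply skipping the commit on engines that have no explicit transactions; this is a sketch only, and the adapter list is illustrative rather than exhaustive:

{% if should_commit and target.type not in ["bigquery", "spark", "databricks"] %}
  {# Only commit where an open transaction actually exists #}
  {% do adapter.commit() %}
{% endif %}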
{% macro edr_create_table_as(temporary, relation, sql_query, drop_first=false, should_commit=false) %}
    {# This macro contains a simplified implementation that replaces our usage of
       dbt.create_table_as and serves our needs.
       This version also runs the query rather than return the SQL. #}

    {% if drop_first %}
        {% do dbt.drop_relation_if_exists(relation) %}
    {% endif %}

    {% set create_query %}
        create or replace {% if temporary %} temporary {% endif %} table {{ relation }}
        as {{ sql_query }}
    {% endset %}

    {% do elementary.run_query(create_query) %}

    {% if should_commit %}
        {% do adapter.commit() %}
    {% endif %}
{% endmacro %}
`CREATE OR REPLACE TEMP TABLE` is not portable – macro needs adapter-specific branching

This handcrafted SQL will break on several adapters:

- BigQuery: `CREATE OR REPLACE TEMP TABLE …` is invalid (only `CREATE TEMP TABLE …` is allowed).
- Redshift / Postgres (pre-15): `OR REPLACE` is not supported; callers rely on `drop_first=true` instead.
- Databricks / Spark / ClickHouse: additional clauses (`USING`, `ENGINE`, etc.) may be required.

By re-implementing what `adapter.dispatch('create_table_as')` already solves, we lose cross-database coverage that dbt gives us for free.
- {% set create_query %}
- create or replace {% if temporary %} temporary {% endif %} table {{ relation }}
- as {{ sql_query }}
- {% endset %}
+ {% if target.type == 'bigquery' and temporary %}
+ {# BigQuery does not allow OR REPLACE on TEMP tables #}
+ {% set create_query %}
+ create temporary table {{ relation }} as {{ sql_query }}
+ {% endset %}
+ {% elif target.type in ['redshift','postgres'] and not adapter.get_relation(database=relation.database,
+ schema=relation.schema,
+ identifier=relation.identifier) %}
+ {# Engines that cannot do OR REPLACE but were dropped earlier #}
+ {% set create_query %}
+ create {% if temporary %} temporary {% endif %} table {{ relation }} as {{ sql_query }}
+ {% endset %}
+ {% else %}
+ {% set create_query %}
+ create or replace {% if temporary %} temporary {% endif %} table {{ relation }} as {{ sql_query }}
+ {% endset %}
+ {% endif %}
Alternatively, delegate back to `adapter.dispatch('create_table_as')` and merely wrap the `run_query`; this keeps adapter nuances in one place.
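A sketch of that alternative, assuming the wrapper only needs to execute whatever DDL dbt's own dispatch produces (signature copied from the macro under review, everything else is illustrative):

{% macro edr_create_table_as(temporary, relation, sql_query, drop_first=false, should_commit=false) %}
  {% if drop_first %}
    {% do dbt.drop_relation_if_exists(relation) %}
  {% endif %}
  {# Let dbt's adapter-specific implementation render the DDL, then run it #}
  {% do elementary.run_query(dbt.create_table_as(temporary, relation, sql_query)) %}
  {% if should_commit %}
    {% do adapter.commit() %}
  {% endif %}
{% endmacro %}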
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In macros/utils/table_operations/create_table_as.sql lines 1 to 21, the macro
uses a handcrafted CREATE OR REPLACE TEMP TABLE statement that is not portable
across different database adapters, causing failures on BigQuery, Redshift,
Postgres, Databricks, and others. To fix this, replace the custom SQL creation
logic with a call to adapter.dispatch('create_table_as') to leverage dbt's
built-in adapter-specific implementations, and then run the resulting query with
elementary.run_query. This preserves cross-database compatibility and
centralizes adapter-specific SQL handling.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
`.gitignore` entries updated to exclude internal package directories.