
Ele 4724 dbt fusion support #825


Draft pull request: wants to merge 13 commits into base: master

Conversation

@haritamar (Collaborator) commented Jul 15, 2025

Summary by CodeRabbit

  • New Features

    • Added support for specifying a runner method in test configuration, with validation for supported targets.
    • Introduced macros to detect dbt fusion environments and to convert agate tables conditionally based on environment.
    • Added a macro to check dbt version compatibility for fusion features.
    • Added a macro to provide a consistent test model reference across environments.
  • Bug Fixes

    • Improved handling of agate table conversions for compatibility with dbt fusion.
    • Enhanced serialization of date-like objects in macro outputs.
    • Updated test macros to use the new test model reference for consistent behavior.
  • Chores

    • Updated default workflow inputs and package version constraints.
    • Added new .gitignore entries to exclude internal package directories.
    • Introduced a package lock file for dbt_utils.
    • Refined YAML metadata structure for exposures and models.
    • Removed deprecated configuration options and related macros.
    • Simplified macro invocation style in materializations.
    • Enhanced test project setup to conditionally modify files based on runner method.
    • Updated workflow configurations to use fixed default references.
    • Refactored table creation macros to consolidate logic and improve execution consistency.
    • Replaced multiple macro calls with unified table creation calls for various database platforms.
    • Simplified quoting logic in test table creation macros.
    • Centralized metadata merging in test flattening macros.

linear bot commented Jul 15, 2025

@haritamar had a problem deploying to elementary_test_env July 15, 2025 17:22 — with GitHub Actions Failure (reported 10 times, once per deployment job)

coderabbitai bot commented Jul 15, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce support for dbt Fusion by adding new macros, updating test infrastructure, and modifying macro logic to handle differences between dbt versions. Configuration keys and logic related to materialization overrides are removed or refactored. Several YAML and ignore files are updated, and a package lock file is introduced for dbt_utils versioning.

Changes

File(s) and change summary:

  • .github/workflows/test-warehouse.yml, .github/workflows/test-all-warehouses.yml: Added default refs for workflow inputs (temporary); replaced a dynamic input with a fixed string in the reusable workflow.
  • .gitignore, integration_tests/dbt_project/.gitignore: Ignore the dbt_internal_packages/ directory.
  • integration_tests/dbt_project/dbt_project.yml, integration_tests/deprecated_tests/dbt_project.yml: Removed mute_ensure_materialization_override from vars; added seeds.schema in deprecated tests.
  • integration_tests/dbt_project/macros/materializations.sql, integration_tests/deprecated_tests/macros/system/materializations.sql: Replaced .call_macro() with direct macro calls in materialization macros.
  • integration_tests/dbt_project/models/exposures.yml, models/run_results.yml: Moved meta blocks under config for exposures and model metadata.
  • integration_tests/dbt_project/packages.yml: Simplified the dbt_utils version spec to a fixed range.
  • integration_tests/tests/conftest.py, integration_tests/tests/dbt_project.py, integration_tests/tests/env.py: Added and propagated the runner_method parameter; updated test fixtures and project setup for Fusion support.
  • macros/edr/dbt_artifacts/upload_dbt_invocation.sql, macros/utils/graph/get_run_result_dict.sql: Switched from .to_dict() to elementary.agate_to_dict() for agate table conversion.
  • macros/edr/system/hooks/on_run_start.sql, macros/edr/system/system_utils/get_config_var.sql: Removed the ensure_materialize_override macro call and related config key.
  • macros/edr/system/system_utils/is_dbt_fusion.sql, macros/utils/run_queries/agate_to_dict.sql: Added new macros is_dbt_fusion and agate_to_dict.
  • macros/utils/run_queries/agate_to_dicts.sql: Updated logic for Fusion compatibility; added the agate_val_serialize macro.
  • macros/edr/tests/on_run_start/ensure_materialize_override.sql: Deleted the macro for ensuring materialization override.
  • package-lock.yml: Added a lock file pinning dbt_utils to version 1.3.0.
  • macros/edr/dbt_artifacts/upload_dbt_tests.sql: Unified metadata merging using a unified_meta dictionary.
  • macros/edr/system/system_utils/get_test_model.sql: Added the get_test_model macro to return the global model for the test context.
  • macros/edr/tests/*.sql (multiple test macros): Replaced context["model"] with a call to elementary.get_test_model() for the model context in tests.
  • macros/edr/tests/test_utils/create_elementary_test_table.sql: Simplified quoting logic using adapter.quote_as_configured.
  • macros/edr/tests/test_utils/validate_unique_metric_names.sql: Changed test node retrieval to use elementary.get_test_model().
  • macros/utils/table_operations/*.sql (create_or_replace, create_table_as, create_table_like, create_temp_table, replace_table_data): Refactored table creation macros to use the new edr_create_table_as macro, consolidating drop, create, and commit logic.
  • macros/utils/table_operations/create_table_as.sql: Added the new edr_create_table_as macro for simplified table creation with optional drop and commit.
  • integration_tests/dbt_project/macros/create_all_types_table.sql: Replaced multi-step table creation with a single call to elementary.edr_create_table_as.
  • macros/edr/tests/on_run_end/handle_tests_results.sql: Replaced dbt.create_table_as calls with elementary.edr_create_table_as.

Sequence Diagram(s)

sequenceDiagram
    participant Pytest as Pytest CLI
    participant Conftest as conftest.py
    participant Env as Environment
    participant DbtProj as DbtProject
    participant Runner as DbtRunner

    Pytest->>Conftest: parse --runner-method option
    Conftest->>Conftest: Validate runner method & target
    Conftest->>Env: init_tests_env(..., runner_method)
    Env->>DbtProj: DbtProject(..., runner_method)
    DbtProj->>Runner: get_dbt_runner(..., runner_method)
    Note right of Runner: Runner adapts logic based on runner_method
sequenceDiagram
    participant Macro as Macro
    participant Fusion as is_dbt_fusion
    participant Util as agate_to_dict

    Macro->>Fusion: Check if dbt Fusion
    alt Fusion is true
        Macro->>Util: agate_to_dict returns agate_table as is
    else Fusion is false
        Macro->>Util: agate_to_dict returns agate_table.to_dict()
    end

Poem

A hop, a skip, a Fusion leap,
The macros now are light and neat!
No more overrides to mute or chase,
Agate tables find their place.
With runners swift and configs tight,
This bunny codes through day and night!
🐇✨



Contributor commented:

👋 @haritamar
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai bot left a comment

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e4842d and 3c86bf8.

📒 Files selected for processing (22)
  • .github/workflows/test-warehouse.yml (2 hunks)
  • .gitignore (1 hunks)
  • integration_tests/dbt_project/.gitignore (1 hunks)
  • integration_tests/dbt_project/dbt_project.yml (0 hunks)
  • integration_tests/dbt_project/macros/materializations.sql (1 hunks)
  • integration_tests/dbt_project/models/exposures.yml (2 hunks)
  • integration_tests/dbt_project/packages.yml (1 hunks)
  • integration_tests/deprecated_tests/dbt_project.yml (0 hunks)
  • integration_tests/deprecated_tests/macros/system/materializations.sql (1 hunks)
  • integration_tests/tests/conftest.py (5 hunks)
  • integration_tests/tests/dbt_project.py (2 hunks)
  • integration_tests/tests/env.py (1 hunks)
  • macros/edr/dbt_artifacts/upload_dbt_invocation.sql (1 hunks)
  • macros/edr/system/hooks/on_run_start.sql (0 hunks)
  • macros/edr/system/system_utils/get_config_var.sql (0 hunks)
  • macros/edr/system/system_utils/is_dbt_fusion.sql (1 hunks)
  • macros/edr/tests/on_run_start/ensure_materialize_override.sql (0 hunks)
  • macros/utils/graph/get_run_result_dict.sql (2 hunks)
  • macros/utils/run_queries/agate_to_dict.sql (1 hunks)
  • macros/utils/run_queries/agate_to_dicts.sql (1 hunks)
  • models/run_results.yml (1 hunks)
  • package-lock.yml (1 hunks)
💤 Files with no reviewable changes (5)
  • integration_tests/dbt_project/dbt_project.yml
  • macros/edr/system/hooks/on_run_start.sql
  • integration_tests/deprecated_tests/dbt_project.yml
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/edr/tests/on_run_start/ensure_materialize_override.sql
🧰 Additional context used
🧬 Code Graph Analysis (2)
integration_tests/tests/env.py (2)
integration_tests/tests/conftest.py (3)
  • dbt_project (132-135)
  • target (139-140)
  • runner_method (154-164)
integration_tests/tests/dbt_project.py (1)
  • get_dbt_runner (33-42)
integration_tests/tests/dbt_project.py (1)
integration_tests/tests/conftest.py (2)
  • target (139-140)
  • runner_method (154-164)
🔇 Additional comments (24)
.gitignore (1)

27-27: Addition looks good

Ignoring dbt_internal_packages/ is consistent with dbt-managed dependencies and keeps the repo clean.

integration_tests/dbt_project/.gitignore (1)

3-3: Consistent ignore pattern

Same note as for the root .gitignore – the addition is correct and harmless.

.github/workflows/test-warehouse.yml (1)

24-25: Temporary branch defaults should be removed before merge

Both inputs now default to the feature branch ele-4724-dbt_fusion-support and are tagged # TEMPORARY.
If this hits main, future workflow runs will silently pin to a branch that will eventually be deleted, breaking CI.

Diff to restore the previous behaviour (empty defaults) once the feature is merged:

-        default: ele-4724-dbt_fusion-support # TEMPORARY
+        # default intentionally left blank – callers should provide a ref

Apply the same change for the second occurrence.

Also applies to: 46-47

integration_tests/dbt_project/macros/materializations.sql (1)

3-6: Direct macro calls simplify the materialization – nice cleanup

Switching from .call_macro() to direct calls is clearer and avoids the extra invocation layer. Verified that the call signatures stay the same.

Also applies to: 11-17

integration_tests/deprecated_tests/macros/system/materializations.sql (1)

2-2: LGTM: Simplified macro invocation pattern

The removal of .call_macro() wrappers standardizes the macro calling convention and makes the code more direct and readable.

Also applies to: 6-6

macros/edr/system/system_utils/is_dbt_fusion.sql (1)

1-7: LGTM: Clean implementation for fusion detection

The logic correctly identifies dbt fusion versions by checking if the major version is greater than 1. The implementation assumes dbt_version follows semantic versioning format, which is a reasonable assumption.
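
For reference, a minimal sketch of what such a detection macro could look like (illustrative only; the actual implementation may differ, and the version-string parsing shown here is an assumption):

{% macro is_dbt_fusion() %}
  {# Assumes dbt_version is a semantic version string, e.g. "1.8.2" or "2.0.0-beta" #}
  {% set major_version = dbt_version.split(".")[0] | int %}
  {% do return(major_version > 1) %}
{% endmacro %}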

integration_tests/tests/env.py (1)

1-2: LGTM: Proper integration of runner_method parameter

The addition of the runner_method parameter with proper typing and default value maintains backward compatibility while enabling the new fusion runner functionality. The parameter is correctly passed through to the dbt runner initialization.

Also applies to: 8-16

integration_tests/dbt_project/packages.yml (1)

4-4: LGTM: Simplified version specification

The change from dynamic Jinja logic to a fixed version range ">=0.8.0", "<2.0.0" simplifies the dependency specification while maintaining compatibility. The actual version pinning is handled by the package-lock.yml file for deterministic builds.
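
For context, the simplified spec described here would look roughly like the following sketch of packages.yml (based only on the version range quoted above, not the exact file contents):

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.8.0", "<2.0.0"]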

macros/utils/graph/get_run_result_dict.sql (3)

6-6: LGTM: Consistent with agate_to_dict standardization.

The replacement of run_result.to_dict() with elementary.agate_to_dict(run_result) aligns with the broader effort to standardize agate table conversions for dbt Fusion compatibility.


12-12: LGTM: Proper handling of timing item conversion.

The change to use elementary.agate_to_dict(item) for timing items is consistent with the standardization approach.


23-23: LGTM: Correct conditional conversion for node objects.

The conditional use of elementary.agate_to_dict(run_result.node) when the node exists maintains the existing logic while using the standardized conversion method.
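
Taken together, the reviewed pattern amounts to something like this sketch (illustrative only; the surrounding macro structure and variable names are assumptions):

{% set result_dict = elementary.agate_to_dict(run_result) %}
{% set timing_dicts = [] %}
{% for item in run_result.timing %}
  {% do timing_dicts.append(elementary.agate_to_dict(item)) %}
{% endfor %}
{% if run_result.node %}
  {% do result_dict.update({"node": elementary.agate_to_dict(run_result.node)}) %}
{% endif %}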

integration_tests/tests/dbt_project.py (3)

11-11: LGTM: Proper import addition for RunnerMethod.

The import of RunnerMethod is correctly added to support the new runner method parameter.


33-42: LGTM: Well-structured function signature update.

The addition of the optional runner_method parameter to get_dbt_runner is properly typed and maintains backward compatibility. The parameter is correctly passed through to the create_dbt_runner call.


46-52: LGTM: Consistent constructor parameter propagation.

The DbtProject constructor properly accepts and propagates the runner_method parameter to the get_dbt_runner call, maintaining the parameter flow throughout the test infrastructure.
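
A hedged sketch of the signatures being reviewed (parameter names other than runner_method, the RunnerMethod import path, and the create_dbt_runner wiring are assumptions based on the descriptions above):

from typing import Optional

RunnerMethod = str  # placeholder; the real enum's import path is project-specific

def get_dbt_runner(target: str, project_dir: str, runner_method: Optional[RunnerMethod] = None):
    # Backward compatible: runner_method defaults to None and is forwarded as-is.
    return create_dbt_runner(project_dir, target=target, runner_method=runner_method)

class DbtProject:
    def __init__(self, target: str, project_dir: str, runner_method: Optional[RunnerMethod] = None):
        # Propagate the runner method through to the underlying dbt runner.
        self.dbt_runner = get_dbt_runner(target, project_dir, runner_method=runner_method)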

macros/edr/dbt_artifacts/upload_dbt_invocation.sql (1)

131-131: LGTM: Consistent agate_to_dict usage in get_all_vars.

The replacement of config.vars.to_dict() with elementary.agate_to_dict(config.vars) follows the standardized pattern for agate table conversions and ensures compatibility with dbt Fusion environments.

macros/utils/run_queries/agate_to_dicts.sql (3)

2-7: LGTM: Well-implemented conditional logic for dbt Fusion support.

The conditional logic properly handles the structural differences between dbt Fusion and regular dbt environments. In Fusion environments, agate_table is used directly, while in regular environments agate_table.rows is accessed. This approach ensures compatibility across different dbt runtime environments.


14-14: LGTM: Improved column value serialization.

The use of elementary.agate_val_serialize(col_value) provides better handling of column value serialization, particularly for date-like objects that need ISO 8601 formatting.


23-28: LGTM: Useful helper macro for value serialization.

The agate_val_serialize macro provides proper serialization for date-like objects by converting them to ISO 8601 format when the year attribute is defined. This is a clean and efficient approach to handle date serialization.
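
Put together, the reviewed behavior can be sketched roughly as follows (hedged: the real macros' internals are not shown in this review, so the row-iteration details below are assumptions):

{% macro agate_to_dicts(agate_table) %}
  {# In Fusion the object already behaves like a sequence of row mappings; otherwise use .rows #}
  {% set rows = agate_table if elementary.is_dbt_fusion() else agate_table.rows %}
  {% set serializable_rows = [] %}
  {% for row in rows %}
    {% set row_dict = {} %}
    {% for col_name, col_value in row.items() %}
      {% do row_dict.update({col_name: elementary.agate_val_serialize(col_value)}) %}
    {% endfor %}
    {% do serializable_rows.append(row_dict) %}
  {% endfor %}
  {% do return(serializable_rows) %}
{% endmacro %}

{% macro agate_val_serialize(col_value) %}
  {# Date-like values expose a "year" attribute; convert them to ISO 8601 strings #}
  {% if col_value.year is defined %}
    {% do return(col_value.isoformat()) %}
  {% endif %}
  {% do return(col_value) %}
{% endmacro %}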

integration_tests/tests/conftest.py (6)

4-4: LGTM: Proper imports for new functionality.

The additions of Optional, yaml, RunnerMethod, and DBT_FUSION_SUPPORTED_TARGETS provide the necessary imports and constants for the enhanced test infrastructure.

Also applies to: 7-7, 10-10, 16-16


25-25: LGTM: Command-line option for runner method selection.

The addition of --runner-method command-line option enables users to specify the runner method for tests, providing flexibility in test execution.


29-29: LGTM: Well-structured project setup modifications.

The project_dir_copy fixture properly handles the runner method parameter and calls helper functions to modify the test project setup. The integration is clean and maintains the existing fixture structure.

Also applies to: 38-41


47-76: LGTM: Comprehensive helper functions for project modification.

The helper functions _edit_packages_yml_to_include_absolute_elementary_package_path and _remove_python_tests_for_dbt_fusion are well-implemented:

  • The packages.yml editing ensures proper package path resolution
  • The Python test removal for dbt Fusion is appropriate since Fusion may not support Python models (see the sketch after this list)
  • Both functions include proper logging and error handling
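
A minimal sketch of the Python-test removal helper named above, assuming it simply deletes Python model files from the copied test project (the exact paths, glob pattern, and logging are assumptions):

import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def _remove_python_tests_for_dbt_fusion(project_dir_copy: str) -> None:
    # dbt Fusion may not support Python models, so drop them from the copied test project.
    for py_model in Path(project_dir_copy).glob("models/**/*.py"):
        logger.info("Removing Python model for dbt Fusion run: %s", py_model)
        py_model.unlink()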

79-86: LGTM: Proper fixture parameter propagation.

The updates to init_tests_env and dbt_project fixtures correctly accept and propagate the runner_method parameter, ensuring consistent handling throughout the test infrastructure.

Also applies to: 132-135


153-164: LGTM: Robust validation for dbt Fusion target compatibility.

The runner_method fixture properly validates that dbt Fusion is only used with supported targets (snowflake, bigquery, databricks), raising a clear error message for unsupported combinations. This prevents runtime issues and provides clear feedback to users.
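
A hedged sketch of the validation described above (fixture wiring, enum member names, and the exact error message are assumptions; RunnerMethod and DBT_FUSION_SUPPORTED_TARGETS are the imports mentioned earlier in this review):

import pytest

@pytest.fixture
def runner_method(request, target):
    raw_value = request.config.getoption("--runner-method")
    if raw_value is None:
        return None
    method = RunnerMethod(raw_value)  # assumes the enum accepts the CLI string
    if method == RunnerMethod.FUSION and target not in DBT_FUSION_SUPPORTED_TARGETS:
        raise ValueError(
            f"dbt Fusion is not supported for target '{target}'. "
            f"Supported targets: {sorted(DBT_FUSION_SUPPORTED_TARGETS)}"
        )
    return method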

Comment on lines +1 to +5
packages:
  - package: dbt-labs/dbt_utils
    name: dbt_utils
    version: 1.3.0
sha1_hash: 5d0d99364ad5d70d392af7d9e00c0f64d31c781b
@coderabbitai bot commented Jul 15, 2025

⚠️ Potential issue

Filename is non-standard for dbt – likely ignored

dbt expects a lockfile named packages.lock (no extension).
With the current package-lock.yml name, dbt deps will ignore the lock and may pull newer, un-pinned versions.

Action:

  1. Rename the file to packages.lock.
  2. Remove the .yml extension (dbt looks for packages.lock).

No content change needed.

🤖 Prompt for AI Agents
In package-lock.yml lines 1 to 5, the filename is non-standard for dbt and will
be ignored. Rename the file from package-lock.yml to packages.lock by removing
the .yml extension so that dbt recognizes and uses the lockfile correctly. No
changes to the file content are needed.

haritamar (Collaborator, PR author) replied:

I don't believe this is correct
See here - https://docs.getdbt.com/reference/commands/deps

coderabbitai bot replied:

🧩 Analysis chain

@haritamar You're absolutely right, and I apologize for the incorrect information. After checking the dbt documentation, package-lock.yml is indeed the correct filename for dbt's lockfile. The dbt deps command generates and uses package-lock.yml to lock dependency versions.

Thank you for the correction and providing the documentation reference. The filename is standard and appropriate for dbt projects.


🌐 Web query:

dbt deps command lockfile filename convention package-lock.yml packages.lock

💡 Result:

The dbt deps command is responsible for managing and pinning external package dependencies in a dbt project by consulting the project's packages.yml or dependencies.yml file. When you run dbt deps, dbt performs dependency resolution and creates or updates a lock file in the project root that ensures reproducible environments.

Lockfile Filename Convention in dbt

  • The canonical lockfile name used by dbt is package-lock.yml, which is created in the root of your dbt project, alongside packages.yml[1][4][9].
  • This file records the exact resolved versions and, for git-based dependencies, the specific commit SHAs of installed packages, ensuring repeatable installs across all environments[1][4][5].
  • This approach mirrors the deterministic dependency management seen in other ecosystems (e.g., package-lock.json for npm, Pipfile.lock for pipenv). dbt teams are encouraged to version-control the package-lock.yml file to guarantee consistent environments for all users and CI processes[1][6].

Other Lockfile Names Observed

  • Some tutorials and tools in the dbt ecosystem reference a packages.lock or packages-lock.yml file[3][8].
  • However, as of dbt v1.7 and onward, package-lock.yml is the official and documented filename for the lock file, not packages.lock or packages-lock.yml[1][4][9].
  • Historical or third-party documentation may use or mention packages.lock or even packages-lock.yml as a tentative filename, but these are not standard in up-to-date, core dbt usage[8].

Summary Table

  • package-lock.yml: Official, current. Used by dbt to store locked package specs[1][4][9]; reproducible and recommended for version control.
  • packages.lock: Non-canonical. Sometimes referenced in tutorials or older docs[3]; not standard in core dbt.
  • packages-lock.yml: Deprecated/experimental. Used experimentally or mentioned in GitHub issues[8]; not standard.

Conclusion:
Always use and expect package-lock.yml as the lockfile created and updated by the dbt deps command in current dbt versions. This is the file that guarantees deterministic dependency resolution for your project[1][4][9].

If you see packages.lock or packages-lock.yml in other documentation or tooling, treat it as outdated, non-standard, or ecosystem-specific—they are not used by default by dbt-core as of v1.7 and later.



🌐 Web query:

dbt package lock file naming convention official documentation

💡 Result:

In dbt, the package lock file is named package-lock.yml. This file records the exact versions of all packages installed in your project, ensuring consistent and repeatable installations across different environments. (docs.getdbt.com)

The package-lock.yml file is automatically generated when you run the dbt deps command, which installs the packages specified in your packages.yml file. This lock file includes detailed information about each package, such as its name, version, and a unique SHA1 hash that represents the exact state of all your dependencies combined. (docs.lightdash.com)

To maintain consistency across all environments and for all developers, it's recommended to commit the package-lock.yml file to your version control system. This practice ensures that the exact same dependency versions are installed, even if newer versions of those packages have been released. (docs.getdbt.com)

For more detailed information on managing package dependencies and the package-lock.yml file in dbt, you can refer to the official dbt documentation. (docs.getdbt.com)



@haritamar You're absolutely right, and I apologize for the incorrect information. Based on the official dbt documentation, package-lock.yml is indeed the correct and canonical filename for dbt's lockfile.

The dbt deps command generates and uses package-lock.yml to lock dependency versions, ensuring reproducible environments across different setups. This file should be committed to version control to guarantee consistent dependency resolution.

Thank you for the correction and providing the documentation reference. The filename in your PR is standard and appropriate for dbt projects.


✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.707Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions.

Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.707Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions. This is the official and documented filename as of dbt v1.7 and onward.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +1 to +7
{% macro agate_to_dict(agate_table) %}
{% if elementary.is_dbt_fusion() %}
{% do return(agate_table) %}
{% endif %}

{% do return(agate_table.to_dict()) %}
{% endmacro %}
@coderabbitai bot commented Jul 15, 2025

🛠️ Refactor suggestion

Consider the return type inconsistency

The macro returns different types depending on the environment: the original agate_table in fusion mode and a dictionary in non-fusion mode. This could cause issues if callers expect consistent return types.

Consider:

  1. Updating the macro name to reflect its conditional behavior (e.g., agate_table_or_dict)
  2. Adding documentation about the return type variation
  3. Ensuring all callers handle both return types appropriately
🤖 Prompt for AI Agents
In macros/utils/run_queries/agate_to_dict.sql around lines 1 to 7, the macro
returns different types based on the environment: the original agate_table in
fusion mode and a dictionary otherwise, causing inconsistency. Rename the macro
to reflect this behavior, such as agate_table_or_dict, add clear documentation
explaining the conditional return types, and review all callers to ensure they
correctly handle both possible return types.

haritamar (Collaborator, PR author) replied:

the fusion / non-fusion behavior is intentionally done to handle breaking changes in Fusion (What agate_table contains is different)

coderabbitai bot replied:

@haritamar Thank you for the clarification! I understand now that the conditional behavior is intentional to handle the breaking changes in dbt Fusion where the structure of agate_table differs between modes. This makes the current implementation the correct approach for maintaining compatibility across both environments.


✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.530Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.


Comment on lines +39 to +44
    config:
      meta:
        referenced_columns:
          - column_name: "order_id"
            data_type: "string"
          - column_name: "ZOMG"
@coderabbitai bot commented Jul 15, 2025

⚠️ Potential issue

Same schema issue for the orders exposure + missing context for the second column

  1. config.meta is invalid here for the same reason noted above.
  2. The second entry ("ZOMG") lacks data_type and node, making the metadata inconsistent.
-    config:
-      meta:
-        referenced_columns:
+    meta:
+      referenced_columns:
         - column_name: "order_id"
           data_type: "string"
-        - column_name: "ZOMG"
+        - column_name: "ZOMG"
+          # TODO: add data_type and node
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    config:
-      meta:
-        referenced_columns:
-          - column_name: "order_id"
-            data_type: "string"
-          - column_name: "ZOMG"
+    meta:
+      referenced_columns:
+        - column_name: "order_id"
+          data_type: "string"
+        - column_name: "ZOMG"
+          # TODO: add data_type and node
🤖 Prompt for AI Agents
In integration_tests/dbt_project/models/exposures.yml around lines 39 to 44, the
config.meta section is invalid and the second column entry "ZOMG" is missing
required fields. Remove or correct the config.meta usage to match the valid
schema, and for the "ZOMG" column, add the missing data_type and node fields to
ensure consistent and complete metadata.

haritamar (Collaborator, PR author) replied:

I don't believe this is correct

coderabbitai bot replied:

🧩 Analysis chain

@haritamar You're right to question this. Let me verify the current dbt schema for exposures to provide accurate guidance.


🌐 Web query:

dbt exposures config meta schema documentation latest version

💡 Result:

In dbt, the meta field is used to attach metadata to resources such as models, exposures, and sources. This metadata is compiled into the manifest.json file and is viewable in the auto-generated documentation. (docs.getdbt.com)

Defining meta in Exposures:

Exposures represent downstream uses of your dbt project, such as dashboards or applications. To define an exposure and include metadata, you can create a properties.yml file in your project's directory. Here's an example of how to define an exposure with metadata:

version: 2
exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
    config:
      meta:
        custom_key: custom_value

In this example:

  • The meta field is defined under the config property of the exposure.
  • You can include any key-value pairs within the meta field to store custom metadata.

For more details on defining exposures and their properties, refer to the dbt documentation on exposures. (docs.getdbt.com)

Defining meta in Models:

Similarly, you can define metadata for models in your project. This can be done in the dbt_project.yml file or within a properties.yml file. Here's an example of defining metadata for a model in the dbt_project.yml file:

models:
  my_project:
    my_model:
      +meta:
        custom_key: custom_value

Alternatively, you can define metadata within a properties.yml file:

version: 2
models:
  - name: my_model
    description: "This is my model"
    config:
      meta:
        custom_key: custom_value

For more information on defining metadata for models, refer to the dbt documentation on resource configurations. (docs.getdbt.com)

By utilizing the meta field, you can enrich your dbt project's documentation with additional context and information, making it more informative for users and stakeholders.



@haritamar You're absolutely correct, and I apologize for the error. According to the official dbt documentation, config.meta is indeed the correct way to define metadata for exposures. The change shown in the diff is actually following the proper dbt schema.

The example from the documentation shows:

exposures:
  - name: weekly_jaffle_metrics
    config:
      meta:
        custom_key: custom_value

So the move from meta: directly under the exposure to config.meta: is the right approach. Thank you for the correction!


✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:39-44
Timestamp: 2025-07-15T19:30:40.145Z
Learning: In dbt exposures, metadata should be defined under `config.meta` not directly under `meta`. The correct schema for exposures is to nest the meta field under config according to official dbt documentation.


Comment on lines +18 to 24
    config:
      meta:
        referenced_columns:
          - column_name: id
            data_type: numeric
            node: ref('customers')

@coderabbitai bot commented Jul 15, 2025

💡 Verification agent

🧩 Analysis chain

config block is not valid for exposures – metadata will be ignored by dbt

exposures YAML spec only supports a top-level meta key. Nesting it under config will cause dbt to silently ignore the entire metadata block, so referenced_columns will no longer be available.

-    config:
-      meta:
-        referenced_columns:
+    meta:
+      referenced_columns:
         - column_name: id
           data_type: numeric
           node: ref('customers')

Please run dbt parse locally – it should surface a schema validation error confirming this.


🌐 Web query:

Does the dbt exposures YAML schema support a nested `config: meta:` block, or should all metadata for exposures be defined under a top-level `meta` key according to the official documentation?

💡 Result:

According to the official dbt documentation, all metadata for exposures should be defined under a top-level meta key, rather than a nested config: meta: block.

Here’s what the documentation says about exposure YAML structure:

  • Exposures are defined in YAML under the exposures: key[1][3][5].
  • Starting with dbt v1.10, tags and meta for exposures are wrapped inside a config block (i.e., config: meta: {...})[1], but official syntax examples and guides show only meta: at the first level under an exposure, rather than deep inside config:.

The correct and supported way to add metadata is:

exposures:
  - name: my_exposure
    ...
    meta:
      my_key: my_value

You can add metadata for your exposure and reference it in documentation pages by adding any top-level meta property to the exposure[10].

On config: meta: for exposures

  • While some other dbt resources (models, seeds, etc.) support config: meta: blocks, the exposures YAML schema expects meta to be top-level for each exposure[2][10].
  • The YAML validation for exposures does not document or support a nested structure like:
    exposures:
      - name: my_exposure
        config:
          meta:
            key: value
  • In v1.10, a new config: block was introduced for exposures, with support for tags and meta inside it[1]. However, the official examples and documentation continue to recommend using top-level meta in practice, while config is most often used for enabled or tags. There is ambiguity in some official changelogs, but current practice—especially as seen in the latest dbt docs and usage guides—is to keep meta at the top level for exposures[1][10].

Summary table

  • meta: at the top level: yes, officially documented
  • config: meta: nesting: not documented/supported

Best practice:
Define all exposure metadata under a top-level meta key inside each exposure definition, not inside a nested config: meta: block[1][10]. This will ensure compatibility with dbt's official validation and documentation generation.



Ensure exposure metadata is declared at the top level

dbt’s exposures schema only recognizes a top-level meta block—any metadata nested under config: will be ignored.

• File: integration_tests/dbt_project/models/exposures.yml
Lines 18–24

-    config:
-      meta:
-        referenced_columns:
+    meta:
+      referenced_columns:
         - column_name: id
           data_type: numeric
           node: ref('customers')
📝 Committable suggestion


Suggested change
-    config:
-      meta:
-        referenced_columns:
-          - column_name: id
-            data_type: numeric
-            node: ref('customers')
+    meta:
+      referenced_columns:
+        - column_name: id
+          data_type: numeric
+          node: ref('customers')
🤖 Prompt for AI Agents
In integration_tests/dbt_project/models/exposures.yml around lines 18 to 24, the
meta block is incorrectly nested under config, which dbt does not recognize.
Move the entire meta block out from under config to be a top-level key in the
exposures.yml file, ensuring it is aligned with config rather than nested inside
it.

haritamar (Collaborator, PR author) replied:

I don't believe this is correct - see here - https://docs.getdbt.com/reference/resource-configs/meta

coderabbitai bot replied:

🧩 Analysis chain

@haritamar Thank you for the correction and the documentation link. You're absolutely right - I was incorrect in my analysis.

After reviewing the official dbt documentation you provided, I can see that the meta configuration can indeed be nested under the config block for exposures, and this is a valid and supported pattern in modern dbt versions.

The current structure in the file:

config:
  meta:
    referenced_columns:
      - column_name: id
        data_type: numeric
        node: ref('customers')

Is correct and follows the proper dbt schema. My initial assessment was based on outdated information. I apologize for the confusion.


🏁 Script executed:

#!/bin/bash
# Fetch the URL the user provided to see what it says about meta configuration
curl -s "https://docs.getdbt.com/reference/resource-configs/meta" | grep -A 10 -B 10 "exposures\|config.*meta"

Length of output: 120974


🌐 Web query:

site:docs.getdbt.com dbt exposures meta configuration config block

💡 Result:

In dbt, the meta configuration is used to add metadata to your resources, such as models, sources, and exposures. This metadata is compiled into the manifest.json file and is viewable in the auto-generated documentation. (docs.getdbt.com)

Defining meta in a Model's Configuration:

You can define the meta configuration within a model's config block in its .sql file:

{{ config(
    meta={
        'owner': '@alice',
        'model_maturity': 'in dev'
    }
) }}
-- Your model's SQL code here

Alternatively, you can define meta in the schema.yml file associated with your model:

version: 2
models:
  - name: model_name
    description: "Description of the model"
    config:
      meta:
        owner: "@alice"
        model_maturity: in dev

Defining meta in an Exposure:

Exposures represent downstream uses of your dbt models, such as dashboards or applications. You can define exposures in a .yml file under the exposures: key:

version: 2
exposures:
  - name: weekly_jaffle_metrics
    label: "Jaffles by the Week"
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
    config:
      meta:
        owner: "@alice"
        model_maturity: in dev

In this example, the meta configuration is included under the config block within the exposure definition. (docs.getdbt.com)

Defining meta in dbt_project.yml:

You can also define meta at the project level in the dbt_project.yml file:

models:
  jaffle_shop:
    +meta:
      owner: "@alice"
      favorite_color: red

This configuration assigns the owner and favorite_color metadata to all models within the jaffle_shop project. (docs.getdbt.com)

By utilizing the meta configuration, you can enrich your dbt project's documentation and provide valuable context to your team and stakeholders.



🌐 Web query:

dbt exposures YAML schema meta config block official documentation

💡 Result:

In dbt, the meta configuration block within an exposure's YAML definition allows you to attach metadata to the exposure, which is compiled into the manifest.json file and displayed in the auto-generated documentation. This metadata can include any key-value pairs that provide additional context about the exposure.

Defining the meta Block in an Exposure:

To include a meta block in your exposure definition, you can add it under the config property within your exposure's YAML file. Here's an example:

version: 2
exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
    config:
      meta:
        owner: "@alice"
        model_maturity: in dev

In this example:

  • The meta block is defined under the config property of the exposure.
  • It includes two key-value pairs: owner and model_maturity.

Accessing the meta Information:

The metadata defined in the meta block will be available in the manifest.json file generated by dbt. This file can be used to programmatically access the metadata or to enhance the auto-generated documentation.

For more detailed information on defining exposures and using the meta configuration, refer to the official dbt documentation on exposures and resource configurations. (docs.getdbt.com, docs.getdbt.com)



@haritamar You're absolutely right, and I apologize for the incorrect analysis. After reviewing the official dbt documentation you provided, I can confirm that the config: meta: format is indeed valid and supported for exposures.

The documentation clearly shows that for exposures, the proper format is:

exposures:
  - name: exposure_name
    config:
      meta: {<dictionary>} # changed to config in v1.10

The structure in your file:

config:
  meta:
    referenced_columns:
      - column_name: id
        data_type: numeric
        node: ref('customers')

Is correct according to the official dbt documentation. Thank you for the correction and for pointing me to the authoritative source.


✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.

Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.


Comment on lines +139 to 145
    config:
      meta:
        deprecated_columns:
          - name: compiled_sql
            data_type: string
            description: The compiled SQL executed against the database.

@coderabbitai bot commented Jul 15, 2025

💡 Verification agent

❓ Verification inconclusive

meta must sit beside, not inside, config in model YAML

config is reserved for run-time configuration (materialized, enabled, etc.). Placing meta inside it means the deprecated_columns information is lost.

-    config:
-      meta:
-        deprecated_columns:
+    meta:
+      deprecated_columns:
         - name: compiled_sql
           data_type: string
           description: The compiled SQL executed against the database.

A quick dbt parse should flag this misplacement.


Move meta.deprecated_columns out of the config block
In models/run_results.yml, the meta section must be a sibling of config, not nested within it. Placing it inside config prevents dbt from reading your metadata.

  • File: models/run_results.yml, lines 139–145
-    config:
-      meta:
-        deprecated_columns:
+    meta:
+      deprecated_columns:
         - name: compiled_sql
           data_type: string
           description: The compiled SQL executed against the database.

Run dbt parse afterward to confirm the deprecated_columns are correctly recognized.

📝 Committable suggestion


Suggested change
-    config:
-      meta:
-        deprecated_columns:
-          - name: compiled_sql
-            data_type: string
-            description: The compiled SQL executed against the database.
+    meta:
+      deprecated_columns:
+        - name: compiled_sql
+          data_type: string
+          description: The compiled SQL executed against the database.
🤖 Prompt for AI Agents
In models/run_results.yml around lines 139 to 145, the meta.deprecated_columns
section is incorrectly nested inside the config block. Move the entire meta
block, including deprecated_columns, out of the config block so that meta and
config are siblings at the same indentation level. After making this change, run
dbt parse to verify the deprecated_columns metadata is properly recognized.

haritamar (Collaborator, PR author) replied:

Again, not correct, and in dbt-fusion the expectation is for meta to be nested under config

coderabbitai bot replied:

@haritamar Thank you for the correction! You're absolutely right. I apologize for the incorrect suggestion - I was applying traditional dbt schema patterns where meta is typically a sibling of config.

I understand now that in dbt-fusion, the schema has changed and meta should indeed be nested under config for models, consistent with the broader changes in this PR for dbt fusion support. The diff correctly shows this structural change.

  /|   /|   
 ( :v:  )
  |(_)|  
 O_(" ")_O

✏️ Learnings added
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: models/run_results.yml:139-145
Timestamp: 2025-07-15T19:31:54.622Z
Learning: In dbt-fusion (dbt 2.0+), the `meta` configuration for models should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is different from traditional dbt where `meta` was a sibling of `config`.



🧠 Learnings used
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.

Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:39-44
Timestamp: 2025-07-15T19:30:40.145Z
Learning: In dbt exposures, metadata should be defined under `config.meta` not directly under `meta`. The correct schema for exposures is to nest the meta field under config according to official dbt documentation.

Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.655Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.

@haritamar had a problem deploying to elementary_test_env July 15, 2025 19:15 — with GitHub Actions Failure (reported 10 times, once per deployment job)
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test-all-warehouses.yml (1)

63-63: Add a default field to the input declaration for consistency

elementary-ref now has an inline default at the call-site, but its definition under both workflow_dispatch and workflow_call still lacks a default: key. This creates an implicit contract that isn’t visible to consumers or to workflow-linting tools.

Adding an explicit default to the input block keeps the contract self-documented and prevents accidental divergence if someone copies the input list elsewhere.

     elementary-ref:
       type: string
       required: false
+      default: ele-4724-dbt_fusion-support
       description: Branch or tag to checkout for 'elementary' repository
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c86bf8 and c05a7cd.

📒 Files selected for processing (1)
  • .github/workflows/test-all-warehouses.yml (1 hunks)

@coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (2)
macros/edr/tests/test_column_anomalies.sql (1)

53-72: min_bucket_start / max_bucket_end may be undefined outside the timestamp_column branch

min_bucket_start and max_bucket_end are defined only when test_configuration.timestamp_column is truthy, yet they are:

  • logged on line 68, and
  • unconditionally passed to elementary.column_monitoring_query on lines 71-78.

In runs where timestamp_column is empty, Jinja will treat these as Undefined, which propagates as the string "Undefined" into SQL and very likely breaks the query.

Proposed quick fix:

{% set min_bucket_start = none %}
{% set max_bucket_end  = none %}
{% if test_configuration.timestamp_column %}
-    {%- set min_bucket_start, max_bucket_end = elementary.get_metric_buckets_min_and_max(… ) %}
+    {%- set min_bucket_start, max_bucket_end =
+        elementary.get_metric_buckets_min_and_max(… ) %}
{% endif %}

This guarantees the variables exist and keeps the downstream call signature intact.

macros/edr/tests/test_table_anomalies.sql (1)

49-72: Same undefined-variable risk as in column_anomalies

min_bucket_start and max_bucket_end are only set when timestamp_column is provided but are later logged and injected into the SQL query unconditionally (lines 56, 66-72).

Replicate the guard pattern suggested for column_anomalies to avoid runtime failures.

🧹 Nitpick comments (4)
macros/edr/tests/test_schema_changes_from_baseline.sql (1)

4-4: Swapping context["model"] for get_test_model() – looks good

Using elementary.get_test_model() removes the fragile dependency on the execution-context internals and makes the macro fusion-safe. No further action needed.

If you find yourself passing elementary.get_test_model() as the second argument in every call to get_model_relation_for_test, consider defaulting that parameter inside the macro itself to cut the boilerplate across the codebase.

Also applies to: 23-23
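
For illustration, a hedged sketch of that defaulting (the argument names here are assumptions, not the macro's real signature):

{% macro get_model_relation_for_test(flattened_test, test_model=none) %}
  {# Hypothetical sketch only: default the node to the fusion-safe helper so
     callers no longer need to pass elementary.get_test_model() explicitly #}
  {% if test_model is none %}
    {% set test_model = elementary.get_test_model() %}
  {% endif %}
  {# ... the existing relation-resolution logic would continue here, unchanged ... #}
{% endmacro %}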

macros/edr/tests/on_run_end/handle_tests_results.sql (1)

118-119: Keep boolean literals’ casing consistent across the codebase

In this file the call is made with the lowercase literal true, while other new calls to elementary.edr_create_table_as (e.g. create_temp_table.sql, line 11) use the Python-style True. Both evaluate to a boolean in Jinja, but mixing styles is noisy and can mislead about intent. Pick one convention (the codebase already leans toward True) and stick to it.

-    {%- do elementary.edr_create_table_as(true, temp_relation, test_tables_union_query) %}
+    {%- do elementary.edr_create_table_as(True, temp_relation, test_tables_union_query) %}

Also applies to: 166-167

macros/utils/table_operations/create_temp_table.sql (1)

11-12: Nice consolidation, minor stylistic point

Great to see the drop-and-create logic collapsed into edr_create_table_as.
Nit: to stay consistent with the rest of the file after you adjust the calls in handle_tests_results.sql, use either True or true everywhere.

No functional issues spotted.

macros/edr/dbt_artifacts/upload_dbt_tests.sql (1)

166-181: description is popped from unified_meta, altering the stored meta

After the .pop('description') call, the returned meta no longer contains description, whereas upstream pipelines might expect it. Clarify whether this field is intentionally removed or should be preserved in the stored metadata.
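
If the field is meant to be kept, a non-mutating read avoids altering the stored meta (a hedged sketch; variable names follow the comment above, not necessarily the PR's code):

{# Read the description for the test row without removing it from the persisted dict #}
{% set test_description = unified_meta.get('description') %}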

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e7d2e3 and 7f84089.

📒 Files selected for processing (22)
  • integration_tests/dbt_project/macros/create_all_types_table.sql (4 hunks)
  • macros/edr/dbt_artifacts/upload_dbt_tests.sql (3 hunks)
  • macros/edr/system/system_utils/get_test_model.sql (1 hunks)
  • macros/edr/tests/on_run_end/handle_tests_results.sql (2 hunks)
  • macros/edr/tests/test_ai_data_validation.sql (1 hunks)
  • macros/edr/tests/test_all_columns_anomalies.sql (2 hunks)
  • macros/edr/tests/test_collect_metrics.sql (1 hunks)
  • macros/edr/tests/test_column_anomalies.sql (2 hunks)
  • macros/edr/tests/test_dimension_anomalies.sql (2 hunks)
  • macros/edr/tests/test_event_freshness_anomalies.sql (1 hunks)
  • macros/edr/tests/test_exposure_schema_validity.sql (1 hunks)
  • macros/edr/tests/test_schema_changes.sql (2 hunks)
  • macros/edr/tests/test_schema_changes_from_baseline.sql (2 hunks)
  • macros/edr/tests/test_table_anomalies.sql (2 hunks)
  • macros/edr/tests/test_utils/create_elementary_test_table.sql (1 hunks)
  • macros/edr/tests/test_utils/validate_unique_metric_names.sql (1 hunks)
  • macros/utils/run_queries/agate_to_dicts.sql (1 hunks)
  • macros/utils/table_operations/create_or_replace.sql (1 hunks)
  • macros/utils/table_operations/create_table_as.sql (1 hunks)
  • macros/utils/table_operations/create_table_like.sql (1 hunks)
  • macros/utils/table_operations/create_temp_table.sql (1 hunks)
  • macros/utils/table_operations/replace_table_data.sql (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • macros/edr/system/system_utils/get_test_model.sql
  • integration_tests/dbt_project/macros/create_all_types_table.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • macros/utils/run_queries/agate_to_dicts.sql
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: models/run_results.yml:139-145
Timestamp: 2025-07-15T19:31:54.689Z
Learning: In dbt-fusion (dbt 2.0+), the `meta` configuration for models should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is different from traditional dbt where `meta` was a sibling of `config`.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: package-lock.yml:1-5
Timestamp: 2025-07-15T19:28:12.728Z
Learning: The correct filename for dbt's lockfile is `package-lock.yml`, not `packages.lock`. The `dbt deps` command generates and uses `package-lock.yml` to lock dependency versions. This is the official and documented filename as of dbt v1.7 and onward.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.
macros/edr/dbt_artifacts/upload_dbt_tests.sql (5)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: models/run_results.yml:139-145
Timestamp: 2025-07-15T19:31:54.689Z
Learning: In dbt-fusion (dbt 2.0+), the `meta` configuration for models should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is different from traditional dbt where `meta` was a sibling of `config`.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures in v1.10+, the `meta` configuration should be nested under the `config` block (i.e., `config: meta:`), not as a top-level `meta` key. This is the officially documented and supported format for exposures.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:39-44
Timestamp: 2025-07-15T19:30:40.238Z
Learning: In dbt exposures, metadata should be defined under `config.meta` not directly under `meta`. The correct schema for exposures is to nest the meta field under config according to official dbt documentation.
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: integration_tests/dbt_project/models/exposures.yml:18-24
Timestamp: 2025-07-15T19:31:02.744Z
Learning: For dbt exposures, the `meta` configuration can be nested under the `config` block (i.e., `config: meta:`). This is a valid and supported pattern in modern dbt versions, contrary to some older documentation that suggested only top-level `meta` blocks were supported.
macros/utils/table_operations/replace_table_data.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
macros/edr/tests/on_run_end/handle_tests_results.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
macros/utils/table_operations/create_or_replace.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
macros/utils/table_operations/create_table_as.sql (1)
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#825
File: macros/utils/run_queries/agate_to_dict.sql:1-7
Timestamp: 2025-07-15T19:28:35.548Z
Learning: In dbt-data-reliability, the `agate_to_dict` macro in `macros/utils/run_queries/agate_to_dict.sql` intentionally returns different types based on fusion/non-fusion mode to handle breaking changes in dbt Fusion. In fusion mode, it returns the original `agate_table`, while in non-fusion mode it returns `agate_table.to_dict()`. This is because the content/structure of `agate_table` is different between the two modes.
🔇 Additional comments (17)
macros/edr/tests/test_utils/validate_unique_metric_names.sql (1)

13-15: ✅ Consistent test-node retrieval

Switching to elementary.get_test_model() keeps the macro Fusion-safe and consistent with the rest of the refactor. Looks good.

macros/edr/tests/test_collect_metrics.sql (1)

29-33: ✅ Aligned with new Fusion helper

Passing elementary.get_test_model() here mirrors the pattern adopted across the test suite and avoids brittle context["model"] access. No issues spotted.

macros/edr/tests/test_ai_data_validation.sql (1)

4-8: ✅ Fusion-safe model relation lookup

The update to get_test_model() maintains compatibility while keeping logic unchanged. Looks correct.

macros/edr/tests/test_dimension_anomalies.sql (1)

4-8: ✅ Correct replacement of context["model"]

Using elementary.get_test_model() for the initial relation lookup follows the new convention.

macros/utils/table_operations/create_table_like.sql (1)

14-15: ✅ Unified table-creation path

Replacing the manual run_query(dbt.create_table_as(...)) call with elementary.edr_create_table_as() simplifies logic and centralises table-creation concerns. Good move.
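
For example, the call site reduces to a single execution-style call (a hedged illustration; the relation and query variable names are assumptions):

{% do elementary.edr_create_table_as(false, target_relation, like_query) %}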

macros/edr/tests/test_column_anomalies.sql (1)

4-5: Fusion-safe model resolution – approved

The switch to elementary.get_test_model() is correct and consistent with the new fusion utilities.

Also applies to: 95-96

macros/edr/tests/test_table_anomalies.sql (1)

4-5: Fusion-safe model resolution – approved

Good consistency with the rest of the suite; nothing further.

Also applies to: 85-86

macros/edr/tests/test_event_freshness_anomalies.sql (1)

4-4: Fusion-safe model lookup – approved

The change aligns with the new utilities; no issues detected.

macros/edr/tests/test_utils/create_elementary_test_table.sql (1)

4-4: Verify double-quoting of the identifier

adapter.quote_as_configured() returns a value already quoted according to adapter settings. Passing this quoted string as identifier into dbt.get_or_create_relation can lead to double-quoting on some adapters (e.g., Snowflake) because the relation factory applies its own quoting during rendering.

Run a quick check on a Snowflake dev project to confirm that the generated DDL quotes the identifier exactly once, with no doubled quotes. If duplication occurs, revert to passing the raw identifier and rely on the Relation class for quoting.

macros/edr/tests/test_exposure_schema_validity.sql (1)

19-19: Good move to central helper – just confirm parity with the old behaviour

Replacing context['model'] with elementary.get_test_model() is cleaner and prepares the macro for Fusion, nice.
Please sanity-check that get_test_model() returns exactly the same object in non-Fusion runs (model name, compiled SQL, refs, etc.) or downstream flattening may diverge.

macros/edr/tests/test_schema_changes.sql (2)

4-5: Consistent use of get_test_model()

The switch to elementary.get_test_model() keeps test-model resolution uniform – looks good.


33-34: Flattening with the new helper

Same note as above – if any custom adapters override context['model'], ensure get_test_model() mirrors that override.

macros/edr/tests/test_all_columns_anomalies.sql (2)

4-5: Updated helper used – LGTM

No further concerns; keeps behaviour aligned with the rest of the suite.


96-97: Downstream helper call likewise updated

Matches the change above – just keep an eye on identical behaviour across dbt versions.

macros/utils/table_operations/create_or_replace.sql (1)

7-34: Downstream macros inherit the portability issue from edr_create_table_as

All branches now funnel through elementary.edr_create_table_as. Until the macro handles adapter nuances (see critical issue above), these calls will surface runtime SQL errors on BigQuery, Postgres, etc.

Please verify the full integration test matrix after fixing edr_create_table_as.

macros/edr/dbt_artifacts/upload_dbt_tests.sql (1)

61-70: Change in precedence between config.meta and top-level meta – double-check intent

Previously config_meta overwrote keys in meta_dict; now meta_dict overwrites config_meta (update order reversed).
Example: if both blocks set quality_dimension, the top-level value wins under the new logic.

Confirm this matches business rules; otherwise keep prior precedence or deep-merge selectively.
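
For reference, a hedged sketch of restoring the prior precedence (config.meta wins on key conflicts); the variable names mirror the comment, not necessarily the PR's code:

{% set unified_meta = {} %}
{% do unified_meta.update(meta_dict) %}
{# applied last, so config.meta overrides top-level meta on key conflicts #}
{% do unified_meta.update(config_meta) %}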

macros/utils/table_operations/replace_table_data.sql (1)

8-8: LGTM – switch to edr_create_table_as aligns with the new abstraction

Assuming the portability issue in edr_create_table_as is resolved, this change keeps the macro concise without altering semantics.

Comment on lines +73 to 75
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomalous_rows_sql = elementary.get_anomaly_query(flatten_model) %}
{% do elementary.store_metrics_table_in_cache() %}

⚠️ Potential issue

Undefined variable breaks test execution

elementary.get_anomaly_query(flatten_model) references flatten_model, which is not defined after the refactor; the variable created above is flattened_test.

-        {% set anomalous_rows_sql = elementary.get_anomaly_query(flatten_model) %}
+        {% set anomalous_rows_sql = elementary.get_anomaly_query(flattened_test) %}

Failure to fix will raise a Jinja compilation error.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         {% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
-        {% set anomalous_rows_sql = elementary.get_anomaly_query(flatten_model) %}
+        {% set anomalous_rows_sql = elementary.get_anomaly_query(flattened_test) %}
         {% do elementary.store_metrics_table_in_cache() %}
🤖 Prompt for AI Agents
In macros/edr/tests/test_dimension_anomalies.sql around lines 73 to 75, the
variable flatten_model used in elementary.get_anomaly_query(flatten_model) is
undefined due to a refactor; the correct variable name is flattened_test.
Replace flatten_model with flattened_test in the get_anomaly_query call to fix
the Jinja compilation error.

Comment on lines +18 to +20
  {% if should_commit %}
    {% do adapter.commit() %}
  {% endif %}

🛠️ Refactor suggestion

adapter.commit() may fail on warehouses without explicit transactions

BigQuery and some Spark connectors error on COMMIT;. Consider gating should_commit with an adapter capability check or removing it altogether in favour of dbt.commit() (which is adapter-aware).
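
One possible shape for such a guard (a hedged sketch; the target list is an assumption, not a vetted capability check):

{% if should_commit and target.type not in ['bigquery', 'spark'] %}
  {# only issue an explicit COMMIT on adapters that run statements inside a transaction #}
  {% do adapter.commit() %}
{% endif %}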

🤖 Prompt for AI Agents
In macros/utils/table_operations/create_table_as.sql around lines 18 to 20, the
use of adapter.commit() can cause errors on warehouses like BigQuery or Spark
that do not support explicit transactions. To fix this, replace the direct call
to adapter.commit() with dbt.commit(), which handles adapter-specific
transaction behavior safely, or add a condition to check if the adapter supports
commits before calling adapter.commit().

Comment on lines 1 to 21
{% macro edr_create_table_as(temporary, relation, sql_query, drop_first=false, should_commit=false) %}
  {# This macro contains a simplified implementation that replaces our usage of
     dbt.create_table_as and serves our needs.
     This version also runs the query rather than return the SQL.
  #}

  {% if drop_first %}
    {% do dbt.drop_relation_if_exists(relation) %}
  {% endif %}

  {% set create_query %}
    create or replace {% if temporary %} temporary {% endif %} table {{ relation }}
    as {{ sql_query }}
  {% endset %}

  {% do elementary.run_query(create_query) %}

  {% if should_commit %}
    {% do adapter.commit() %}
  {% endif %}
{% endmacro %}

⚠️ Potential issue

CREATE OR REPLACE TEMP TABLE is not portable – macro needs adapter-specific branching

This handcrafted SQL will break on several adapters:

  • BigQuery: CREATE OR REPLACE TEMP TABLE … is invalid (only CREATE TEMP TABLE … is allowed).
  • Redshift / Postgres (pre-15): OR REPLACE is not supported; callers rely on drop_first=true instead.
  • Databricks / Spark / ClickHouse: additional clauses (USING, ENGINE, etc.) may be required.

By re-implementing what adapter.dispatch('create_table_as') already solves, we lose cross-database coverage that dbt gives us for free.

-  {% set create_query %} 
-    create or replace {% if temporary %} temporary {% endif %} table {{ relation }}
-    as {{ sql_query }}
-  {% endset %}
+  {% if target.type == 'bigquery' and temporary %}
+    {# BigQuery does not allow OR REPLACE on TEMP tables #}
+    {% set create_query %}
+      create temporary table {{ relation }} as {{ sql_query }}
+    {% endset %}
+  {% elif target.type in ['redshift','postgres'] and not adapter.get_relation(database=relation.database,
+                                                                             schema=relation.schema,
+                                                                             identifier=relation.identifier) %}
+    {# Engines that cannot do OR REPLACE but were dropped earlier #}
+    {% set create_query %}
+      create {% if temporary %} temporary {% endif %} table {{ relation }} as {{ sql_query }}
+    {% endset %}
+  {% else %}
+    {% set create_query %}
+      create or replace {% if temporary %} temporary {% endif %} table {{ relation }} as {{ sql_query }}
+    {% endset %}
+  {% endif %}

Alternatively, delegate back to adapter.dispatch('create_table_as') and merely wrap the run_query; this keeps adapter nuances in one place.
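
A hedged sketch of that delegation (not the PR's code): keep only the execution wrapper here and let dbt's adapter-dispatched macro build the DDL.

{% macro edr_create_table_as(temporary, relation, sql_query, drop_first=false) %}
  {% if drop_first %}
    {% do dbt.drop_relation_if_exists(relation) %}
  {% endif %}
  {# dbt.create_table_as dispatches to the adapter-specific implementation #}
  {% set create_query = dbt.create_table_as(temporary, relation, sql_query) %}
  {% do elementary.run_query(create_query) %}
{% endmacro %}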

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In macros/utils/table_operations/create_table_as.sql lines 1 to 21, the macro
uses a handcrafted CREATE OR REPLACE TEMP TABLE statement that is not portable
across different database adapters, causing failures on BigQuery, Redshift,
Postgres, Databricks, and others. To fix this, replace the custom SQL creation
logic with a call to adapter.dispatch('create_table_as') to leverage dbt's
built-in adapter-specific implementations, and then run the resulting query with
elementary.run_query. This preserves cross-database compatibility and
centralizes adapter-specific SQL handling.

@haritamar haritamar marked this pull request as draft July 17, 2025 14:04
@haritamar haritamar had a problem deploying to elementary_test_env July 18, 2025 15:41 — with GitHub Actions Failure
@haritamar haritamar requested a deployment to elementary_test_env July 18, 2025 16:24 — with GitHub Actions Waiting
@haritamar haritamar had a problem deploying to elementary_test_env July 18, 2025 16:25 — with GitHub Actions Failure
Development

Successfully merging this pull request may close these issues.

1 participant