Minimal example showcasing the issue when using ref with a duplicate model name in an
imported project.
A single-argument ref inside an upstream dbt project's model resolves to the current
dbt project when the referenced model name also exists in the current dbt project.
In our organization we have multiple dbt projects. The dbt projects of our central data warehouse team are often imported as packages in downstream dbt projects. These downstream dbt projects build their dbt models on top of the upstream dbt models.
Whenever the upstream dbt project uses ref, it uses the single-argument version as it
doesn't import any other dbt projects. In the downstream dbt projects, we do use the
two-argument version of ref to explicitly point to a dbt model from the upstream dbt
project. Still, the downstream dbt project runs into an issue, which is showcased by the
minimal example described below.
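The setup can be sketched roughly as follows. This is an illustration rather than the exact repository contents: the three models are separate files shown in one block, and the file paths, the column `a`, and the select lists are assumptions. Only the ref calls follow from the description in this README.

```sql
-- Hypothetical file contents; only the ref calls follow the README's description.

-- project_upstream/models/model_x.sql (path and column assumed)
select 1 as a

-- project_upstream/models/model_y.sql (path assumed)
-- Single-argument ref: intended to mean project_upstream.model_x.
select * from {{ ref('model_x') }}

-- project_downstream/models/model_x.sql (path assumed)
-- Two-argument ref: explicitly targets the model in the upstream package.
select * from {{ ref('project_upstream', 'model_y') }}
```

Both projects define a model named model_x, which is the name collision the rest of this README is about.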
First spin up a test Postgres database:
docker compose up -d
Next, create your Poetry environment:
poetry install
poetry shell
Install the dbt dependencies required by project_downstream:
cd project_downstream
dbt deps
Compile the dbt models of project_downstream:
dbt compile
This will yield the following error:
RuntimeError: Found a cycle: model.project_downstream.model_x --> model.project_upstream.model_y
dbt incorrectly thinks that project_upstream.model_y is dependent on
project_downstream.model_x instead of project_upstream.model_x, resulting in a
cyclic dependency since project_downstream.model_x references
project_upstream.model_y.
I would have expected that dbt, when using ref with a single argument in
project_upstream.model_y, references the dbt model of the same project, i.e.
project_upstream.model_x, regardless of whether we are running dbt from
project_upstream or project_downstream. Instead, it seems to default to the project
from which you are running dbt, even though project_downstream is not listed in
packages.yml as a dependency of project_upstream.
You could always use the package name argument in project_upstream, even though it is
not importing any project (i.e. always use ref("project_upstream", <model_name>)).
This would prevent the cyclic error in downstream dbt projects that contain a dbt model
with the same name. However, this workaround is not always desirable or even possible
in case the dbt project you are importing is not managed by your team/organization.
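As a rough sketch of this first workaround, the upstream model would name its own package explicitly. The file path and select list are assumptions; only the two-argument ref call is taken from the text above.

```sql
-- project_upstream/models/model_y.sql (path and select list assumed)
-- Naming the package explicitly pins the reference to project_upstream.model_x,
-- even when a downstream project defines its own model_x.
select * from {{ ref('project_upstream', 'model_x') }}
```

This requires edit access to the upstream project, which is exactly the limitation noted above.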
You could override the built-in ref macro to always default to the current model's
package name if it is not explicitly supplied to ref:
{% macro ref() %}
  -- Extract user-provided positional and keyword arguments.
  {% set version = kwargs.get("version") or kwargs.get("v") %}
  {% set packagename = none %}
  {% if (varargs | length) == 1 -%}
    {% set modelname = varargs[0] %}
  {% else -%}
    {% set packagename = varargs[0] %}
    {% set modelname = varargs[1] %}
  {% endif %}
  {% if packagename is not none %}
    {% do return(builtins.ref(packagename, modelname, version=version)) %}
  {% else %}
    -- If package name is not specified, assume the package from which we are calling `ref`.
    {% do return(builtins.ref(model.package_name, modelname, version=version)) %}
  {% endif %}
{% endmacro %}
Adding this macro in project_downstream/macros/ref.sql and running dbt compile again
does get rid of the cyclic dependency error. However, it does introduce a new issue:
Runtime Error
Compilation Error in test unique_model_x_a (models/model_x/model_x.yml)
dbt was unable to infer all dependencies for the model "unique_model_x_a".
This typically happens when ref() is placed within a conditional block.
To fix this, add the following hint to the top of the model "unique_model_x_a":
-- depends_on: {{ ref('project_upstream', 'model_x') }}
> in macro ref (macros/ref.sql)
> called by test unique_model_x_a (models/model_x/model_x.yml)
This error is also thrown when using a different data test, e.g. accepted_values. The
suggested fix of adding the -- depends_on comment is not desirable, as the comment would
need to be added to the dbt models of the upstream project, which can be owned by a
different team/organization. The only other workaround I have found so far is removing
the data test from project_upstream.model_x. For obvious reasons, this is also not a
desirable workaround.