-
Thanks for the write-up, Joel! I'm writing from the perspective of a team lead of a rather small dbt deployment (2-person team, ca. 200 models).
General thoughts
Cross-database
Including the respective implementations in the database adapters seems right to me. We never referenced them in our code anyway, and as you start using multiple packages, the dependencies can become a bit messy. Huge thanks to all package maintainers for their effort in getting new releases out as soon as possible!
Macros and tests
We "vote" to keep these together. From the perspective of discovering new dbt functionality, it's easier to install a single package and then discover more functionality within it. Let's assume we started using dbt_utils because of the pivot macro and then discovered the new testing possibilities. I'm not sure we would have searched the whole hub for more packages.
Experimental
No strong opinions here. However, having an experimental feature in such an integral package seems a bit odd, as you noted.
-
Thanks for taking the initiative, @joellabes! We use the helpful macro and testing side of
Some tests we use:
Some macros we use:
When it comes to experimental features like |
-
And let's not forget about the Did you have anything in mind for it?
Regarding the 4 important adapters, I think it makes sense to move cross-db macros to their database adapter repositories, keeping the default implementation in the core.
-
Thanks for this thoughtful writeup @joellabes! I wanted to jump in on the fun and add my thoughts to this thread. 😄
dbt-utils has become a behemoth in itself and I definitely see the value in splitting the various functions of the package into distinct use cases. With that, I completely agree with this initiative and have a few focused thoughts I wanted to share.
These are my initial thoughts, but will be sure to post any updates in this post if any more come to me.
In my case it would not allow us to drop the dbt-utils dependency. I did a quick search this morning and found that we use the If these macros are able to be easily dispatched from dbt-core/adapters then this would be no problem! However, I struggle to see this being an easy lift as we explicitly call out
I stand behind this initiative and like the idea of creating more distinct use-case packages. The dbt-utils package as it stands is quite large, and I can imagine a first-time (or even repeat) user being a bit overwhelmed. If users are working within a more focused version of the package, then I can see the user experience improving, and they may even feel encouraged to contribute to the package?! That is a stretch maybe haha but I like the thought!
A more general (but still related) question I have is around package dependency issues. The great thing about dbt-utils right now is that there is only one package version you have to worry about. However, if we hypothetically break this into two, three or more packages down the line, then I could see a new issue arising around dependency and version conflicts across these various utility packages. What are your thoughts on improving this experience for users? I have seen a lot of questions from our package users who experience dbt run failures due to package dependencies being off. It would be great if there was some matrix or tool that users could leverage to navigate the package ecosystem and ensure they are using the appropriate versions across the board. This may be better suited to a more focused Issue/FR, but it's something to think about if the plan is to separate this package into multiple packages.
-
I see that so far there is a preference to keep macros and tests together, but I think the fact that we have tests in multiple places can be confusing to new users. Today we have tests in core, utils, and dbt-expectations. I realize that they do not share a maintainer, but a new user is simply looking for "testing" and does not realize this difference exists. One suggestion would be to merge the tests in utils into either core or dbt-expectations. Aligned on the rest.
-
Imho, we should try to nail down what the dbt-utils package wants to be (this seems like the attempt at doing so 😀) - and perhaps a dbt utility package should exist to help analytics engineers be "lazier" with "chore" type work - it should help them write less code when they want to union all relations, when they want to pivot tables, when they want to join N dim tables into a single one, etc.
Tests
If the tests are important enough, then we should promote them into dbt-core rather than maintaining two different sets of tests between the packages that dbt Labs maintains. Would this make dbt-core itself harder to maintain though? 🤔
Cross-database stuff
I think these should be implemented in the adapters themselves and then dispatched, just like how each adapter currently has its own implementations of say the core
Experiments
Probably uncontroversial to say, but they probably have no business being here - just get rid of them 😬
Once we remove the above categories, then likely we are left with our guiding principle of "helping analytics engineers do chores with less code".
-
I've really appreciated reading each and every response so far! Thank you all for engaging so quickly and thoughtfully :) As @joellabes rightly pointed out to me, there's a recurrent motif:
Why core?
Why not core?
So, if I had my druthers...
What I'd want to move into
Macros we wouldn't move to core/plugins:
-
just came here to say that you are doing the lord's work joel and jeremy — keep going down this path, tysm |
-
I've gathered from this discussion:
Anyway, breaking up dbt utils sounds reasonable to me, a relatively new user of dbt. Thanks all! |
-
A couple of other standardisation projects we should keep in mind as we rocket towards 1.0:
-
Team, I'd like your opinion on an enhancement to dbt_utils.equality, which would be in line with the following code
Let me know your views. |
-
This may have been previously mentioned, but adding more description on package dependency version requirements in the error logs would be super helpful. Recently ran into this issue with dbt-utils:
This results from adding the dbt-audit-helper package. More info in the log would be useful to trace back the dependency. |
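To make the request concrete, here is a rough Python sketch of the kind of message that would help: it names each requester and the range it asked for. The package names, version numbers and the `explain_conflicts` helper are all made up for illustration; this is not how dbt resolves or reports dependencies today.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]
Range = Tuple[Version, Version]  # (minimum inclusive, maximum exclusive)


def fmt(version_range: Range) -> str:
    """Render a range as '>=a.b.c,<x.y.z'."""
    low, high = version_range
    return ">=" + ".".join(map(str, low)) + ",<" + ".".join(map(str, high))


def explain_conflicts(requirements: Dict[str, Range]) -> List[str]:
    """Return one message per pair of requesters whose ranges do not overlap."""
    messages = []
    names = sorted(requirements)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            # Two half-open ranges overlap only if max(lows) < min(highs).
            low = max(requirements[first][0], requirements[second][0])
            high = min(requirements[first][1], requirements[second][1])
            if low >= high:
                messages.append(
                    f"{first} requires {fmt(requirements[first])} "
                    f"but {second} requires {fmt(requirements[second])}"
                )
    return messages


# Hypothetical requirements on a shared utility package.
conflicting = {
    "package_a": ((0, 7, 0), (0, 8, 0)),
    "package_b": ((0, 8, 0), (0, 9, 0)),
}
for line in explain_conflicts(conflicting):
    print(line)
```

Even a single line like this in the log would point straight at the package that introduced the incompatible requirement.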
-
TL;DR
I think that the dbt utils package is trying to do too much, and that it should be more focused. Below is a proposal to break it into more logical chunks.
Once you've read it, I'd love to know your answers to any and all of these questions:
Background
By a wide margin, dbt utils is the most-used package in the dbt Hub - over 1/3 of active projects installed it over the last week. However, it is actually many packages in one:
(relationships_where, unique_combination_of_columns), (accepted_range, not_constant), (equal_rowcount, cardinality_equality)
star, pivot, get_column_values
insert_by_period, a custom materialization which fills a niche as best it can but relies on hacky behaviour to work.
These different packages are... different, both in target users and rate of iteration. If we pull them out into their own packages, they can each do their own thing without interfering with one another.
What would this look like?
Cross-database stuff
This is slow-moving, stable code, and is depended on by most packages which aim to support all database adapters. We'd move the default implementations of these into dbt Core, and the overrides into the various database adapter repositories. For example, bigquery__except would move to https://github.com/dbt-labs/dbt-bigquery.
By doing this, we hope that many package vendors could drop their dependencies on utils altogether, which makes version resolution easier.
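For anyone less familiar with how the default/override split plays out, here is a minimal Python sketch of the lookup idea: prefer an adapter-prefixed implementation, otherwise fall back to the default. The `MACROS` registry and `dispatch` function are hypothetical stand-ins, not dbt's actual implementation, and the sketch leaves out details such as searching across packages or adapter inheritance.

```python
from typing import Callable, Dict

# Hypothetical registry of macro implementations, keyed by "<prefix>__<name>".
# In the proposal, the default would live in dbt Core and the override
# in the adapter repository (e.g. dbt-bigquery).
MACROS: Dict[str, Callable[[], str]] = {
    "default__except": lambda: "except",
    "bigquery__except": lambda: "except distinct",
}


def dispatch(macro_name: str, adapter_type: str) -> Callable[[], str]:
    """Return the adapter-specific implementation if present, else the default."""
    for prefix in (adapter_type, "default"):
        candidate = MACROS.get(f"{prefix}__{macro_name}")
        if candidate is not None:
            return candidate
    raise KeyError(f"No implementation of {macro_name!r} for {adapter_type!r}")


bigquery_sql = dispatch("except", "bigquery")()
snowflake_sql = dispatch("except", "snowflake")()
print(bigquery_sql, "|", snowflake_sql)
```

The point of the split is that a package calling the macro never needs to know which repository supplied the implementation it ends up with.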
Macros and tests: one package or two?
Although macros and tests are different on paper, they both stay true to the spirit of dbt utils: "install this, and you'll get a bunch of useful functionality for free". For that reason, I personally lean towards keeping them together. There's an argument to be made that they could be split out to dbt_utils and dbt_extra_tests or something - if you want to be the one to make that argument, I'll hear it!
Experiments
This is not clear-cut. Some options, from least favourite to most favourite:
I believe that everything inside of dbt-utils should be usable by the "core four" adapters (Redshift, Postgres, Snowflake and BigQuery). There is a wonderful community contribution pending to add Snowflake compatibility to this materialization, but nothing for BQ or Postgres.
For this reason, my preference is to move it to the experimental features repo, where it's available to the community but the compatibility and maintenance expectations are clearer. Even if it was compatible with all four adapters, it's still a hacky setup – albeit useful! – and not at the same level of resilience as the rest of the project.
Migrating safely
I imagine that we'd implement a similar deprecation process to the not_null_where and unique_where macros for a time: dbt utils would display a deprecation warning and then dispatch to the macro's new home, before being removed in a future minor version (or maybe dbt utils 1.0.0?!)
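As a rough illustration of that warn-then-forward pattern, here is a small Python sketch. The names `new_home_except`, `deprecated_alias` and the "dbt.except" label are hypothetical and only stand in for whatever the macro's new home turns out to be; the real shim would be a macro in dbt utils rather than Python.

```python
import warnings
from typing import Any, Callable


def new_home_except() -> str:
    """Stand-in for the macro after it has moved to its new home."""
    return "except"


def deprecated_alias(
    new_macro: Callable[..., Any], old_name: str, new_name: str
) -> Callable[..., Any]:
    """Wrap new_macro so that calls via the old name warn, then forward."""

    def shim(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_macro(*args, **kwargs)

    return shim


# The old entry point keeps working for a while, but nudges callers to migrate.
utils_except = deprecated_alias(new_home_except, "dbt_utils.except", "dbt.except")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = utils_except()

print(result, "|", caught[0].message)
```

Because the shim only forwards, removing it in a later release changes nothing for projects that have already switched to the new name.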
Looking forward to hearing everyone's thoughts!