-
Notifications
You must be signed in to change notification settings - Fork 15
Add support for linting and scoring dbt seeds #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for linting and scoring dbt seeds #110
Conversation
|
Thanks for opening a PR 🙌 , I will have a look soon. There's some linting errors by the way: https://github.com/PicnicSupermarket/dbt-score/actions/runs/14473008011/job/40645749304?pr=110 |
jochemvandooren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! 🙌 Overall, the feature looks very good, I will have a closer look at the tests tomorrow. Left some small comments already
| if invalid_column_names: | ||
| max_length = 60 | ||
| message = f"Columns lack a description: {', '.join(invalid_column_names)}." | ||
| if len(message) > max_length: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant, as you can also do f"{message[:60]}…" if the length is lower than 60? It will just show the full string I think
|
Also, there's still some linting errors related to |
|
Thanks a lot for the feedback @jochemvandooren! I've addressed all your comments - updated the PR number in the CHANGELOG, removed the seed.md documentation file for consistency, simplified the max_length check in the column description rule, changed the severity of |
jochemvandooren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left some final comments on the tests, great work and thanks for improving the tests! 🙌
tests/test_cli.py
Outdated
| def test_lint_existing_manifest(manifest_path): | ||
| """Test lint with an existing manifest.""" | ||
| with patch("dbt_score.cli.Config._load_toml_file"): | ||
| with patch("dbt_score.cli.lint_dbt_project") as mock_lint: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we do this here? Now we are patching lint_dbt_project, meaning we will not actually lint the manifest, which was the goal of the test.
| mock_eval.project_score = Score(5.0, "🥉") # Score below 10.0 | ||
| mock_eval.scores.values.return_value = [] | ||
|
|
||
| with patch("dbt_score.cli.lint_dbt_project") as mock_lint: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here it makes more sense! Now we actually test only the fail_project_under behavior 👍
tests/test_cli.py
Outdated
|
|
||
| with patch("dbt_score.cli.lint_dbt_project") as mock_lint: | ||
| mock_lint.return_value = mock_eval | ||
| # Also patch the HumanReadableFormatter to control the output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray comment?
tests/test_cli.py
Outdated
| assert result.exit_code == 1 | ||
|
|
||
|
|
||
| def test_fail_any_model_under(manifest_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def test_fail_any_model_under(manifest_path): | |
| def test_fail_any_item_under(manifest_path): |
Consistency 🤓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great 👌 We should do the same for other dbt entities! Will look into it in another PR
jochemvandooren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks a lot! 🙌 I'll leave the last discussion up to you and @matthieucan
|
Ah I have merged another PR, I am afraid you have to resolve some conflicts. Please let me know if you need any help there! |
jochemvandooren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see the rebasing worked out 👌 I suggest try keeping the changes related to the seed feature only! Left a couple of comments about that
src/dbt_score/models.py
Outdated
| def get_first_model(self) -> Model | None: | ||
| """Get the first model in the collection, if any.""" | ||
| return next(iter(self.models.values())) if self.models else None | ||
|
|
||
| def get_first_source(self) -> Source | None: | ||
| """Get the first source in the collection, if any.""" | ||
| return next(iter(self.sources.values())) if self.sources else None | ||
|
|
||
| def get_first_snapshot(self) -> Snapshot | None: | ||
| """Get the first snapshot in the collection, if any.""" | ||
| return next(iter(self.snapshots.values())) if self.snapshots else None | ||
|
|
||
| def get_first_seed(self) -> Seed | None: | ||
| """Get the first seed in the collection, if any.""" | ||
| return next(iter(self.seeds.values())) if self.seeds else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these methods serve any purpose, other than being used in the tests. So I suggest not creating these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the case for all these helper functions
src/dbt_score/models.py
Outdated
| elif parent_id in self.seeds: | ||
| node.parents.append(self.seeds[parent_id]) | ||
|
|
||
| def _populate_parents(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this method changed? I try to keep the changes related to seeds and not change to much related to other functionalities to keep things small and related to a single feature! Also this code was reviewed, approved and merged so I see no reason to change it unless you have very good reasons to, does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right, I added them for convenience for tests, but I have now removed them
Co-authored-by: Jochem van Dooren <[email protected]>
…ov/dbt-score into feature/seed-support
jochemvandooren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some final comment! 🙌
|
|
||
| evaluation.evaluate() | ||
|
|
||
| model2 = manifest_loader.models["model.package.model2"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see what's wrong with fetching the model from the manifest by it's ID? I think this is a neater way of doing it than having to search it by name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the other ocurrences, i think fetching the model by it's key should be the best way to do it!
tests/test_models.py
Outdated
|
|
||
|
|
||
| @patch("dbt_score.models.Path.read_text") | ||
| def test_parent_references(mock_read_text, raw_manifest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test is already in place for this: https://github.com/ross-whatnot/dbt-score/blob/edf0563d0f93aecd41a639ccdb628eff5ae8aded/tests/test_models.py#L38-L49
|
Awesome @StrahilPeykov, thanks a lot for your great contribution! 🙌 |
Overview
Add support for linting and scoring seed resources in dbt-score, following issue #105.
Problem
Previously, dbt-score only supported linting models, sources, and snapshots. Seeds were not evaluated, creating an inconsistency in the quality assessment of a dbt project's metadata. Since seeds often contain important reference data, ensuring they have proper documentation and ownership is valuable.
Implementation
Seedclass to represent dbt seedsManifestLoaderto load seeds from the manifestEvaluationclass to include seeds in evaluation chainNew Rules
seed_has_description- Ensures seeds have descriptive documentationseed_columns_have_description- Verifies seed columns are documentedseed_has_tests- Checks that seeds have appropriate testsseed_has_owner- Ensures seeds have defined ownershipTesting
Full test coverage has been added for seed support.