Conversation

@StrahilPeykov
Contributor

@StrahilPeykov StrahilPeykov commented Apr 15, 2025

Overview

Add support for linting and scoring seed resources in dbt-score, following issue #105.

Problem

Previously, dbt-score only supported linting models, sources, and snapshots. Seeds were not evaluated, creating an inconsistency in the quality assessment of a dbt project's metadata. Since seeds often contain important reference data, ensuring they have proper documentation and ownership is valuable.

Implementation

  • Added Seed class to represent dbt seeds
  • Updated ManifestLoader to load seeds from the manifest
  • Added seed-specific linting rules (description, columns, tests, ownership)
  • Updated Evaluation class to include seeds in evaluation chain
  • Modified formatters to handle and display seed results
  • Added comprehensive tests for seed support

New Rules

  • seed_has_description - Ensures seeds have descriptive documentation
  • seed_columns_have_description - Verifies seed columns are documented
  • seed_has_tests - Checks that seeds have appropriate tests
  • seed_has_owner - Ensures seeds have defined ownership
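
The four rules listed above can be sketched roughly as follows. This is a minimal, standalone illustration and does not use dbt-score's actual rule API: the `Seed` and `Column` dataclasses, the `check_seed` function, and the assumption that ownership is stored under `meta["owner"]` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical stand-in for a seed column."""

    name: str
    description: str = ""


@dataclass
class Seed:
    """Hypothetical, simplified stand-in; not dbt-score's actual Seed class."""

    name: str
    description: str = ""
    columns: list[Column] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)
    meta: dict[str, str] = field(default_factory=dict)


def check_seed(seed: Seed) -> list[str]:
    """Return the names of the rules (as listed in the PR) that the seed violates."""
    violations = []
    if not seed.description:
        violations.append("seed_has_description")
    if any(not column.description for column in seed.columns):
        violations.append("seed_columns_have_description")
    if not seed.tests:
        violations.append("seed_has_tests")
    # Assumption: the owner lives in the seed's meta dictionary.
    if not seed.meta.get("owner"):
        violations.append("seed_has_owner")
    return violations


# A documented seed with one undocumented column, no tests and no owner.
countries = Seed(
    name="countries",
    description="ISO country codes",
    columns=[Column("code")],
)
violated = check_seed(countries)
```

In this sketch each rule is a plain boolean check, so a seed passes only when `check_seed` returns an empty list.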

Testing

Full test coverage has been added for seed support.

  • Fixtures for seeds in test suite
  • Tests for seed-specific rules
  • Updates to existing tests to accommodate seeds

@jochemvandooren
Contributor

jochemvandooren commented Apr 17, 2025

Thanks for opening a PR 🙌, I will have a look soon. There are some linting errors, by the way: https://github.com/PicnicSupermarket/dbt-score/actions/runs/14473008011/job/40645749304?pr=110

Contributor

@jochemvandooren jochemvandooren left a comment

Thanks for your contribution! 🙌 Overall, the feature looks very good, I will have a closer look at the tests tomorrow. Left some small comments already

if invalid_column_names:
    max_length = 60
    message = f"Columns lack a description: {', '.join(invalid_column_names)}."
    if len(message) > max_length:

Contributor

Isn't this redundant, as you can also do f"{message[:60]}…" if the length is lower than 60? It will just show the full string I think
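
A quick check of the slicing behaviour raised in this comment, as a minimal sketch. The `summarize` helper and its parameters are hypothetical names, not code from the PR, and which form the PR finally settled on is not shown here. Slicing a string that is shorter than the bound returns it unchanged, so the slice itself needs no guard. The ellipsis is a separate matter: inside an f-string like the one quoted it is appended unconditionally, so some length check still decides whether "…" appears.

```python
def summarize(invalid_column_names: list[str], max_length: int = 60) -> str:
    """Build the violation message, truncating it when it exceeds max_length."""
    message = f"Columns lack a description: {', '.join(invalid_column_names)}."
    # message[:max_length] is safe on short strings: it returns the whole message.
    if len(message) > max_length:
        # Only add the ellipsis when something was actually cut off.
        return f"{message[:max_length]}…"
    return message


short_message = summarize(["a"])
long_message = summarize([f"column_{i}" for i in range(20)])

# Without the guard, a short message would still get an ellipsis appended.
unguarded_short = f"{short_message[:60]}…"
```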

@jochemvandooren
Contributor

Also, there are still some linting errors related to mypy! You can run pre-commit run --all-files locally to get those errors

@StrahilPeykov
Contributor Author

Thanks a lot for the feedback @jochemvandooren! I've addressed all your comments - updated the PR number in the CHANGELOG, removed the seed.md documentation file for consistency, simplified the max_length check in the column description rule, changed the severity of seed_has_tests to LOW, and fixed all the mypy and linting issues with pre-commit.

Contributor

@jochemvandooren jochemvandooren left a comment

Just left some final comments on the tests, great work and thanks for improving the tests! 🙌

def test_lint_existing_manifest(manifest_path):
    """Test lint with an existing manifest."""
    with patch("dbt_score.cli.Config._load_toml_file"):
        with patch("dbt_score.cli.lint_dbt_project") as mock_lint:

Contributor

Why do we do this here? Now we are patching lint_dbt_project, meaning we will not actually lint the manifest, which was the goal of the test.

mock_eval.project_score = Score(5.0, "🥉") # Score below 10.0
mock_eval.scores.values.return_value = []

with patch("dbt_score.cli.lint_dbt_project") as mock_lint:

Contributor

I think here it makes more sense! Now we actually test only the fail_project_under behavior 👍
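
The distinction drawn in these two review comments comes down to what `unittest.mock.patch` does: while the patch is active, the real function never runs. A minimal, self-contained sketch of that effect follows. The `cli` namespace, `lint_project`, and `run` are hypothetical stand-ins, not dbt-score code.

```python
from types import SimpleNamespace
from unittest.mock import patch

calls: list[str] = []


def lint_project(manifest: str) -> float:
    """Stand-in for the real linting step; records that it actually ran."""
    calls.append(manifest)
    return 7.5


# Hypothetical namespace playing the role of the CLI module.
cli = SimpleNamespace(lint_project=lint_project)


def run(manifest: str, fail_under: float) -> int:
    """Return exit code 1 when the project score is below the threshold."""
    score = cli.lint_project(manifest)
    return 1 if score < fail_under else 0


# Patched: only the threshold logic is exercised, with a controlled score.
with patch.object(cli, "lint_project") as mock_lint:
    mock_lint.return_value = 5.0
    exit_code_patched = run("manifest.json", fail_under=10.0)
calls_while_patched = list(calls)  # the real function was never called

# Unpatched: the real linting step runs end to end.
exit_code_real = run("manifest.json", fail_under=5.0)
```

Patching suits a test that targets only the exit-code logic; a test meant to cover linting of an existing manifest needs the unpatched path.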


with patch("dbt_score.cli.lint_dbt_project") as mock_lint:
    mock_lint.return_value = mock_eval
    # Also patch the HumanReadableFormatter to control the output

Contributor

Stray comment?

assert result.exit_code == 1


def test_fail_any_model_under(manifest_path):

Contributor

Suggested change
- def test_fail_any_model_under(manifest_path):
+ def test_fail_any_item_under(manifest_path):

Consistency 🤓

Contributor

@jochemvandooren jochemvandooren Apr 24, 2025

This is great 👌 We should do the same for other dbt entities! Will look into it in another PR

Contributor

@jochemvandooren jochemvandooren left a comment

Nice, thanks a lot! 🙌 I'll leave the last discussion up to you and @matthieucan

@jochemvandooren
Contributor

Ah, I have merged another PR, so I am afraid you have to resolve some conflicts. Please let me know if you need any help there!

Contributor

@jochemvandooren jochemvandooren left a comment

Great to see the rebasing worked out 👌 I suggest keeping the changes related to the seed feature only! Left a couple of comments about that

Comment on lines 753 to 767
def get_first_model(self) -> Model | None:
    """Get the first model in the collection, if any."""
    return next(iter(self.models.values())) if self.models else None

def get_first_source(self) -> Source | None:
    """Get the first source in the collection, if any."""
    return next(iter(self.sources.values())) if self.sources else None

def get_first_snapshot(self) -> Snapshot | None:
    """Get the first snapshot in the collection, if any."""
    return next(iter(self.snapshots.values())) if self.snapshots else None

def get_first_seed(self) -> Seed | None:
    """Get the first seed in the collection, if any."""
    return next(iter(self.seeds.values())) if self.seeds else None

Contributor

I don't think these methods serve any purpose, other than being used in the tests. So I suggest not creating these

Contributor

Actually, this is the case for all these helper functions

elif parent_id in self.seeds:
    node.parents.append(self.seeds[parent_id])

def _populate_parents(self) -> None:

Contributor

Why was this method changed? I try to keep the changes related to seeds and not change too much related to other functionality, to keep things small and focused on a single feature! Also, this code was reviewed, approved and merged, so I see no reason to change it unless you have very good reasons to. Does that make sense?

Contributor Author

Yeah you're right, I added them for convenience for tests, but I have now removed them

Contributor

@jochemvandooren jochemvandooren left a comment

Some final comments! 🙌


evaluation.evaluate()

model2 = manifest_loader.models["model.package.model2"]

Contributor

I don't see what's wrong with fetching the model from the manifest by its ID? I think this is a neater way of doing it than having to search for it by name.

Contributor

Same for the other occurrences, I think fetching the model by its key should be the best way to do it!
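
To make the two lookup styles under discussion concrete, here is a minimal sketch. The `Model` dataclass and the `models` dictionary are hypothetical stand-ins for the loader's collection, not dbt-score's own classes; only the ID `model.package.model2` is taken from the excerpt above.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for a loaded model; not dbt-score's Model class."""

    unique_id: str
    name: str


# Collection keyed by unique ID, mirroring the lookup in the excerpt above.
models = {
    model.unique_id: model
    for model in (
        Model("model.package.model1", "model1"),
        Model("model.package.model2", "model2"),
    )
}

# Lookup by key: one dictionary access, raises KeyError if the ID is missing.
by_key = models["model.package.model2"]

# Search by name: scans the values, raises StopIteration if nothing matches.
by_name = next(model for model in models.values() if model.name == "model2")
```

Both expressions return the same object here; the key lookup is shorter and fails with a clearer error when the fixture changes.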



@patch("dbt_score.models.Path.read_text")
def test_parent_references(mock_read_text, raw_manifest):

@jochemvandooren jochemvandooren merged commit cb57712 into PicnicSupermarket:master May 6, 2025
4 checks passed
@jochemvandooren
Contributor

Awesome @StrahilPeykov, thanks a lot for your great contribution! 🙌

@jochemvandooren jochemvandooren linked an issue May 6, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Add support for seeds

3 participants