feat: lint sources #78

Merged
merged 41 commits into from
Nov 12, 2024

otosky
Contributor

@otosky otosky commented Oct 10, 2024

Overview

Extends rules so that they can be used against dbt sources in addition to models.

Usage

A rule defines what resource-type it acts against in the type signature of the function it wraps or in a class-based evaluate method:

from dbt_score import Model, Source, rule, Rule, RuleViolation

# decorator-based
# for a Model
@rule
def model_has_description(model: Model) -> RuleViolation | None:
    """A model should have a description."""
    if not model.description:
        return RuleViolation(message="Model lacks a description.")

# for a Source
@rule
def source_has_loader(source: Source) -> RuleViolation | None:
    """A source should have a loader defined."""
    if not source.loader:
        return RuleViolation(message="Source lacks a loader.")

# class-based
class ExampleSource(Rule):
    """Example class-based rule."""

    description = "A source should have a loader defined."

    def evaluate(self, source: Source) -> RuleViolation | None:
        """Evaluate source."""
        if not source.loader:
            return RuleViolation(message="Source lacks a loader.")

The Evaluation handler is then responsible for applying source-rules to Source objects and model-rules to Model objects.
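
A minimal sketch of how such dispatch could work, assuming the handler reads the type annotation of a rule's first parameter. The `Model` and `Source` dataclasses are simplified stand-ins, rules return plain strings instead of `RuleViolation`, and `resource_type_of` and `evaluate` are hypothetical names, not necessarily what dbt-score uses internally.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional, get_type_hints


@dataclass
class Model:
    """Stand-in for dbt_score's Model (illustrative only)."""

    name: str
    description: str = ""


@dataclass
class Source:
    """Stand-in for dbt_score's Source (illustrative only)."""

    name: str
    loader: str = ""


def resource_type_of(rule_func: Callable) -> type:
    """Return the annotated type of the rule's first parameter."""
    first_param = next(iter(inspect.signature(rule_func).parameters))
    return get_type_hints(rule_func)[first_param]


def model_has_description(model: Model) -> Optional[str]:
    return None if model.description else "Model lacks a description."


def source_has_loader(source: Source) -> Optional[str]:
    return None if source.loader else "Source lacks a loader."


def evaluate(rules, evaluables):
    """Apply each rule only to evaluables of the type it is annotated for."""
    results = {}
    for evaluable in evaluables:
        for rule_func in rules:
            if isinstance(evaluable, resource_type_of(rule_func)):
                key = (evaluable.name, rule_func.__name__)
                results[key] = rule_func(evaluable)
    return results
```

With this approach a rule author only has to annotate the parameter; no separate registration of the resource type is needed.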


closes #76

@otosky
Contributor Author

otosky commented Oct 11, 2024

woops thought that I opened this PR against my own fork - will move out of draft when I've added the rest of the feature 🙂

@matthieucan
Contributor

woops thought that I opened this PR against my own fork - will move out of draft when I've added the rest of the feature 🙂

No problem, sounds good!

src/dbt_score/rule.py (outdated)
@otosky otosky marked this pull request as ready for review October 15, 2024 02:45
@otosky
Contributor Author

otosky commented Oct 15, 2024

@matthieucan - let me know if you think this is heading in the right direction

Contributor

@jochemvandooren jochemvandooren left a comment

I think this looks very good already! 🚀 Just a couple of things we need to think about:

  • Filters cannot be applied to sources now
  • Code has model references almost everywhere (oops 😅, that was me probably), we should change that. E.g. scorer.py even has methods like score_model that will now be applied to sources as well.
  • Can we distinguish source and model in the formatters?
  • Needs some documentation in /docs!

Let me know what you think about it, I am happy to help if needed!

src/dbt_score/evaluation.py (outdated)
src/dbt_score/evaluation.py (outdated)
src/dbt_score/evaluation.py (outdated)
src/dbt_score/__init__.py (outdated)
@otosky
Copy link
Contributor Author

otosky commented Oct 17, 2024

  • Can we distinguish source and model in the formatters?

@jochemvandooren

For the human-readable formatter, my thought was to prefix models with M: and sources with S:, but let me know if you have a different idea.

Example:

🥇 M:model1 (score: 10.0)
    OK   tests.conftest.rule_severity_low
    ERR  tests.conftest.rule_severity_medium: Oh noes
    WARN (critical) tests.conftest.rule_severity_critical: Error

🥇 S:source1.table1 (score: 10.0) 
    OK   tests.conftest.rule_severity_low
    ERR  tests.conftest.rule_severity_medium: Oh noes
    WARN (critical) tests.conftest.rule_severity_critical: Error
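
A rough sketch of how the prefix could be derived, assuming the formatter can tell the two apart with `isinstance`. The dataclasses and the `display_name` helper are hypothetical, and the `source_name` attribute is an assumption based on the `source1.table1` example above.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Stand-in for a dbt model (illustrative only)."""

    name: str


@dataclass
class Source:
    """Stand-in for a dbt source table (illustrative only)."""

    source_name: str
    name: str


def display_name(evaluable) -> str:
    """Prefix models with 'M:' and sources with 'S:' for human-readable output."""
    if isinstance(evaluable, Model):
        return f"M:{evaluable.name}"
    if isinstance(evaluable, Source):
        return f"S:{evaluable.source_name}.{evaluable.name}"
    raise TypeError(f"Unsupported evaluable: {type(evaluable).__name__}")
```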

@jochemvandooren
Contributor

  • Can we distinguish source and model in the formatters?

@jochemvandooren

For the human-readable formatter, my thought was to prefix models with M: and sources with S:, but let me know if you have a different idea.

Example:

🥇 M:model1 (score: 10.0)
    OK   tests.conftest.rule_severity_low
    ERR  tests.conftest.rule_severity_medium: Oh noes
    WARN (critical) tests.conftest.rule_severity_critical: Error

🥇 S:source1.table1 (score: 10.0) 
    OK   tests.conftest.rule_severity_low
    ERR  tests.conftest.rule_severity_medium: Oh noes
    WARN (critical) tests.conftest.rule_severity_critical: Error

I like the idea, keeps it concise! 👍

Contributor

@jochemvandooren jochemvandooren left a comment

This has become a huge PR with all the renaming and stuff, great effort 🙌 I have played around with the code locally and everything works perfectly! Just left some comments about getting rid of all mentions of model in the code.

Also, all of the documentation needs to be updated as well, I am happy to assist with this one, please let me know!

pyproject.toml (outdated)
src/dbt_score/evaluation.py (outdated)
src/dbt_score/rule.py (outdated)
src/dbt_score/rule_filter.py (outdated)
src/dbt_score/scoring.py (outdated)
src/dbt_score/cli.py (outdated)
src/dbt_score/models.py
src/dbt_score/models.py
src/dbt_score/models.py
src/dbt_score/rule_filter.py
src/dbt_score/cli.py (outdated)
@otosky
Contributor Author

otosky commented Oct 22, 2024

Thanks for a thorough review @jochemvandooren ! Will address remaining feedback, update the formatters, and take a first stab at some of the docs shortly! I will definitely take up your offer for support on the docs.

Contributor

@matthieucan matthieucan left a comment

Incredible work @otosky !
I played a bit with it and it works fine! Very much looking forward to having this 💪

src/dbt_score/models.py
docs/create_rules.md (outdated)
@jochemvandooren
Contributor

Thanks for a thorough review @jochemvandooren ! Will address remaining feedback, update the formatters, and take a first stab at some of the docs shortly! I will definitely take up your offer for support on the docs.

Next week I will help you on the docs and further review the PR, thanks for all the amazing work already 🙌

@jochemvandooren
Contributor

Thanks for a thorough review @jochemvandooren ! Will address remaining feedback, update the formatters, and take a first stab at some of the docs shortly! I will definitely take up your offer for support on the docs.

Next week I will help you on the docs and further review the PR, thanks for all the amazing work already 🙌

@otosky The code looks perfect! As promised, I did a final check, found all remaining occurrences of model, and replaced them with something more appropriate 🔍 d595ea2

I also added a CHANGELOG.md entry already, feel free to improve of course. Final thing that's remaining is some linting errors on lines exceeding line length. Once those are fixed I am ready to approve 🚀

@otosky
Contributor Author

otosky commented Oct 31, 2024

Thank you @jochemvandooren!

Fixed the line lengths in a2c0924 and made some tweaks to the changelog in f88aa7c.

One final Q: I made use of two functions from more-itertools - first and first_true. more-itertools came as a sub-dependency of dbt, which I see has now been made a dev-dep. Would you like me to rewrite/vendor the 2 functions I used above? Or add more-itertools as a proper dependency?

@jochemvandooren
Contributor

jochemvandooren commented Oct 31, 2024

Thank you @jochemvandooren!

Fixed the line lengths in a2c0924 and made some tweaks to the changelog in f88aa7c.

One final Q: I made use of two functions from more-itertools - first and first_true. more-itertools came as a sub-dependency of dbt, which I see has now been made a dev-dep. Would you like me to rewrite/vendor the 2 functions I used above? Or add more-itertools as a proper dependency?

Ah, good point! If it can be prevented easily, I'd like to keep the number of dependencies low. But I can imagine having to rewrite the function is a hassle, I will leave it up to you!

@otosky
Contributor Author

otosky commented Oct 31, 2024

I went the vendoring route, since it's really just the function first_true that is nicer syntax sugar and used in more than one place. 👍
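
For reference, a vendored `first_true` can be very small. The sketch below follows the well-known itertools recipe; the version actually committed in this PR may differ in typing or docstring.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_true(
    iterable: Iterable[T],
    default: Optional[T] = None,
    pred: Optional[Callable[[T], bool]] = None,
) -> Optional[T]:
    """Return the first item for which pred(item) is true, else default.

    If pred is None, return the first truthy item.
    """
    # filter(None, iterable) keeps only truthy items.
    return next(filter(pred, iterable), default)
```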

@matthieucan
Contributor

I went the vendoring route, since it's really just the function first_true that is nicer syntax sugar and used in more than one place. 👍

I believe more_itertools.py was not committed?

@otosky
Contributor Author

otosky commented Oct 31, 2024

I went the vendoring route, since it's really just the function first_true that is nicer syntax sugar and used in more than one place. 👍

I believe more_itertools.py was not committed?

🤦 totally right - just pushed it up!

Contributor

@jochemvandooren jochemvandooren left a comment

Amazing 🤩 , just needs a rebase on master!

Contributor

@matthieucan matthieucan left a comment

Minor nitpicks.

Incredible work @otosky, I'm really keen to start using this feature. Thanks a lot! 💪

CHANGELOG.md (outdated)
src/dbt_score/rule_filter.py
manifest["nodes"][model_id]["meta"]["score"] = model_score.value
manifest["nodes"][model_id]["meta"]["badge"] = model_score.badge
for evaluable_id, evaluable_score in self._evaluable_scores.items():
    manifest["nodes"][evaluable_id]["meta"]["score"] = evaluable_score.value
Contributor Author

Noticing after taking another pass at the diff that sources need to be pushed into manifest["sources"] instead of manifest["nodes"]. Let me fix that quickly.
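
A sketch of the idea, assuming scores are keyed by dbt `unique_id` and that source ids start with `source.` (as they do in a dbt manifest). The `write_scores` helper and the plain-dict score shape are hypothetical; the actual fix in this PR may choose the section from the object type instead of the id prefix.

```python
def write_scores(manifest: dict, scores: dict) -> dict:
    """Write score and badge into the manifest section matching each id."""
    for unique_id, score in scores.items():
        # Sources live under manifest["sources"], everything else under "nodes".
        section = "sources" if unique_id.startswith("source.") else "nodes"
        meta = manifest[section][unique_id].setdefault("meta", {})
        meta["score"] = score["value"]
        meta["badge"] = score["badge"]
    return manifest
```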

Contributor

Oh good one, we should have a test for that

Contributor Author

updated in cc6b707

@jochemvandooren
Contributor

It seems mypy was upgraded when rebasing, and it introduced some new linting errors 😅 I can help you fix them if you would like! Let me know

@otosky
Contributor Author

otosky commented Nov 8, 2024

@jochemvandooren I'll take an initial stab at it!

@otosky
Contributor Author

otosky commented Nov 12, 2024

There are still a bunch of Liskov violations being raised from mypy that I'm not entirely sure how to fix without having to introduce Generics into the API.

tests/test_rule.py:53: error: Argument 1 of "evaluate" is incompatible with supertype "Rule"; supertype defines the argument type as "Model | Source"  [override]
tests/test_rule.py:53: note: This violates the Liskov substitution principle
tests/test_rule.py:53: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

Is there any other workaround besides adding an ignore here? This essentially means that mypy will fail for any downstream users if they use the class-based workflow while developing their rules.

@jochemvandooren
Contributor

There are still a bunch of Liskov violations being raised from mypy that I'm not entirely sure how to fix without having to introduce Generics into the API.

tests/test_rule.py:53: error: Argument 1 of "evaluate" is incompatible with supertype "Rule"; supertype defines the argument type as "Model | Source"  [override]
tests/test_rule.py:53: note: This violates the Liskov substitution principle
tests/test_rule.py:53: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

Is there any other workaround besides adding an ignore here? This essentially means that mypy will fail for any downstream users if they use the class-based workflow while developing their rules.

I also spent some time looking into this, and there's no easy solution indeed 😞 Considering this will only affect the class-based rules, I suggest we add an ignore for now. I am aware it will introduce the warnings downstream, which isn't ideal. To solve this in a nice way, we might need to restructure some things if we want to keep the API as is, so we might consider this in a follow-up PR. What do you think? I think it's the most pragmatic option!
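
To illustrate the situation, here is a reduced sketch with stand-in classes (not dbt-score's real ones, and returning strings rather than `RuleViolation`). The base method accepts `Model | Source`, the subclass narrows the argument to `Source`, and mypy reports that as an `[override]` error unless the line carries an ignore. The code runs fine at runtime either way.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Model:
    """Stand-in for dbt_score's Model (illustrative only)."""

    name: str
    description: str = ""


@dataclass
class Source:
    """Stand-in for dbt_score's Source (illustrative only)."""

    name: str
    loader: str = ""


class Rule:
    """Stand-in base class whose evaluate accepts either resource type."""

    def evaluate(self, evaluable: Union[Model, Source]) -> Optional[str]:
        raise NotImplementedError


class SourceHasLoader(Rule):
    """Class-based rule that only makes sense for sources."""

    # Narrowing the argument type is what mypy flags as a Liskov violation;
    # the ignore comment silences that one error code on this line.
    def evaluate(self, source: Source) -> Optional[str]:  # type: ignore[override]
        if not source.loader:
            return "Source lacks a loader."
        return None
```

Downstream users writing class-based rules would need the same `# type: ignore[override]` comment on their own `evaluate` methods if they run mypy.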

@otosky
Contributor Author

otosky commented Nov 12, 2024

I also spent some time looking into this, and there's no easy solution indeed 😞 Considering this will only affect the class-based rules, I suggest we add an ignore for now. I am aware it will introduce the warnings downstream, which isn't ideal. To solve this in a nice way, we might need to restructure some things if we want to keep the API as is, so we might consider this in a follow-up PR. What do you think? I think it's the most pragmatic option!

makes sense to me @jochemvandooren! - updated in 0732e15

@jochemvandooren jochemvandooren merged commit 8aa8aad into PicnicSupermarket:master Nov 12, 2024
3 checks passed
@jochemvandooren
Contributor

Well done @otosky! Great contribution 🙌

@jochemvandooren
Contributor

Also it's available in version 0.8.0 now

Development

Successfully merging this pull request may close these issues.

feature request: enable linting of sources
3 participants