Add basic dbt objects and rule definitions #6

Merged
jochemvandooren merged 27 commits into master from jvandooren/rule-definitions
Mar 25, 2024

@jochemvandooren jochemvandooren commented Mar 8, 2024

This PR takes care of the following:

Add basic rule structures

A rule can be added by defining a function with the @rule decorator or by creating a class that inherits from Rule.
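As a rough illustration of the two styles, here is a self-contained sketch. The Rule, Model, RuleViolation and Severity classes below are simplified stand-ins written for this example, not the code in this PR, and has_description / HasOwner are hypothetical rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Type


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Model:
    """Stand-in for a dbt model (assumption: only the fields used here)."""

    name: str
    description: str = ""
    owner: str = ""


@dataclass
class RuleViolation:
    message: str


class Rule:
    """Base class: every subclass is a rule."""

    description: str = ""
    severity: Severity = Severity.MEDIUM

    def evaluate(self, model: Model) -> Optional[RuleViolation]:
        raise NotImplementedError


def rule(description: Optional[str] = None, severity: Severity = Severity.MEDIUM):
    """Turn a plain function into a generated subclass of Rule."""

    def decorator(func: Callable[[Model], Optional[RuleViolation]]) -> Type[Rule]:
        def evaluate(self: Rule, model: Model) -> Optional[RuleViolation]:
            # Drop `self` and delegate to the decorated function.
            return func(model)

        attrs = {
            "description": description or func.__doc__ or "",
            "severity": severity,
            "evaluate": evaluate,
        }
        return type(func.__name__, (Rule,), attrs)

    return decorator


# Style 1: a simple rule written as a decorated function.
@rule(severity=Severity.HIGH)
def has_description(model: Model) -> Optional[RuleViolation]:
    """A model should have a description."""
    if not model.description:
        return RuleViolation("Model lacks a description.")
    return None


# Style 2: a rule written as an explicit subclass.
class HasOwner(Rule):
    description = "A model should have an owner."

    def evaluate(self, model: Model) -> Optional[RuleViolation]:
        if not model.owner:
            return RuleViolation("Model lacks an owner.")
        return None
```

In this sketch both has_description and HasOwner end up as subclasses of Rule, so a runner can instantiate them and call evaluate the same way.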

Add a way to load models from the manifest

ManifestLoader takes the manifest.json, loaded as a dictionary, as input. It creates a set of Model objects. A Model is the input to Rule.evaluate.
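To make the loading step concrete, here is a minimal sketch of what such a loader could look like. It assumes the usual manifest layout where "nodes" maps unique IDs to node dictionaries carrying a "resource_type". The Model fields and the internals of ManifestLoader below are illustrative guesses, not the implementation in this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Model:
    """Illustrative model object (assumption: a small subset of fields)."""

    unique_id: str
    name: str
    description: str = ""
    columns: List[str] = field(default_factory=list)


class ManifestLoader:
    """Build Model objects from an already-parsed manifest dictionary."""

    def __init__(self, raw_manifest: Dict[str, Any]) -> None:
        self.raw_nodes: Dict[str, Any] = raw_manifest.get("nodes", {})
        self.models: List[Model] = []
        self._load_models()

    def _load_models(self) -> None:
        for node_id, node in self.raw_nodes.items():
            # Tests, seeds and other node types are skipped.
            if node.get("resource_type") != "model":
                continue
            self.models.append(
                Model(
                    unique_id=node_id,
                    name=node["name"],
                    description=node.get("description", ""),
                    columns=list(node.get("columns", {})),
                )
            )
```

Only nodes whose resource_type is "model" become Model objects here; everything else in the manifest is ignored.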

Below is a minimal example to load a manifest.json file and apply rules to the models in the manifest.

"""Parse dbt manifest."""

import json
from pathlib import Path
from typing import Any

from dbt_score.models import ManifestLoader
from dbt_score.rules.example_rules import (
    ComplexRule,
    columns_have_description,
    has_owner,
    has_primary_key,
    has_test,
    primary_key_has_uniqueness_test,
)

def get_json(json_filename: str) -> Any:
    """Get JSON from a file."""
    file_content = Path(json_filename).read_text(encoding="utf-8")
    return json.loads(file_content)

raw_manifest = get_json(
    "target/manifest.json"
)
loader = ManifestLoader(raw_manifest)
models = loader.models

rule_classes = [columns_have_description,
                has_owner,
                has_primary_key,
                has_test,
                primary_key_has_uniqueness_test,
                ComplexRule]

for model in models:
    print("Running model...", model.unique_id)
    for rule_class in rule_classes:
        print("Running rule...", rule_class.__name__)
        print("With properties...", rule_class.description, rule_class.severity)
        rule_instance = rule_class()
        result = rule_instance.evaluate(model)
        print("Result:", result)
        print()

To do

  • Write tests
  • Decide which example rules should be default rules

@jochemvandooren jochemvandooren self-assigned this Mar 8, 2024
@jochemvandooren jochemvandooren marked this pull request as ready for review March 13, 2024 13:55
@jochemvandooren jochemvandooren changed the title WIP Add basic dbt objects and rule definitions Add basic dbt objects and rule definitions Mar 13, 2024
Contributor

@matthieucan matthieucan left a comment

Bunch of comments, but this looks very promising! 😍

Contributor

@sercancicek sercancicek left a comment

Really looks great! 💯 I also left a couple of comments/ideas.

Contributor

@matthieucan matthieucan left a comment

Looks great, congrats on the @rule implementation 😍

Thoughts on unit tests already?

Contributor

@michael-the1 michael-the1 Mar 21, 2024

I'm sure you've thought a lot about this setup and I'd also like to learn more so here goes:

What is the reason @rule returns a subclass of Rule, rather than a Rule object? That is, instead of:

Callable[..., Type[Rule]]

Why not:

Callable[..., Rule]

Generating subclasses, of which there will only be one object per subclass, feels similar to generating objects, i.e. specific instances of Rule, but the latter is semantically closer to what you'd want (I think).

rule would then look something like this:

def rule(
    description: str | None = None,
    severity: Severity = Severity.MEDIUM,
) -> Callable[[Callable[[Model], RuleViolation | None]], Rule]:
    """Rule decorator.
    The rule decorator creates a Rule and returns it.
    Args:
        description: The description of the rule.
        severity: The severity of the rule.
    """

    def decorator_rule(
        func: Callable[[Model], RuleViolation | None],
    ) -> Rule:
        """Decorator function."""

        ...

        def wrapped_func(self: Rule, *args: Any, **kwargs: Any) -> RuleViolation | None:
            """Wrap func to add `self`."""
            return func(*args, **kwargs)

        rule_obj = Rule(
            description=...,
            severity=...,
            evaluate=wrapped_func,
        )

        return rule_obj

Contributor

One of the reasons is to allow users to create more advanced rules by subclassing Rule (simpler rules are encouraged to use the diet @rule decorator). Does that make sense?

Contributor Author

Yes, it is indeed very similar, but this way it will be easier for users to create their own complex rules with a class. This gives full flexibility to users who need it.

Contributor

@michael-the1 michael-the1 Mar 21, 2024

But can't they still do that even in this setup?

@rule
def some_simple_rule():
    ...

class ComplexRule(Rule):
    """Do the thing, as long as it implements `evaluate`"""
    ...

Then in the rule registry / runner:

# In the real registry this discovery happens differently of course
rules: list[Rule] = [
    some_simple_rule(),  # Instantiate a Rule object
    ComplexRule(),  # Instantiate a ComplexRule object, which is a subclass of Rule
    ...
]

for rule in rules:
    rule.evaluate(model)

Contributor

The snippet above is how it currently works. But with your suggestion of creating instances instead of classes in the decorator, some_simple_rule would already be an instance, therefore some_simple_rule() would not work, right?
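A small sketch of that point, using a hypothetical stripped-down Rule and an instance-returning rule decorator (neither is the code in this PR): once the decorator returns an object, the decorated name is already a Rule, and calling it fails unless Rule defines __call__.

```python
class Rule:
    """Hypothetical minimal Rule that stores the function it evaluates."""

    def __init__(self, evaluate):
        self._evaluate = evaluate

    def evaluate(self, model):
        return self._evaluate(model)


def rule(func):
    """Instance-returning decorator: binds the name to a Rule object."""
    return Rule(evaluate=func)


@rule
def some_simple_rule(model):
    return None


# The decorated name is already an instance, not a class or factory.
print(isinstance(some_simple_rule, Rule))

try:
    some_simple_rule()  # Rule has no __call__, so this raises
except TypeError as exc:
    print("TypeError:", exc)

# The instance is used directly instead.
print(some_simple_rule.evaluate({"name": "orders"}))
```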

Contributor

Oops, I missed the last return 🙈

def rule(..) -> Callable[[Callable[[Model], RuleViolation | None]], Rule]:
    def decorator_rule(
        func: Callable[[Model], RuleViolation | None],
    ) -> Rule:
        def wrapped_func(self: Rule, *args: Any, **kwargs: Any) -> RuleViolation | None:
            return func(*args, **kwargs)

        rule_obj = Rule()

        return rule_obj
    return decorator_rule

Which should result in:

@rule
def some_simple_rule():
    ...

some_simple_rule # Function object that returns a Rule object
some_simple_rule() # Instance of Rule

Contributor

True. But I think it's more straightforward to go with the approach: "every subclass of Rule is a rule". Such a subclass can be created either by subclassing, or by using @rule as syntactic sugar. This puts all rules "at the same level", if that makes sense?
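One way to picture "all rules at the same level" is the sketch below. It uses a hypothetical minimal Rule, builds one subclass by hand and one with type() (roughly what a class-generating decorator would do), and treats both identically. Using Rule.__subclasses__() for discovery is an assumption made for this example only, not how the registry is meant to work.

```python
class Rule:
    """Hypothetical minimal base class."""

    def evaluate(self, model):
        raise NotImplementedError


# A rule written as an explicit subclass.
class ComplexRule(Rule):
    def evaluate(self, model):
        return None if model.get("owner") else "missing owner"


# A rule generated from a function, as a decorator could do internally.
def _evaluate(self, model):
    return None if model.get("description") else "missing description"


some_simple_rule = type("some_simple_rule", (Rule,), {"evaluate": _evaluate})

model = {"owner": "team-a"}

# Both kinds are plain subclasses of Rule and are handled the same way:
# instantiate, then evaluate.
results = {cls.__name__: cls().evaluate(model) for cls in Rule.__subclasses__()}
print(results)
```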

Contributor

The way I see it, both approaches are the same in the sense that "every Rule is a rule".

I understand that, by always subclassing Rule, all rules are on the same hierarchy. But from another perspective, they're all Rules and in aggregate we will work with lists of Rules. It doesn't matter if you subclass a Rule or use a subclass of a subclass of Rule or use Rule itself.

From a user perspective, nothing will change. You either use @rule as syntactic sugar or subclass Rule.

Where they differ is that I think, semantically, a list of objects is more straightforward than a list of generated subclasses. And I also think it simplifies implementation a bit.

Btw, because nothing changes from a user's perspective, I wouldn't make this point blocking.

Contributor Author

I have given it another look and decided to go with the current implementation. Going with the instance implementation also has some challenges, e.g. the example

@rule
def some_simple_rule():
    ...

some_simple_rule # Function object that returns a Rule object
some_simple_rule() # Instance of Rule

will not exactly work. To call some_simple_rule() we will need to grab the func from somewhere again as it's an input variable. I think the implementation does not necessarily become easier when using instances instead of classes.
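One reading of this, sketched with a hypothetical minimal Rule and a simplified version of the rule signature from the snippets above: when a decorator that takes arguments is applied bare, the decorated function is consumed as the first argument, and the name ends up bound to the inner decorator, which still expects func.

```python
class Rule:
    """Hypothetical minimal Rule holding the function it evaluates."""

    def __init__(self, func):
        self.func = func


def rule(description=None, severity="medium"):
    def decorator_rule(func):
        return Rule(func)

    return decorator_rule


@rule  # bare use: Python calls rule(some_simple_rule)
def some_simple_rule(model):
    return None


# The function was passed as `description`; the name now points at the
# inner decorator, which is still waiting for `func`.
print(some_simple_rule.__name__)  # decorator_rule

try:
    some_simple_rule()  # no func supplied
except TypeError as exc:
    print("TypeError:", exc)


@rule()  # with parentheses, the function does reach decorator_rule
def other_rule(model):
    return None


print(type(other_rule).__name__)  # Rule
```

So with this shape of decorator, the bare @rule form yields neither a Rule nor a zero-argument factory, while @rule() yields an instance directly.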

Contributor

@matthieucan matthieucan left a comment

Small comments - near perfect!

Contributor

@matthieucan matthieucan left a comment

While the return None question is pending, and the discussion about classes/instances is ongoing, happy to approve this design. Great work!

@jochemvandooren jochemvandooren merged commit 147cf6a into master Mar 25, 2024
2 checks passed
@jochemvandooren jochemvandooren deleted the jvandooren/rule-definitions branch March 25, 2024 10:53
4 participants