Commit 8aa8aad

otosky, jochemvandooren, and matthieucan authored
feat: lint sources (#78)
# Overview

Extends rules so that they can be used against dbt [sources](https://docs.getdbt.com/docs/build/sources) in addition to models.

# Usage

A rule declares which resource type it acts on through the type annotation of the function it wraps, or of the `evaluate` method of a class-based rule:

```python
from dbt_score import Model, Source, rule, Rule, RuleViolation

# decorator-based
# for a Model
@rule
def model_has_description(model: Model) -> RuleViolation | None:
    """A model should have a description."""
    if not model.description:
        return RuleViolation(message="Model lacks a description.")

# for a Source
@rule
def source_has_loader(source: Source) -> RuleViolation | None:
    """A source should have a loader defined."""
    if not source.loader:
        return RuleViolation(message="Source lacks a loader.")

# class-based
class ExampleSource(Rule):
    """Example class-based rule."""

    description = "A source should have a loader defined."

    def evaluate(self, source: Source) -> RuleViolation | None:
        """Evaluate source."""
        if not source.loader:
            return RuleViolation(message="Source lacks a loader.")
```

The `Evaluation` handler is then responsible for applying source rules to `Source` objects and model rules to `Model` objects.

---

closes #76

---------

Co-authored-by: Jochem van Dooren <[email protected]>
Co-authored-by: Jochem van Dooren <[email protected]>
Co-authored-by: Matthieu Caneill <[email protected]>
1 parent b0bb6f3 commit 8aa8aad
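
The commit message says that a rule's type annotation decides whether it runs against models or sources. The following is a minimal sketch of how such annotation-based dispatch could work; it is not the actual `Evaluation` implementation. The `Model` and `Source` dataclasses, `resource_type_of`, and `evaluate` are hypothetical stand-ins, and violations are plain strings rather than `RuleViolation` objects.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional, get_type_hints


# Hypothetical stand-ins for dbt_score's Model and Source classes.
@dataclass
class Model:
    name: str
    description: str = ""


@dataclass
class Source:
    name: str
    loader: str = ""


def resource_type_of(check: Callable) -> type:
    """Return the annotated type of the check's first parameter."""
    first_param = next(iter(inspect.signature(check).parameters))
    return get_type_hints(check)[first_param]


def model_has_description(model: Model) -> Optional[str]:
    """A model should have a description."""
    if not model.description:
        return "Model lacks a description."
    return None


def source_has_loader(source: Source) -> Optional[str]:
    """A source should have a loader defined."""
    if not source.loader:
        return "Source lacks a loader."
    return None


def evaluate(items, checks):
    """Run each check only on items whose type matches its annotation."""
    violations = {}
    for item in items:
        applicable = [c for c in checks if resource_type_of(c) is type(item)]
        violations[item.name] = [
            message for c in applicable if (message := c(item)) is not None
        ]
    return violations
```

With this sketch, a `Source` item is never passed to `model_has_description`, and a `Model` item is never passed to `source_has_loader`.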

44 files changed, with 1452 additions and 491 deletions.

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -8,6 +8,16 @@ and this project adheres to
 
 ## [Unreleased]
 
+- Support linting of sources.
+- **Breaking**: Renamed modules: `dbt_score.model_filter` becomes
+  `dbt_score.rule_filter`
+- **Breaking**: Renamed filter class and decorator: `@model_filter` becomes
+  `@rule_filter` and `ModelFilter` becomes `RuleFilter`.
+- **Breaking**: Config option `model_filter_names` becomes `rule_filter_names`.
+- **Breaking**: CLI flag naming fixes: `--fail_any_model_under` becomes
+  `--fail-any-item-under` and `--fail_project_under` becomes
+  `--fail-project-under`.
+
 ## [0.7.1] - 2024-11-01
 
 - Fix mkdocs.
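
The breaking renames above affect existing configuration. As a rough illustration, an already-parsed configuration dictionary could be migrated along these lines. `RENAMED_KEYS` and `migrate_config` are hypothetical helpers and not part of `dbt-score`; the key names come from this changelog entry and from the `docs/configuration.md` diff in this commit.

```python
# Old config key -> new config key, as listed in this commit.
RENAMED_KEYS = {
    "fail_any_model_under": "fail_any_item_under",
    "model_filter_names": "rule_filter_names",
}


def migrate_config(config: dict) -> dict:
    """Return a copy of `config` with renamed keys, recursing into nested tables."""
    migrated = {}
    for key, value in config.items():
        new_key = RENAMED_KEYS.get(key, key)
        migrated[new_key] = migrate_config(value) if isinstance(value, dict) else value
    return migrated
```

The input dictionary is left unmodified, so the old and new versions can be compared before anything is written back.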

README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 
 ## What is `dbt-score`?
 
-`dbt-score` is a linter for dbt model metadata.
+`dbt-score` is a linter for dbt metadata.
 
 [dbt][dbt] (Data Build Tool) is a great framework for creating, building,
 organizing, testing and documenting _data models_, i.e. data sets living in a

docs/configuration.md

Lines changed: 5 additions & 5 deletions
@@ -18,7 +18,7 @@ rule_namespaces = ["dbt_score.rules", "dbt_score_rules", "custom_rules"]
 disabled_rules = ["dbt_score.rules.generic.columns_have_description"]
 inject_cwd_in_python_path = true
 fail_project_under = 7.5
-fail_any_model_under = 8.0
+fail_any_item_under = 8.0
 
 [tool.dbt-score.badges]
 first.threshold = 10.0
@@ -51,8 +51,8 @@ The following options can be set in the `pyproject.toml` file:
 - `disabled_rules`: A list of rules to disable.
 - `fail_project_under` (default: `5.0`): If the project score is below this
   value the command will fail with return code 1.
-- `fail_any_model_under` (default: `5.0`): If any model scores below this value
-  the command will fail with return code 1.
+- `fail_any_item_under` (default: `5.0`): If any model or source scores below
+  this value the command will fail with return code 1.
 
 #### Badges configuration
 
@@ -70,7 +70,7 @@ All badges except `wip` can be configured with the following option:
 
 - `threshold`: The threshold for the badge. A decimal number between `0.0` and
   `10.0` that will be used to compare to the score. The threshold is the minimum
-  score required for a model to be rewarded with a certain badge.
+  score required for a model or source to be rewarded with a certain badge.
 
 The default values can be found in the
 [BadgeConfig](reference/config.md#dbt_score.config.BadgeConfig).
@@ -86,7 +86,7 @@ Every rule can be configured with the following option:
 - `severity`: The severity of the rule. Rules have a default severity and can be
   overridden. It's an integer with a minimum value of 1 and a maximum value
   of 4.
-- `model_filter_names`: Filters used by the rule. Takes a list of names that can
+- `rule_filter_names`: Filters used by the rule. Takes a list of names that can
   be found in the same namespace as the rules (see
   [Package rules](package_rules.md)).
 

docs/create_rules.md

Lines changed: 56 additions & 15 deletions
@@ -1,9 +1,9 @@
 # Create rules
 
-In order to lint and score models, `dbt-score` uses a set of rules that are
-applied to each model. A rule can pass or fail when it is run. Based on the
-severity of the rule, models are scored with the weighted average of the rules
-results. Note that `dbt-score` comes bundled with a
+In order to lint and score models or sources, `dbt-score` uses a set of rules
+that are applied to each item. A rule can pass or fail when it is run. Based on
+the severity of the rule, items are scored with the weighted average of the
+rules results. Note that `dbt-score` comes bundled with a
 [set of default rules](rules/generic.md).
 
 On top of the generic rules, it's possible to add your own rules. Two ways exist
@@ -21,7 +21,7 @@ The `@rule` decorator can be used to easily create a new rule:
 from dbt_score import Model, rule, RuleViolation
 
 @rule
-def has_description(model: Model) -> RuleViolation | None:
+def model_has_description(model: Model) -> RuleViolation | None:
     """A model should have a description."""
     if not model.description:
         return RuleViolation(message="Model lacks a description.")
@@ -31,6 +31,21 @@ The name of the function is the name of the rule and the docstring of the
 function is its description. Therefore, it is important to use a
 self-explanatory name for the function and document it well.
 
+The type annotation for the rule's argument dictates whether the rule should be
+applied to dbt models or sources.
+
+Here is the same example rule, applied to sources:
+
+```python
+from dbt_score import rule, RuleViolation, Source
+
+@rule
+def source_has_description(source: Source) -> RuleViolation | None:
+    """A source should have a description."""
+    if not source.description:
+        return RuleViolation(message="Source lacks a description.")
+```
+
 The severity of a rule can be set using the `severity` argument:
 
 ```python
@@ -45,15 +60,23 @@ For more advanced use cases, a rule can be created by inheriting from the `Rule`
 class:
 
 ```python
-from dbt_score import Model, Rule, RuleViolation
+from dbt_score import Model, Rule, RuleViolation, Source
 
-class HasDescription(Rule):
+class ModelHasDescription(Rule):
     description = "A model should have a description."
 
     def evaluate(self, model: Model) -> RuleViolation | None:
         """Evaluate the rule."""
         if not model.description:
             return RuleViolation(message="Model lacks a description.")
+
+class SourceHasDescription(Rule):
+    description = "A source should have a description."
+
+    def evaluate(self, source: Source) -> RuleViolation | None:
+        """Evaluate the rule."""
+        if not source.description:
+            return RuleViolation(message="Source lacks a description.")
 ```
 
 ### Rules location
@@ -91,30 +114,48 @@ def sql_has_reasonable_number_of_lines(model: Model, max_lines: int = 200) -> Ru
         )
 ```
 
-### Filtering models
+### Filtering rules
 
-Custom and standard rules can be configured to have model filters. Filters allow
-models to be ignored by one or multiple rules.
+Custom and standard rules can be configured to have filters. Filters allow
+models or sources to be ignored by one or multiple rules if the item doesn't
+satisfy the filter criteria.
 
 Filters are created using the same discovery mechanism and interface as custom
 rules, except they do not accept parameters. Similar to Python's built-in
-`filter` function, when the filter evaluation returns `True` the model should be
+`filter` function, when the filter evaluation returns `True` the item should be
 evaluated, otherwise it should be ignored.
 
 ```python
-from dbt_score import ModelFilter, model_filter
+from dbt_score import Model, RuleFilter, rule_filter
 
-@model_filter
+@rule_filter
 def only_schema_x(model: Model) -> bool:
     """Only applies a rule to schema X."""
     return model.schema.lower() == 'x'
 
-class SkipSchemaY(ModelFilter):
+class SkipSchemaY(RuleFilter):
     description = "Applies a rule to every schema but Y."
     def evaluate(self, model: Model) -> bool:
         return model.schema.lower() != 'y'
 ```
 
+Filters also rely on type-annotations to dictate whether they apply to models or
+sources:
+
+```python
+from dbt_score import RuleFilter, rule_filter, Source
+
+@rule_filter
+def only_from_source_a(source: Source) -> bool:
+    """Only applies a rule to source tables from source X."""
+    return source.source_name.lower() == 'a'
+
+class SkipSourceDatabaseB(RuleFilter):
+    description = "Applies a rule to every source except Database B."
+    def evaluate(self, source: Source) -> bool:
+        return source.database.lower() != 'b'
+```
+
 Similar to setting a rule severity, standard rules can have filters set in the
 [configuration file](configuration.md/#tooldbt-scorerulesrule_namespacerule_name),
 while custom rules accept the configuration file or a decorator parameter.
@@ -123,7 +164,7 @@ while custom rules accept the configuration file or a decorator parameter.
 from dbt_score import Model, rule, RuleViolation
 from my_project import only_schema_x
 
-@rule(rule_filters={only_schema_x()})
+@rule(rule_filters={only_schema_x()})
 def models_in_x_follow_naming_standard(model: Model) -> RuleViolation | None:
     """Models in schema X must follow the naming standard."""
     if some_regex_fails(model.name):
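
The documentation above states that an item is evaluated when a filter returns `True` and ignored otherwise. A small sketch of that gating logic follows. The `Model` dataclass and `should_evaluate` are hypothetical, and requiring every filter to pass when a rule has several is an assumption, not something this diff confirms.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


# Hypothetical stand-in for dbt_score's Model class.
@dataclass
class Model:
    name: str
    schema: str


def only_schema_x(model: Model) -> bool:
    """Only applies a rule to schema X."""
    return model.schema.lower() == "x"


def skip_schema_y(model: Model) -> bool:
    """Applies a rule to every schema but Y."""
    return model.schema.lower() != "y"


def should_evaluate(item: Model, filters: Iterable[Callable[[Model], bool]]) -> bool:
    """Evaluate the item only if every filter returns True (assumed semantics)."""
    return all(f(item) for f in filters)
```

A rule without filters applies to every item of its type, because `all` of an empty collection is `True`.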

docs/get_started.md

Lines changed: 2 additions & 2 deletions
@@ -40,8 +40,8 @@ It's also possible to automatically run `dbt parse`, to generate the
 dbt-score lint --run-dbt-parse
 ```
 
-To lint only a selection of models, the argument `--select` can be used. It
-accepts any
+To lint only a selection of models or sources, the argument `--select` can be
+used. It accepts any
 [dbt node selection syntax](https://docs.getdbt.com/reference/node-selection/syntax):
 
 ```shell

docs/index.md

Lines changed: 11 additions & 10 deletions
@@ -2,8 +2,9 @@
 
 `dbt-score` is a linter for [dbt](https://www.getdbt.com/) metadata.
 
-dbt allows data practitioners to organize their data in to _models_. Those
-models have metadata associated with them: documentation, tests, types, etc.
+dbt allows data practitioners to organize their data in to _models_ and
+_sources_. Those models and sources have metadata associated with them:
+documentation, tests, types, etc.
 
 `dbt-score` allows to lint and score this metadata, in order to enforce (or
 encourage) good practices.
@@ -12,7 +13,7 @@ encourage) good practices.
 
 ```
 > dbt-score lint
-🥇 customers (score: 10.0)
+🥇 M: customers (score: 10.0)
     OK   dbt_score.rules.generic.has_description
     OK   dbt_score.rules.generic.has_owner
     OK   dbt_score.rules.generic.sql_has_reasonable_number_of_lines
@@ -25,17 +26,17 @@ score.
 
 ## Philosophy
 
-dbt models are often used as metadata containers: either in YAML files or
-through the use of `{{ config() }}` blocks, they are associated with a lot of
+dbt models/sources are often used as metadata containers: either in YAML files
+or through the use of `{{ config() }}` blocks, they are associated with a lot of
 information. At scale, it becomes tedious to enforce good practices in large
-data teams dealing with many models.
+data teams dealing with many models/sources.
 
 To that end, `dbt-score` has 2 main features:
 
-- It runs rules on models, and displays rule violations. Those can be used in
-  interactive environments or in CI.
-- Using those run results, it scores models, as to give them a measure of their
-  maturity. This score can help gamify model metadata improvements, and be
+- It runs rules on dbt models and sources, and displays any rule violations.
+  These can be used in interactive environments or in CI.
+- Using those run results, it scores items, to ascribe them a measure of their
+  maturity. This score can help gamify metadata improvements/coverage, and be
   reflected in data catalogs.
 
 `dbt-score` aims to:

docs/programmatic_invocations.md

Lines changed: 3 additions & 3 deletions
@@ -61,9 +61,9 @@ When `dbt-score` terminates, it exists with one of the following exit codes:
   project being linted either doesn't raise any warning, or the warnings are
   small enough to be above the thresholds. This generally means "successful
   linting".
-- `1` in case of linting errors. This is the unhappy case: some models in the
-  project raise enough warnings to have a score below the defined thresholds.
-  This generally means "linting doesn't pass".
+- `1` in case of linting errors. This is the unhappy case: some models or
+  sources in the project raise enough warnings to have a score below the defined
+  thresholds. This generally means "linting doesn't pass".
 - `2` in case of an unexpected error. This happens for example if something is
   misconfigured (for example a faulty dbt project), or the wrong parameters are
   given to the CLI. This generally means "setup needs to be fixed".
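
A wrapper script (in CI, for example) could interpret the three exit codes documented above roughly as follows. This is only a sketch: `ExitCode`, `describe_exit_code`, and `run_lint` are hypothetical names, and `run_lint` assumes the `dbt-score` executable is on `PATH`.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as described in docs/programmatic_invocations.md."""

    SUCCESS = 0
    LINT_FAILED = 1
    UNEXPECTED_ERROR = 2


MESSAGES = {
    ExitCode.SUCCESS: "successful linting",
    ExitCode.LINT_FAILED: "linting doesn't pass",
    ExitCode.UNEXPECTED_ERROR: "setup needs to be fixed",
}


def describe_exit_code(code: int) -> str:
    """Translate a dbt-score exit code into the documented meaning."""
    try:
        return MESSAGES[ExitCode(code)]
    except ValueError:
        return f"unknown exit code {code}"


def run_lint() -> str:
    """Run `dbt-score lint` and describe the outcome (needs dbt-score installed)."""
    completed = subprocess.run(["dbt-score", "lint"], check=False)
    return describe_exit_code(completed.returncode)
```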

pyproject.toml

Lines changed: 6 additions & 1 deletion
@@ -6,7 +6,7 @@ build-backend = "pdm.backend"
 name = "dbt-score"
 dynamic = ["version"]
 
-description = "Linter for dbt model metadata."
+description = "Linter for dbt metadata."
 authors = [
     {name = "Picnic Analyst Development Platform", email = "[email protected]"}
 ]
@@ -101,6 +101,7 @@ max-args = 9
 [tool.ruff.lint.per-file-ignores]
 "tests/**/*.py" = [
     "PLR2004", # Magic value comparisons
+    "PLR0913", # Too many args in func def
 ]
 
 ### Coverage ###
@@ -114,3 +115,7 @@ source = [
 [tool.coverage.report]
 show_missing = true
 fail_under = 80
+exclude_also = [
+    "@overload"
+]
+

src/dbt_score/__init__.py

Lines changed: 5 additions & 4 deletions
@@ -1,15 +1,16 @@
 """Init dbt_score package."""
 
-from dbt_score.model_filter import ModelFilter, model_filter
-from dbt_score.models import Model
+from dbt_score.models import Model, Source
 from dbt_score.rule import Rule, RuleViolation, Severity, rule
+from dbt_score.rule_filter import RuleFilter, rule_filter
 
 __all__ = [
     "Model",
-    "ModelFilter",
+    "Source",
+    "RuleFilter",
     "Rule",
     "RuleViolation",
     "Severity",
-    "model_filter",
+    "rule_filter",
     "rule",
 ]

src/dbt_score/cli.py

Lines changed: 8 additions & 8 deletions
@@ -81,15 +81,15 @@ def cli() -> None:
     default=False,
 )
 @click.option(
-    "--fail_project_under",
+    "--fail-project-under",
     help="Fail if the project score is under this value.",
     type=float,
     is_flag=False,
     default=None,
 )
 @click.option(
-    "--fail_any_model_under",
-    help="Fail if any model is under this value.",
+    "--fail-any-item-under",
+    help="Fail if any evaluable item is under this value.",
     type=float,
     is_flag=False,
     default=None,
@@ -104,9 +104,9 @@ def lint(
     manifest: Path,
     run_dbt_parse: bool,
     fail_project_under: float,
-    fail_any_model_under: float,
+    fail_any_item_under: float,
 ) -> None:
-    """Lint dbt models metadata."""
+    """Lint dbt metadata."""
     manifest_provided = (
         click.get_current_context().get_parameter_source("manifest")
         != ParameterSource.DEFAULT
@@ -122,8 +122,8 @@ def lint(
         config.overload({"disabled_rules": disabled_rule})
     if fail_project_under:
         config.overload({"fail_project_under": fail_project_under})
-    if fail_any_model_under:
-        config.overload({"fail_any_model_under": fail_any_model_under})
+    if fail_any_item_under:
+        config.overload({"fail_any_item_under": fail_any_item_under})
 
     try:
         if run_dbt_parse:
@@ -148,7 +148,7 @@ def lint(
         ctx.exit(2)
 
     if (
-        any(x.value < config.fail_any_model_under for x in evaluation.scores.values())
+        any(x.value < config.fail_any_item_under for x in evaluation.scores.values())
         or evaluation.project_score.value < config.fail_project_under
     ):
         ctx.exit(1)
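
The threshold handling in this diff can be restated as a standalone sketch. `resolve_threshold` and `lint_exit_code` are hypothetical names, and scores are simplified to plain floats, whereas the diff reads `x.value` from score objects. One consequence of the truthiness checks (`if fail_any_item_under:`) is that a CLI value of `0.0` is treated like an omitted flag and does not override the configuration.

```python
from typing import Dict, Optional


def resolve_threshold(cli_value: Optional[float], config_value: float) -> float:
    """Mirror the diff's truthiness check: None and 0.0 both keep the config value."""
    return cli_value if cli_value else config_value


def lint_exit_code(
    item_scores: Dict[str, float],
    project_score: float,
    fail_any_item_under: float,
    fail_project_under: float,
) -> int:
    """Return 1 if any item or the project scores below its threshold, else 0."""
    if (
        any(score < fail_any_item_under for score in item_scores.values())
        or project_score < fail_project_under
    ):
        return 1
    return 0
```

Because models and sources share one `scores` mapping in the diff, a single low-scoring source is enough to fail the run, which is consistent with the rename from "model" to "item".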
