examples/readme.md
+38 -1
@@ -3,7 +3,10 @@
 1. Place your dbt `manifest.json` and `catalog.json` files in the `inputs` directory.
 2. **Customization**:
    - Set your dialect (only tested with `snowflake` so far) in the `main_step_1_direct.py` script.
-   - You can specify the scope of the models you want to extract column lineage for by adding them to the `li_selected_model` list, or leave it empty to process all models (recommended).
+   - You can specify the scope of the models you want to extract column lineage for by adding them to the `li_selected_model` list, or leave it empty to process all models.
+   - When specifying models, you can use dbt-style selectors like `+model_name` (ancestors), `model_name+` (descendants), `+model_name+` (entire lineage), `tag:my_tag` (tag filtering), etc.
+   - Both models and sources are supported in selectors (e.g., `source.schema.table`).
+   - Alternatively, you can create a JSON file with a list of models and use the `--model-list-json` parameter when running the CLI.
 
 3. Run the `main_step_1_direct.py` script to extract direct column lineage:
    ```bash
@@ -14,6 +17,40 @@
 - `lineage_to_direct_parents.json`
 - `lineage_to_direct_children.json`
 
+#### Model Selection with dbt-style Syntax
+
+When specifying models using Python code, you can use dbt-style selectors just like in the CLI:
+
+```python
+# Example model selectors
+li_selected_model = [
+    # Include orders and all its ancestors
+    "+orders",
+
+    # Include all models with "finance" tag
+    "tag:finance",
+
+    # Include models that are both daily-tagged AND in the core package
+    "tag:daily,package:core",
+
+    # Include a specific source
+    "source.raw.customers",
+
+    # Include a source and all its downstream dependencies
+    "source.raw.orders+",
+
+    # Get the entire lineage (upstream and downstream) of a source
+    "+source.raw.payments+"
+]
+
+extractor = DbtColumnLineageExtractor(
+    manifest_path="./inputs/manifest.json",
+    catalog_path="./inputs/catalog.json",
+    selected_models=li_selected_model,
+    dialect="snowflake"
+)
+```
+
 #### Analyze Recursive Column Lineage
 
 1. With the output from the direct column lineage step, run the `main_step_2_recursive.py` script to analyze recursive column lineage:
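
Note on the selector syntax introduced in this README change: the `+` prefix and suffix can be read as graph traversals over the dbt DAG. The sketch below is only an illustration of that reading on a toy graph. It is not the package's implementation; the `PARENTS` map, `_walk`, and `resolve` are hypothetical names, and `tag:`, `path:`, and `package:` filters are deliberately left out.

```python
from collections import deque

# Hypothetical toy DAG: each node maps to its direct parents (upstream nodes).
PARENTS = {
    "source.raw.orders": [],
    "stg_orders": ["source.raw.orders"],
    "orders": ["stg_orders"],
    "revenue": ["orders"],
}


def _walk(start, edges):
    """Return every node reachable from `start` by following `edges` (start excluded)."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def resolve(selector, parents=PARENTS):
    """Expand 'name', '+name', 'name+' or '+name+' into a set of node names."""
    # Invert the parent map to obtain child (downstream) edges.
    children = {}
    for node, ups in parents.items():
        children.setdefault(node, [])
        for up in ups:
            children.setdefault(up, []).append(node)

    want_ancestors = selector.startswith("+")
    want_descendants = selector.endswith("+")
    name = selector.strip("+")

    selected = {name}
    if want_ancestors:
        selected |= _walk(name, parents)
    if want_descendants:
        selected |= _walk(name, children)
    return selected
```

Under these assumptions, `resolve("+orders")` yields `orders` plus everything upstream of it, while `resolve("source.raw.orders+")` yields the source plus everything downstream.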
py_package/dbt_column_lineage_extractor/cli_direct.py
+76 -28
@@ -7,44 +7,92 @@ def main():
     parser.add_argument('--manifest', default='./inputs/manifest.json', help='Path to the manifest.json file, default to ./inputs/manifest.json')
     parser.add_argument('--catalog', default='./inputs/catalog.json', help='Path to the catalog.json file, default to ./inputs/catalog.json')
     parser.add_argument('--dialect', default='snowflake', help='SQL dialect to use, default is snowflake, more dialects at https://github.com/tobymao/sqlglot/tree/v25.24.5/sqlglot/dialects')
-    parser.add_argument('--model', nargs='*', default=[], help='List of models to extract lineage for, default to all models')
+    parser.add_argument(
+        '--model',
+        nargs='*',
+        default=[],
+        help='''List of models to extract lineage for using dbt-style selectors:
+        - Simple model names: model_name
+        - Include ancestors: +model_name (include upstream/parent models)
+        - Include descendants: model_name+ (include downstream/child models)
+        - Union (either): "model1 model2" (models matching either selector)
+        - Intersection (both): "model1,model2" (models matching both selectors)
+        - Tag filtering: tag:my_tag (models with specific tag)
+        - Path filtering: path:models/finance (models in specific path)
+        - Package filtering: package:my_package (models in specific package)
+        Default behavior extracts lineage for all models.'''
+    )
+    parser.add_argument('--model-list-json', help='Path to a JSON file containing a list of models to extract lineage for. If specified, this takes precedence over --model')
     parser.add_argument('--output-dir', default='./outputs', help='Directory to write output json files, default to ./outputs')
     parser.add_argument('--show-ui', action='store_true', help='Flag to show lineage outputs in the console')
+    parser.add_argument('--continue-on-error', action='store_true', help='Continue processing even if some models fail')
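
Note on the `--model-list-json` help text: it says the JSON file takes precedence over `--model`, but the code that enforces this is not visible in this part of the diff. The sketch below shows one way that precedence could be resolved. It assumes the file holds a flat JSON array of selector strings (for example `["+orders", "tag:finance"]`); `build_parser` and `resolve_selectors` are hypothetical names and not part of the package.

```python
import argparse
import json


def build_parser():
    """Reduced parser with only the two model-selection arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", nargs="*", default=[])
    parser.add_argument("--model-list-json")
    return parser


def resolve_selectors(args):
    """Return the selectors to use, preferring the JSON file over --model.

    Hypothetical helper: assumes the file contains a JSON array of strings.
    """
    if args.model_list_json:
        with open(args.model_list_json, encoding="utf-8") as fh:
            selectors = json.load(fh)
        if not isinstance(selectors, list):
            raise ValueError("expected a JSON array of selector strings")
        return selectors
    # No file given: fall back to --model (an empty list means all models).
    return args.model
```

With this shape, passing both `--model customers` and `--model-list-json models.json` uses only the selectors from the file, which matches the behavior the help string describes.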