Clarify incremental docs (#2904)
* clarify full refresh considerations

* more tips on identifying incremental models

* docs on the regular dbt dag task
lauriemerrell authored Aug 21, 2023
1 parent 9f50d53 commit ae84fbc
Showing 3 changed files with 8 additions and 2 deletions.
2 changes: 1 addition & 1 deletion airflow/dags/transform_warehouse/README.md
@@ -11,4 +11,4 @@ This DAG has some special considerations:

* Because the tasks in this DAG involve running a large volume of SQL transformations, they risk triggering data quotas if the DAG is run multiple times in a single day.

-* This task can be run with a `dbt_select` statement provided (use the `Trigger DAG w/ config` button in the Airflow UI and provide a JSON configuration like `{"dbt_select": "+<your_model_here>+"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)) to re-run a specific individual model's lineage.
+* This task can be run with a `dbt_select` statement provided (use the `Trigger DAG w/ config` button (option under the "play" icon in the upper right corner when looking at an individual DAG) in the Airflow UI and provide a JSON configuration like `{"dbt_select": "<+ if you want to run parents><your_model_here><+ if you want to run children>"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)) to re-run a specific individual model's lineage.
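
The `dbt_select` value described in the new wording can be assembled mechanically. The following is a minimal sketch, not part of the commit: the helper `build_dbt_select` and the model name `my_model` are hypothetical, and the only assumption about dbt is the `+` graph operator from the selection syntax linked above (a leading `+` selects parents, a trailing `+` selects children).

```python
import json


def build_dbt_select(model: str, parents: bool = False, children: bool = False) -> str:
    """Return a dbt selector for `model`, optionally including its lineage.

    Hypothetical helper for illustration only; it is not part of the DAG code.
    """
    prefix = "+" if parents else ""   # leading "+" selects the model's parents
    suffix = "+" if children else ""  # trailing "+" selects the model's children
    return f"{prefix}{model}{suffix}"


# `my_model` is a placeholder, not a real model in the project.
conf = {"dbt_select": build_dbt_select("my_model", parents=True, children=False)}

# JSON to paste into the `Trigger DAG w/ config` form.
conf_json = json.dumps(conf)
print(conf_json)  # {"dbt_select": "+my_model"}
```

Passing `parents=True, children=True` yields `+my_model+`, which matches the example in the previous wording of this README.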
4 changes: 3 additions & 1 deletion airflow/dags/transform_warehouse_full_refresh/README.md
@@ -4,6 +4,8 @@ Type: [Now / Ad-Hoc](https://docs.calitp.org/data-infra/airflow/dags-maintenance

This DAG orchestrates the running of the Cal-ITP dbt project and deployment of associated artifacts like the [dbt docs site](https://dbt-docs.calitp.org/#!/overview) with the [`--full-refresh` flag set](https://docs.getdbt.com/docs/build/incremental-models#how-do-i-rebuild-an-incremental-model) so that incremental models will be rebuilt from scratch.

-**This task should generally only be run with a `dbt_select` statement provided (use the `Trigger DAG w/ config` button in the Airflow UI and provide a JSON configuration like `{"dbt_select": "+<your_model_here>+"} using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)`). If you run this DAG without any selection criteria specified, you may need to increase the BigQuery quota for the project; refreshing all the GTFS-RT models uses up to ~60 TB as of 7/26/23.**
+**This task should generally only be run with a `dbt_select` statement provided:** (use the `Trigger DAG w/ config` button in the Airflow UI (option under the "play" icon in the upper right corner when looking at an individual DAG) and provide a JSON configuration like `{"dbt_select": "<+ if you want to full refresh parents><your_model_here><+ if you want to full refresh children>"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)). **Note that all models selected by the selection syntax will be full-refreshed! Check all the incremental models selected by your syntax using the following from the command line in the repo's warehouse folder `poetry run dbt ls -s <your select statement here>,config.materialized:incremental --resource-type model`.**
+
+If you need to run this DAG without any selection criteria specified (i.e., full refresh literally everything), you may need to increase the BigQuery quota for the project; refreshing all the GTFS-RT models uses up to ~60 TB as of 7/26/23.

See the [`transform_warehouse` README](../transform_warehouse/README.md) for general considerations for running the dbt DAGs.
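
The pre-flight check recommended in this README change can be sketched as follows. This is an illustration under stated assumptions rather than project code: `incremental_check_command` and the selector `+my_model+` are hypothetical, and the sketch only builds the command quoted in the README (it does not run dbt). The comma in the selector is dbt's intersection operator, so the listing is restricted to models that match the selection and are also materialized as incremental.

```python
import shlex


def incremental_check_command(select: str) -> list[str]:
    """Build the `dbt ls` command from the README for a given selector.

    Hypothetical helper for illustration only; run the resulting command
    from the repo's warehouse folder.
    """
    # Comma = intersection: models matched by `select` AND materialized incrementally.
    selector = f"{select},config.materialized:incremental"
    return ["poetry", "run", "dbt", "ls", "-s", selector, "--resource-type", "model"]


# `+my_model+` is a placeholder selector, not a real model in the project.
command = incremental_check_command("+my_model+")
print(shlex.join(command))
```

Any model this command lists would be rebuilt from scratch if the same selector were passed to the full refresh DAG as `dbt_select`.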
4 changes: 4 additions & 0 deletions docs/warehouse/developing_dbt_models.md
@@ -387,6 +387,10 @@ You can compile the SQL for an incremental model and run it directly in BigQuery

Working with incremental models can affect how you approach various dbt-related workflows. See callouts in the individual step sections above related to incremental models for more details.

+```{admonition} Identifying incremental models in your dependency tree
+If you're trying to identify whether there are incremental models in the dependency tree of a model you're working with, you can use the following command (run from the `warehouse` directory in the data infra repo): `poetry run dbt ls -s +<your_model>+,config.materialized:incremental --resource-type model`.
+```
+
## Helpful talks and presentations

### dbt at Cal-ITP introduction