-
Notifications
You must be signed in to change notification settings - Fork 147
Description
Describe the bug
TLDR; stop using create or replace and then insert when table when table already exists (with valid data)
Two problems since materialization V2 for "table":
- The table materialization does 2 steps (create or replace + insert) that can lead to an unavailability of the data
- Constraints can be applied before hands (thanks a lot!) but the problem is that if new data arrive violating the said constraints afterward, we recreate the table first, hence removing all data.
The problem is that Delta is not handling transactions across multiple commands... One way to do overcome the second problem would be to add a clause for putting back data using RESTORE but it leaves the first problem.
Then, I thought of using the insert_overwrite incremental strategy that would replace all data in a single transaction, hence not removing existing valid data in a violation constraint scenario but the strategy isn't complete yet (not possible to add columns if I read right) and unavailable for SQL warehouse (even though I don't understand why because here would be a valid case for overwriting completely the table, maybe should have been only a warning in the logs?).
I think using insert_overwrite (without partition predicate) in the table materialization when the relation already exists in the database would be an idea but this may lead to complications (what if a column type changes ?). Happy to discuss !
Steps To Reproduce
Create a model toto.sql :
{{
config(
materialized = 'table',
)
}}
{% if load_relation(this) %} {# meaning it exists in the database #}
select null as name
union all
select 'toto' as name
{% else %} {# meaning it does not exist in the database #}
select 'toto' as name
{% endif %}
And its corresponding toto.yml
models:
- name: toto
config:
contract:
enforced: true
columns:
- name: name
data_type: string
constraints:
- type: not_null
Then run :
dbt run -s toto
--> Succeed + select * from my_catalog.my_schema.toto
returns 1 row
dbt run -s toto
--> Fails + select * from my_catalog.my_schema.toto
returns 0 row
Expected behavior
I want my valid data not to be removed and a 0 downtime of my table.
System information
The output of dbt --version
:
Core:
- installed: 1.10.2
- latest: 1.10.5 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- databricks: 1.10.4 - Up to date!
- spark: 1.9.2 - Update available!
At least one plugin is out of date with dbt-core.
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
The operating system you're using:
Python 3.13.5
Additional context
Thanks everyone for all the good work !! 🙏 I would be glad to contribute once we decide how to handle this problem ! Don't hesitate to reach out, we can even schedule a call if you have time for this.