Run dbt with DuckDB and automatically convert your tables to Apache Iceberg format on Azure Storage. No more manual conversion scripts or complex workflows.
uv run dbt_iceberg.py run {any dbt flags or commands here}
This project solves a specific problem: dbt-duckdb can't write Iceberg tables directly. So we built a wrapper that:
- Runs your dbt models (which output Parquet files)
- Automatically converts them to Iceberg tables
- Cleans up the temporary files
- Handles all the Azure authentication headaches
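Under the hood the flow really is "run dbt, then convert, then clean up". A minimal sketch of that shape (the real logic lives in dbt_iceberg.py and utilities/iceberg_utils.py; the two helper functions here are hypothetical placeholders, not the project's actual names):
import subprocess
import sys

def convert_parquet_to_iceberg() -> None:
    # Hypothetical placeholder for the real conversion in utilities/iceberg_utils.py
    ...

def cleanup_temp_files() -> None:
    # Hypothetical placeholder - removes the temporary Parquet files under /temp/dbt/
    ...

def main() -> None:
    # 1. Run dbt with whatever args the user passed (e.g. "run --select tag:example")
    subprocess.run(["dbt", *sys.argv[1:]], check=True)
    # 2. Convert each external table's Parquet output into an Iceberg table
    convert_parquet_to_iceberg()
    # 3. Clean up the temporary Parquet files
    cleanup_temp_files()

if __name__ == "__main__":
    main()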
- Python 3.11+
- Azure Storage Account
- Service Principal with "Storage Blob Data Contributor" role
git clone <repository-url>
cd duckdb_iceberg
# Using uv (faster)
pip install uv
uv sync
# Or regular pip
pip install -r requirements.txt
Create a .env file from .env.example:
AZURE_STORAGE_ACCOUNT=your_storage_account
AZURE_CONTAINER=your_container
AZURE_TENANT_ID=your_tenant_id
AZURE_CLIENT_ID=your_client_id
AZURE_CLIENT_SECRET=your_client_secret
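The wrapper picks these up at runtime. Conceptually the service-principal credential is built like this (a sketch assuming python-dotenv and azure-identity; the repo's exact code may differ):
import os
from dotenv import load_dotenv
from azure.identity import ClientSecretCredential

load_dotenv()  # loads the AZURE_* values from .env into the environment

credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"],
)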
# Run everything
uv run python dbt_iceberg.py run
# Run specific models
uv run python dbt_iceberg.py run --select customer_metrics
# See example strategies in action
uv run python dbt_iceberg.py run --select tag:example
You control how data gets loaded into Iceberg tables using three strategies:
replace - Drop the table and recreate it. Simple and clean.
models:
  - name: dim_customers
    config:
      iceberg_strategy: replace # or just omit it
merge - Update existing rows, insert new ones. Perfect for slowly changing dimensions.
models:
  - name: dim_products
    config:
      iceberg_strategy: merge
      iceberg_merge_keys: [product_id, is_current]
append - Just add new rows. Great for fact tables and event logs.
models:
  - name: fact_sales
    config:
      iceberg_strategy: append
      iceberg_partition_spec: [sale_year, sale_month]
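Each strategy maps onto a PyIceberg table operation. Roughly, in simplified form (partitioning and schema evolution omitted; upsert needs a recent PyIceberg release, and this is a sketch rather than the project's exact conversion code):
import pyarrow.parquet as pq

def write_iceberg(catalog, identifier: str, parquet_path: str,
                  strategy: str = "replace", merge_keys: list[str] | None = None) -> None:
    df = pq.read_table(parquet_path)  # the Parquet file dbt produced

    if strategy == "replace":
        # drop and recreate, then load everything fresh
        if catalog.table_exists(identifier):
            catalog.drop_table(identifier)
        table = catalog.create_table(identifier, schema=df.schema)
        table.append(df)
    elif strategy == "merge":
        # update rows that match the merge keys, insert the rest
        table = catalog.load_table(identifier)
        table.upsert(df, join_cols=merge_keys)
    elif strategy == "append":
        # just add the new rows
        table = catalog.load_table(identifier)
        table.append(df)

# usage (catalog comes from pyiceberg.catalog.load_catalog):
# write_iceberg(catalog, "marts.fact_sales", "temp/dbt/fact_sales.parquet", strategy="append")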
Need to recreate all tables with fresh schemas? Use --full-refresh:
# Forces all tables to use replace strategy
uv run python dbt_iceberg.py run --full-refresh
This is useful when:
- You've changed column types in your models
- You want to fix schema mismatches
- You need a clean slate
The flag overrides configured strategies: even tables set to merge or append will be dropped and recreated.
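Inside the wrapper that override is trivial - something along these lines (illustrative only, not the exact code):
def effective_strategy(model_config: dict, dbt_args: list[str]) -> str:
    # --full-refresh wins over whatever the model configured
    if "--full-refresh" in dbt_args:
        return "replace"
    return model_config.get("iceberg_strategy", "replace")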
duckdb_iceberg/
├── dbt_iceberg.py # The magic wrapper
├── models/
│ ├── staging/ # Transform raw data
│ ├── marts/ # Business logic (external tables)
│ └── examples/ # See strategies in action
├── utilities/
│ └── iceberg_utils.py # Azure auth fix + conversion logic
└── data/ # Sample data
- dbt creates Parquet files in Azure at /temp/dbt/
- Script finds all external tables from dbt's manifest
- Each Parquet file gets converted to an Iceberg table
- Temp files get deleted (unless something fails)
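The "finds all external tables" step leans on dbt's manifest.json, which dbt writes after every run. A rough sketch of that lookup (the real version is in dbt_iceberg.py and may check different config keys):
import json
from pathlib import Path

def external_models(manifest_path: str = "target/manifest.json") -> list[dict]:
    manifest = json.loads(Path(manifest_path).read_text())
    models = []
    for node in manifest["nodes"].values():
        config = node.get("config", {})
        # dbt-duckdb marks Parquet-exporting models with the "external" materialization
        if node["resource_type"] == "model" and config.get("materialized") == "external":
            models.append({
                "name": node["name"],
                "location": config.get("location"),  # where the Parquet file landed
                "strategy": config.get("iceberg_strategy", "replace"),
            })
    return models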
Your data ends up in:
- /raw/ - Seed data from dbt
- /temp/dbt/ - Temporary Parquet files (auto-cleaned)
- /iceberg-warehouse/ - Final Iceberg tables
PyIceberg has a bug where service principal auth doesn't work for writes. We built a custom FileIO class that fixes it:
# This is in utilities/iceberg_utils.py
from pyiceberg.io.fsspec import FsspecFileIO

class AzureServicePrincipalFileIO(FsspecFileIO):
    # Properly passes Azure creds to the filesystem for writes, not just reads
    ...
No more "authentication failed" errors when creating tables.
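Typically a custom FileIO like this gets handed to PyIceberg through the py-io-impl catalog property, along with the service-principal values from .env. Roughly (a sketch - whether this project wires it exactly this way isn't shown here, the catalog type is an assumption, and the exact property names depend on your PyIceberg version):
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "azure",
    **{
        "type": "sql",
        "uri": "sqlite:///iceberg_catalog.db",  # example local catalog store
        "warehouse": "abfss://your_container@your_storage_account.dfs.core.windows.net/iceberg-warehouse",
        # use the patched FileIO instead of PyIceberg's default fsspec IO
        "py-io-impl": "utilities.iceberg_utils.AzureServicePrincipalFileIO",
        # service principal credentials, same values as in .env
        "adls.account-name": "your_storage_account",
        "adls.tenant-id": "your_tenant_id",
        "adls.client-id": "your_client_id",
        "adls.client-secret": "your_client_secret",
    },
)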
Check out the models/examples/ folder to see all three strategies in action:
- dim_customers_latest.sql - Replace strategy
- dim_products_scd.sql - Merge with history tracking
- fact_daily_sales.sql - Append with partitioning
Auth errors? Make sure your service principal has "Storage Blob Data Contributor" role.
Container not found? Create it first. The script won't create containers for you.
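If you'd rather create the container from Python than the portal, something like this works (assumes azure-storage-blob is installed and reuses the values from your .env):
import os
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"],
)
service = BlobServiceClient(
    account_url=f"https://{os.environ['AZURE_STORAGE_ACCOUNT']}.blob.core.windows.net",
    credential=credential,
)
service.create_container(os.environ["AZURE_CONTAINER"])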
Weird Python exit errors? Those are from the adlfs library. They're harmless and happen after all the work is done. On Windows, hide them with:
uv run python dbt_iceberg.py run 2>&1 | findstr /V "TypeError.*await"
MIT