dbt-duckdb + Apache Iceberg on Azure

Run dbt with DuckDB and automatically convert your tables to Apache Iceberg format on Azure Storage. No more manual conversion scripts or complex workflows.

uv run dbt_iceberg.py run {any dbt flags or commands here}

What This Does

This project solves a specific problem: dbt-duckdb can't write Iceberg tables directly. So we built a wrapper that:

Runs your dbt models (which output Parquet files)
Automatically converts them to Iceberg tables
Cleans up the temporary files
Handles all the Azure authentication headaches

Setup

Prerequisites

Python 3.11+
Azure Storage Account
Service Principal with "Storage Blob Data Contributor" role

Install

git clone <repository-url>
cd duckdb_iceberg

# Using uv (faster)
pip install uv
uv sync

# Or regular pip
pip install -r requirements.txt

Configure

Create .env file from .env.example:

AZURE_STORAGE_ACCOUNT=your_storage_account
AZURE_CONTAINER=your_container
AZURE_TENANT_ID=your_tenant_id
AZURE_CLIENT_ID=your_client_id
AZURE_CLIENT_SECRET=your_client_secret

Run

# Run everything
uv run python dbt_iceberg.py run

# Run specific models
uv run python dbt_iceberg.py run --select customer_metrics

# See example strategies in action
uv run python dbt_iceberg.py run --select tag:example

How Tables Get Loaded

You control how data gets loaded into Iceberg tables using three strategies:

1. Replace (Default)

Drop the table and recreate it. Simple and clean.

models:
  - name: dim_customers
    config:
      iceberg_strategy: replace  # or just omit it

2. Merge (Upsert)

Update existing rows, insert new ones. Perfect for slowly changing dimensions.

models:
  - name: dim_products
    config:
      iceberg_strategy: merge
      iceberg_merge_keys: [product_id, is_current]

3. Append

Just add new rows. Great for fact tables and event logs.

models:
  - name: fact_sales
    config:
      iceberg_strategy: append
      iceberg_partition_spec: [sale_year, sale_month]

Full Refresh

Need to recreate all tables with fresh schemas? Use --full-refresh:

# Forces all tables to use replace strategy
uv run python dbt_iceberg.py run --full-refresh

This is useful when:

You've changed column types in your models
You want to fix schema mismatches
You need a clean slate

The flag overrides configured strategies - even tables set to merge or append will be dropped and recreated.

Project Structure

duckdb_iceberg/
├── dbt_iceberg.py          # The magic wrapper
├── models/
│   ├── staging/            # Transform raw data
│   ├── marts/              # Business logic (external tables)
│   └── examples/           # See strategies in action
├── utilities/
│   └── iceberg_utils.py    # Azure auth fix + conversion logic
└── data/                   # Sample data

What Happens When You Run It

dbt creates Parquet files in Azure at /temp/dbt/
Script finds all external tables from dbt's manifest
Each Parquet file gets converted to an Iceberg table
Temp files get deleted (unless something fails)

Your data ends up in:

/raw/ - Seed data from dbt
/temp/dbt/ - Temporary Parquet files (auto-cleaned)
/iceberg-warehouse/ - Final Iceberg tables

The Azure Auth Fix

PyIceberg has a bug where service principal auth doesn't work for writes. We built a custom FileIO class that fixes it:

# This is in utilities/iceberg_utils.py
class AzureServicePrincipalFileIO(FsspecFileIO):
    # Properly passes Azure creds to the filesystem

No more "authentication failed" errors when creating tables.

Example Models

Check out the models/examples/ folder to see all three strategies in action:

dim_customers_latest.sql - Replace strategy
dim_products_scd.sql - Merge with history tracking
fact_daily_sales.sql - Append with partitioning

Troubleshooting

Auth errors? Make sure your service principal has "Storage Blob Data Contributor" role.

Container not found? Create it first. The script won't create containers for you.

Weird Python exit errors? Those are from the adlfs library. They're harmless and happen after all the work is done. On Windows, hide them with:

uv run python dbt_iceberg.py run 2>&1 | findstr /V "TypeError.*await"

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
macros		macros
models		models
tests		tests
utilities		utilities
.env.example		.env.example
.gitignore		.gitignore
.user.yml		.user.yml
LICENSE		LICENSE
README.md		README.md
dbt_iceberg.py		dbt_iceberg.py
dbt_project.yml		dbt_project.yml
profiles.yml		profiles.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

dbt-duckdb + Apache Iceberg on Azure

What This Does

Setup

Prerequisites

Install

Configure

Run

How Tables Get Loaded

1. Replace (Default)

2. Merge (Upsert)

3. Append

Full Refresh

Project Structure

What Happens When You Run It

The Azure Auth Fix

Example Models

Troubleshooting

License

About

Uh oh!

Releases

Packages

Languages

License

jordandakota/dbt-duckdb-iceberg-azure

Folders and files

Latest commit

History

Repository files navigation

dbt-duckdb + Apache Iceberg on Azure

What This Does

Setup

Prerequisites

Install

Configure

Run

How Tables Get Loaded

1. Replace (Default)

2. Merge (Upsert)

3. Append

Full Refresh

Project Structure

What Happens When You Run It

The Azure Auth Fix

Example Models

Troubleshooting

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages