@tomfutago (Contributor) commented Dec 9, 2025

Stablecoins Balance Models Refactoring

Problem

Adding new stablecoins to the token list triggers full historical recalculation of incremental balance models, which is computationally expensive.

Solution

Split token lists and balance models into core (frozen historical) and extended (new additions) streams, with separate enrichment pipelines.

New Architecture

tokens_<chain>_erc20_stablecoins_core     →  stablecoins_<chain>_core_balances     →  stablecoins_<chain>_core_balances_enriched
                                                                                                                              ↘
tokens_<chain>_erc20_stablecoins (view)                                                              stablecoins_<chain>_balances (view)
                                                                                                                              ↗
tokens_<chain>_erc20_stablecoins_extended   →  stablecoins_<chain>_extended_balances   →  stablecoins_<chain>_extended_balances_enriched

Key Changes

Token Lists:

  • _core.sql - frozen list of existing stablecoins
  • _extended.sql - empty template for new additions
  • Union view combines both

Balance Models:

  • _core_balances - incremental, sources from the core list, original start_date
  • _extended_balances - incremental, sources from the extended list, start_date = 2025-01-01
  • _core_balances_enriched - incremental, applies enrichment macro
  • _extended_balances_enriched - incremental, applies enrichment macro
  • _balances - view unions both enriched models
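A minimal sketch of what the per-chain union view could look like. It assumes `ref()` names that match the model names above, that both enriched models expose identical columns in the same order (needed for `select *` with `union all`), and `schema`/`alias` values chosen so the view resolves to `stablecoins_<chain>.balances`, the same pattern as the existing `stablecoins_linea.balances`.

```sql
{% set chain = 'arbitrum' %}

{{
  config(
    schema = 'stablecoins_' ~ chain,  -- assumed: resolves to stablecoins_arbitrum.balances
    alias = 'balances',
    materialized = 'view'
  )
}}

-- Union the two enriched streams. union all assumes no token appears in both
-- the core and the extended lists; otherwise its balances would be duplicated.
select * from {{ ref('stablecoins_' ~ chain ~ '_core_balances_enriched') }}
union all
select * from {{ ref('stablecoins_' ~ chain ~ '_extended_balances_enriched') }}
```

`union all` is used rather than `union` because a distinct-style `union` would force deduplication across the full balance history on every read.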

Adding New Stablecoins

  1. Add token to tokens_<chain>_erc20_stablecoins_extended.sql
  2. Update start_date in stablecoins_<chain>_extended_balances.sql
  3. Full refresh stablecoins_<chain>_extended_balances and stablecoins_<chain>_extended_balances_enriched only

Core models remain untouched.
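Before the full refresh, a quick sanity check can confirm that the new token is not already in the core list. A token present in both lists could be double-counted once the two balance streams are unioned. A sketch for arbitrum, assuming the lists are published as `tokens_arbitrum.erc20_stablecoins_core` and `tokens_arbitrum.erc20_stablecoins_extended` and both expose a `contract_address` column:

```sql
-- Expected result: zero rows (no token listed in both core and extended).
select c.contract_address
from tokens_arbitrum.erc20_stablecoins_core as c
inner join tokens_arbitrum.erc20_stablecoins_extended as e
    on c.contract_address = e.contract_address
```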

Chains

  • With balances (11): arbitrum, avalanche_c, base, bnb, ethereum, kaia, linea, optimism, polygon, scroll, worldchain
  • Token lists only (6): blast, bob, celo, fantom, gnosis, mantle

github-actions bot added the labels WIP (work in progress), dbt: daily (covers the Daily dbt subproject) and dbt: tokens (covers the Tokens dbt subproject) on Dec 9, 2025
github-actions bot removed the label dbt: daily (covers the Daily dbt subproject) on Dec 10, 2025
github-actions bot added the label dbt: daily (covers the Daily dbt subproject) on Dec 10, 2025
@tomfutago (Contributor, Author)

Temporarily excluded arbitrum, bnb, ethereum, optimism and polygon to allow CI tests to complete.

@tomfutago (Contributor, Author)

High-level DQ checks:

  1. token coverage -> OK:
-- Check 1: Token coverage
with old_tokens as (
  select distinct token_address, token_symbol
  from stablecoins_linea.balances
),
new_tokens as (
  select distinct token_address, token_symbol
  from test_schema.git_dunesql_04661fe_stablecoins_linea_balances
)
select
  coalesce(o.token_address, n.token_address) as token_address,
  o.token_symbol as old_symbol,
  n.token_symbol as new_symbol,
  case 
    when o.token_address is null then 'EXTRA_IN_NEW'
    when n.token_address is null then 'MISSING_IN_NEW'
    when o.token_symbol != n.token_symbol then 'SYMBOL_MISMATCH'
    else 'OK'
  end as status
from old_tokens o
full outer join new_tokens n on o.token_address = n.token_address
order by status, token_address
  2. spot checks -> OK (diffs are expected where the old model holds mid-day balances)
-- Check 2: Spot-check daily balances for a single address
select
  coalesce(o.day, n.day) as day,
  coalesce(o.token_address, n.token_address) as token_address,
  o.balance as old_balance, n.balance as new_balance,
  coalesce(o.balance, 0) - coalesce(n.balance, 0) as diff
from stablecoins_linea.balances o
full outer join test_schema.git_dunesql_04661fe_stablecoins_linea_balances n
  on o.day = n.day
  and o.address = n.address
  and o.token_address = n.token_address
where coalesce(o.address, n.address) = 0x795facaa76aed7c5f44a053155407199f4075139
order by 1

@tomfutago marked this pull request as ready for review on December 17, 2025 11:05
@cursor bot commented Dec 17, 2025

PR Summary

Rearchitects stablecoin balances into per-chain seed/latest base+enrich models with union views, migrates token lists, adds a cross-chain balances view, updates the balances macro to exclude today, and removes legacy models.

  • Stablecoins architecture (base + enrich):
    • Split balances into seed (frozen) and latest (new) incremental streams per chain; enrich in separate models; union into stablecoins_<chain>_balances views.
  • Per-chain implementations (11 chains):
    • Added models and schemas for arbitrum, avalanche_c, base, bnb, ethereum, kaia, linea, optimism, polygon, scroll, worldchain:
      • tokens_<chain>_erc20_stablecoins_seed/latest (+ union view), stablecoins_<chain>_seed_balances/_latest_balances, and corresponding _enriched + balances view.
  • Cross-chain view:
    • New stablecoins.balances view unioning enriched balances across chains.
  • Macro update:
    • balances_incremental_subset_daily: exclude current_date in days to avoid mid-day stale data.
  • Cleanup/Migration:
    • Removed legacy daily_spellbook stablecoin balance models and old tokens_*_erc20_stablecoins files/schemas; migrated to new tokens subproject structure.
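The body of `balances_incremental_subset_daily` is not shown on this page, so the following is only a hypothetical Trino/DuneSQL sketch of the idea behind the macro update: generate the daily spine but stop before `current_date`, so a partially elapsed day is never written. The start date is a placeholder.

```sql
-- Hypothetical sketch, not the real macro body.
select day
from unnest(
    sequence(date '2025-01-01', current_date, interval '1' day)
) as t (day)
where day < current_date  -- drop today: its balances would reflect only part of the day
```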

Written by Cursor Bugbot for commit dea7788.

github-actions bot added the label ready-for-review (this PR development is complete, please review) and removed the label WIP (work in progress) on Dec 17, 2025
@0xRobin (Collaborator) left a comment

Structure looks good! Only remark is that we should not use latest and seed in the model name when we are not referring to a model that holds the latest balances (not full history) or models that are seeds. 😅

Some alternatives:
seed

  • archive
  • core
  • canonical

latest

  • extended
  • dynamic

But open to other options, as long as we can move away from "latest" and "seed" which are already overloaded terms.

@tomfutago (Contributor, Author)

yep, _seed / _latest is probably not the best first choice..

_core / _extended sounds good

or maybe one of these:
_initial / _additions
_frozen / _new

@jeff-dude (Member)

let's do _core / _extended

@tomfutago (Contributor, Author)

updated tokens_<chain>_erc20_stablecoins_core:

  • the list contains only contract_address, with the symbol as a comment
  • added more tokens for each chain based on https://dune.com/queries/6371381 (the added tokens are separated from the original list by one extra line break)
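A sketch of the list layout described above: one `contract_address` per row, with the symbol kept only as a trailing comment. The addresses and symbols are placeholders, not real tokens, and the `blockchain` column is an assumption; the `config` mirrors the arbitrum snippet quoted in the review.

```sql
{% set chain = 'arbitrum' %}

{{
  config(
    schema = 'tokens_' ~ chain,
    alias = 'erc20_stablecoins_core',
    tags = ['static']
  )
}}

select '{{ chain }}' as blockchain, contract_address  -- blockchain column is assumed
from (values
    -- placeholder addresses, not real tokens
    (0x0000000000000000000000000000000000000001), -- SYMBOL_A
    (0x0000000000000000000000000000000000000002)  -- SYMBOL_B
) as t (contract_address)
```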

@tomfutago requested a review from 0xRobin on December 17, 2025 18:03

@jeff-dude (Member) left a comment

stopping review on arbitrum, assuming other chains are similar

config(
schema = 'tokens_' ~ chain,
alias = 'erc20_stablecoins_core',
tags = ['static'],

this defaults to a view; we should likely use materialized = 'table' in all model configs for static token lists

historically, we have found that hardcoded lists like this, when kept in a view and joined to large datasets, can cause trouble for the Trino query planner
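A minimal sketch of the suggested change, reusing the config quoted above and only adding `materialized = 'table'` (a standard dbt config key). The model body is omitted.

```sql
{% set chain = 'arbitrum' %}

{{
  config(
    schema = 'tokens_' ~ chain,
    alias = 'erc20_stablecoins_core',
    tags = ['static'],
    materialized = 'table'  -- persist the hardcoded list instead of defaulting to a view
  )
}}
```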

only commenting on arbitrum as top of list, apply to all chains

config(
schema = 'tokens_' ~ chain,
alias = 'erc20_stablecoins',
tags = ['static'],

this union can likely stay a view, since underlying will be materialized into tables

{% set chain = 'arbitrum' %}

{{
config(

should we consider partitions on these tables? how big are they?
it's worth looking into the size of each day and considering a partition on that column. if a day is too small, then month may make sense.
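One way to answer the sizing question is to count rows per month and the average rows per day within each month. A sketch, assuming the chain's balances are queryable as `stablecoins_arbitrum.balances` with a `day` column (row counts are only a proxy for on-disk size):

```sql
select
    date_trunc('month', day) as month,
    count(*) as row_count,
    count(distinct day) as days_in_month,
    cast(count(*) as double) / count(distinct day) as avg_rows_per_day
from stablecoins_arbitrum.balances
group by 1
order by 1
```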

{% set chain = 'arbitrum' %}

{{
config(

same question here on partitions

materialized = 'incremental',
file_format = 'delta',
incremental_strategy = 'merge',
partition_by = ['day'],

i see we partition by day here, may consider same on base level
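If the base-level models follow the enriched ones, their config might look like the sketch below. The `schema`/`alias` naming and the `unique_key` columns are assumptions not shown on this page (the key reuses the `day`/`address`/`token_address` columns the DQ spot check joins on). `partition_by = ['day']` mirrors the enriched model.

```sql
{% set chain = 'arbitrum' %}

{{
  config(
    schema = 'stablecoins_' ~ chain,                   -- assumed naming
    alias = 'core_balances',                           -- assumed naming
    materialized = 'incremental',
    file_format = 'delta',
    incremental_strategy = 'merge',
    unique_key = ['day', 'address', 'token_address'],  -- assumed merge key
    partition_by = ['day']                             -- same partitioning as the enriched model
  )
}}
```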

@@ -0,0 +1,20 @@
{% set chain = 'blast' %}

fairly certain blast is deprecated / data is frozen at a point in time?
can you confirm with max(block_time) on the chain?
if so, may make sense to leave out of PR
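A quick freshness check for this question, assuming Dune exposes `blast.transactions` with a `block_time` column:

```sql
-- If this is well in the past, blast data is frozen and the chain can be left out of the PR.
select max(block_time) as last_block_time
from blast.transactions
```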

@jeff-dude added the label in review (Assignee is currently reviewing the PR) and removed the label ready-for-review (this PR development is complete, please review) on Dec 18, 2025
@jeff-dude self-assigned this on Dec 18, 2025

Labels

dbt: daily (covers the Daily dbt subproject), dbt: tokens (covers the Tokens dbt subproject), in review (Assignee is currently reviewing the PR)

3 participants