CUR2-1059: extend stablecoin.balances (part 2) - base+enrich approach
#9103
base: main
temp excluded arbitrum, bnb, ethereum, optimism and polygon to allow ci tests to complete
high level dq checks
PR Summary: Rearchitects stablecoin balances into per-chain seed/latest base+enrich models with union views, migrates token lists, adds a cross-chain balances view, updates the balances macro to exclude today, and removes legacy models.
Written by Cursor Bugbot for commit dea7788.
0xRobin
left a comment
Structure looks good! Only remark is that we should not use latest and seed in the model name when we are not referring to a model that holds the latest balances (not full history) or models that are seeds. 😅
Some alternatives:
seed
- archive
- core
- canonical
latest
- extended
- dynamic
But open to other options, as long as we can move away from "latest" and "seed" which are already overloaded terms.
yep,
or maybe one of these:
let's do _core / _extended
updated
jeff-dude
left a comment
stopping review on arbitrum, assuming other chains are similar
config(
    schema = 'tokens_' ~ chain,
    alias = 'erc20_stablecoins_core',
    tags = ['static'],
this defaults to view type, we should likely use table in all model configs for static token lists
historically we have found hardcoded lists like this in a view when joined to large datasets can cause trouble on the trino query planner
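A minimal sketch of what this suggestion might look like for the core token list. Only `schema`, `alias`, and `tags` come from the diff excerpt above; `materialized = 'table'` is the standard dbt setting the comment implies, and the rest of the model is assumed unchanged.

```sql
{% set chain = 'arbitrum' %}

{{
    config(
        schema = 'tokens_' ~ chain,
        alias = 'erc20_stablecoins_core',
        -- explicit table materialization instead of the default view,
        -- so the hardcoded list is not inlined into downstream query plans
        materialized = 'table',
        tags = ['static']
    )
}}
```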
only commenting on arbitrum as top of list, apply to all chains
config(
    schema = 'tokens_' ~ chain,
    alias = 'erc20_stablecoins',
    tags = ['static'],
this union can likely stay a view, since underlying will be materialized into tables
{% set chain = 'arbitrum' %}

{{
    config(
should we consider partitions on these tables?
how big are they?
worth looking into the size of each day and consider a partition on that column. if day is too small, then month may make sense.
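One way to answer the sizing question is to count rows per day in the base table. This is only a sketch: the table name `stablecoins_arbitrum.core_balances` is hypothetical, and the `day` column is assumed to exist on the base model as it does on the enriched one.

```sql
-- Hypothetical table name; substitute the materialized base balances table.
select
    day,
    count(*) as row_count
from stablecoins_arbitrum.core_balances
group by day
order by day desc
limit 30
```

If daily counts turn out to be small, grouping by `date_trunc('month', day)` instead would show whether a monthly partition is a better fit.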
{% set chain = 'arbitrum' %}

{{
    config(
same question here on partitions
materialized = 'incremental',
file_format = 'delta',
incremental_strategy = 'merge',
partition_by = ['day'],
i see we partition by day here, may consider same on base level
@@ -0,0 +1,20 @@
{% set chain = 'blast' %}
fairly certain blast is deprecated / data is frozen at a point in time?
can you confirm with max(block_time) on the chain?
if so, may make sense to leave out of PR
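A rough sketch of the check being asked for. It assumes the chain's raw transactions are exposed as `blast.transactions` with a `block_time` column; adjust the source if the actual table differs.

```sql
-- Assumed source table and column; a stale timestamp here would
-- suggest the chain's data is frozen.
select
    max(block_time) as latest_block_time
from blast.transactions
```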
Stablecoins Balance Models Refactoring
Problem
Adding new stablecoins to the token list triggers full historical recalculation of incremental balance models, which is computationally expensive.
Solution
Split token lists and balance models into seed (frozen historical) and latest (new additions) streams, with separate enrichment pipelines.
New Architecture
Key Changes
Token Lists:
- _core.sql - frozen list of existing stablecoins
- _extended.sql - empty template for new additions
Balance Models:
- _core_balances - incremental, sources from seed list, original start_date
- _extended_balances - incremental, sources from latest list, start_date = 2025-01-01
- _core_balances_enriched - incremental, applies enrichment macro
- _extended_balances_enriched - incremental, applies enrichment macro
- _balances - view unions both enriched models
Adding New Stablecoins
- tokens_<chain>_erc20_stablecoins_extended.sql
- start_date in stablecoins_<chain>_extended_balances.sql
- latest_balances and latest_balances_enriched only
Seed models remain untouched.
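The `_balances` union view listed under Key Changes could be sketched roughly as follows. The `schema` and `alias` values are hypothetical, and the `ref` names are inferred from the `stablecoins_<chain>_extended_balances.sql` naming above rather than taken from the diff.

```sql
{% set chain = 'arbitrum' %}

{{
    config(
        schema = 'stablecoins_' ~ chain,  -- hypothetical schema
        alias = 'balances',               -- hypothetical alias
        materialized = 'view'
    )
}}

-- frozen historical stream
select * from {{ ref('stablecoins_' ~ chain ~ '_core_balances_enriched') }}
union all
-- stream for newly added stablecoins
select * from {{ ref('stablecoins_' ~ chain ~ '_extended_balances_enriched') }}
```

Because the view only unions the two enriched models, adding a token to the extended list should leave the core incremental models untouched, which is the point of the split.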
Chains