
@0xBoxer (Collaborator) commented Jan 9, 2026

Thank you for contributing to Spellbook 🪄

Please open the PR in draft and mark as ready when you want to request a review.

Description:

Addresses PLA-1561 by implementing a dbt macro that extracts EIP-8021 builder codes from transaction calldata. This fulfills an urgent request from the Base Data Science Team, enabling them to parse the builder identification codes embedded in transactions.

The solution provides:

  • A dbt macro call_data_8021(calldata_field) that parses VARBINARY calldata and returns a structured ROW with original transaction data, schema type, hex codes, readable codes, custom registry address (if applicable), codes array, and the EIP-8021 suffix.
  • A helper macro has_eip_8021_suffix(calldata_field) to easily filter for relevant transactions.
  • Tests (currently covering the main example from the issue) and an example usage.

This implementation uses DuneSQL/Trino VARBINARY functions, slicing the raw calldata bytes directly instead of converting the whole payload to a hex string first.


quick links for more information:


Linear Issue: PLA-1561


cursor bot commented Jan 9, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@github-actions github-actions bot added the WIP (work in progress) and dbt: daily (covers the Daily dbt subproject) labels Jan 9, 2026
cursoragent and others added 8 commits January 9, 2026 16:51
- Remove newlines before CASE and after END in call_data_8021 macro to fix Trino syntax error
- Use CAST(NULL AS varbinary) for explicit null typing in tests
- Use IS NOT DISTINCT FROM for null-safe comparison in registry address test

…o output

Use {%- and -%} syntax to strip whitespace before/after macro tags,
preventing the CASE statement from appearing on a new line in the
compiled SQL.

…g whitespace

Put CASE directly after %} without any newline to ensure no whitespace
is output before the SQL content.

The Jinja macro reference inside the SQL comment was being expanded
by dbt, breaking the compiled SQL. Removed the curly braces from
the comment to prevent expansion.

…eger

DuneSQL/Trino has bytearray_to_bigint function, not bytearray_to_integer.
This was causing the macro to fail when extracting byte values for
schema_id and codes_length.

Simplified the tests to only test the main example from the issue.
This helps isolate whether the core logic works before testing
additional edge cases.

The test calldata had an extra 6 bytes (802180218021) in the magic suffix,
causing the macro to read incorrect positions for schema_id and codes_length.

Correct structure: 36 bytes original + 9 bytes codes + 1 byte length + 1 byte schema + 16 bytes magic = 63 bytes
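The segment sizes in the corrected structure can be checked with a small Python sketch. It assumes a 36-byte all-zero placeholder for the original calldata and the 9-byte code cb_wallet (both hypothetical; the actual test calldata is not reproduced here) and appends a Schema 0 suffix in the order listed above: codes, codes_length, schema_id, then the 16 magic bytes.

```python
# Minimal sketch: build a Schema 0 EIP-8021 calldata sample and check the
# byte positions the SQL reads. Placeholder values are hypothetical.
MAGIC = bytes.fromhex("80218021802180218021802180218021")  # 16-byte marker


def build_schema0_calldata(original: bytes, codes: str) -> bytes:
    """Append codes ++ codes_length ++ schema_id (0) ++ magic to the calldata."""
    code_bytes = codes.encode("ascii")
    if len(code_bytes) > 255:
        raise ValueError("codes_length must fit in a single byte")
    return original + code_bytes + bytes([len(code_bytes), 0]) + MAGIC


original = bytes(36)  # hypothetical 36-byte original calldata (all zeros)
calldata = build_schema0_calldata(original, "cb_wallet")  # 9-byte code

n = len(calldata)
print(n)  # 63 = 36 + 9 + 1 + 1 + 16

# The SQL uses 1-based positions; Python index = position - 1.
print(calldata[(n - 16) - 1])  # schema_id at position n - 16 -> 0
print(calldata[(n - 17) - 1])  # codes_length at position n - 17 -> 9
print(calldata[(n - 15) - 1:] == MAGIC)  # magic starts at position n - 15 -> True
```

With the 6 extra bytes described in the commit, position n - 16 would land inside the repeated 8021 pattern instead of on schema_id, which matches the misread values.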
@0xBoxer (Collaborator, Author) commented Jan 12, 2026

@cursoragent I would like to invoke this outside of this dbt repo, in the usual SQL environment. Is that possible somehow?

cursor bot commented Jan 12, 2026

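Yes, absolutely! The dbt macro is essentially a wrapper around plain SQL. Here's a standalone version you can run directly in a Dune Analytics query. Note that bytearray_substring, bytearray_length and bytearray_to_bigint are DuneSQL functions; in a vanilla Trino environment, substitute substr, length and an equivalent byte-to-integer conversion: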

Standalone EIP-8021 Parser

The query below reads the data column of base.transactions. To parse another table, replace data and the FROM clause with your own calldata column and table:

-- EIP-8021 Calldata Parser (Standalone DuneSQL)
-- Reads base.transactions.data; replace data with your calldata column to parse another table

SELECT
    hash AS tx_hash,  -- Dune transaction tables name this column hash
    CASE
        -- Only process calldata that ends with the EIP-8021 magic bytes
        WHEN bytearray_substring(data, bytearray_length(data) - 15, 16) = 0x80218021802180218021802180218021
        THEN CAST(
            ROW(
                -- 1) original_tx_data: Extract original calldata (before EIP-8021 suffix)
                bytearray_substring(
                    data,
                    1,
                    bytearray_length(data)
                    - 16  -- magic bytes (16 bytes)
                    - 1   -- schema_id (1 byte)
                    - 1   -- codes_length (1 byte)
                    - CASE 
                        WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1 
                        THEN 20  -- registry address for Schema 1 (20 bytes)
                        ELSE 0 
                      END
                    - bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                ),

                -- 2) schema_type: Decode schema ID
                CASE
                    WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 0 
                        THEN 'Schema 0: Canonical Registry'
                    WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1 
                        THEN 'Schema 1: Custom Registry'
                    ELSE 'Unknown Schema'
                END,

                -- 3) codes_hex: Raw hex codes
                to_hex(
                    bytearray_substring(
                        data,
                        bytearray_length(data)
                            - 16 - 1 - 1
                            - CASE 
                                WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1 
                                THEN 20 ELSE 0 
                              END
                            - bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                            + 1,
                        bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                    )
                ),

                -- 4) custom_registry_address: Registry address (Schema 1 only)
                CASE
                    WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1 
                    THEN bytearray_substring(data, bytearray_length(data) - 37, 20)
                    ELSE NULL
                END,

                -- 5) codes_readable: Human-readable codes (UTF-8 decoded; from_utf8 replaces invalid bytes with U+FFFD)
                try(from_utf8(
                    bytearray_substring(
                        data,
                        bytearray_length(data)
                            - 16 - 1 - 1
                            - CASE 
                                WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1 
                                THEN 20 ELSE 0 
                              END
                            - bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                            + 1,
                        bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                    )
                )),

                -- 6) codes_array: Codes split by comma
                split(
                    try(from_utf8(
                        bytearray_substring(
                            data,
                            bytearray_length(data)
                                - 16 - 1 - 1
                                - CASE 
                                    WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1 
                                    THEN 20 ELSE 0 
                                  END
                                - bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                                + 1,
                            bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
                        )
                    )),
                    ','
                ),

                -- 7) erc_8021_suffix: The magic bytes
                bytearray_substring(data, bytearray_length(data) - 15, 16)
            )
            AS ROW(
                original_tx_data varbinary,
                schema_type varchar,
                codes_hex varchar,
                custom_registry_address varbinary,
                codes_readable varchar,
                codes_array array(varchar),
                erc_8021_suffix varbinary
            )
        )
        ELSE NULL
    END AS eip8021_data
FROM base.transactions
WHERE bytearray_substring(data, bytearray_length(data) - 15, 16) = 0x80218021802180218021802180218021
LIMIT 100
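To cross-check the query's offsets outside Dune, here is a minimal Python sketch that mirrors the same arithmetic. parse_8021 is a hypothetical reference helper, not part of the PR, and it assumes the layout implied by the SQL: original calldata, codes, the 20-byte registry address (Schema 1 only), codes_length, schema_id, then the 16 magic bytes. Unlike the SQL, it returns None when codes_length points before the start of the calldata.

```python
from typing import List, NamedTuple, Optional

MAGIC = bytes.fromhex("80218021802180218021802180218021")  # 16-byte marker


class Eip8021Data(NamedTuple):
    # Field names and order follow the ROW type in the SQL query.
    original_tx_data: bytes
    schema_type: str
    codes_hex: str
    custom_registry_address: Optional[bytes]
    codes_readable: str
    codes_array: List[str]
    erc_8021_suffix: bytes


def parse_8021(data: bytes) -> Optional[Eip8021Data]:
    """Hypothetical Python mirror of the SQL parser (0-based indexing)."""
    n = len(data)
    if n < 18 or data[-16:] != MAGIC:
        return None  # no EIP-8021 suffix (or too short for schema_id + codes_length)
    schema_id = data[n - 17]      # SQL position n - 16
    codes_length = data[n - 18]   # SQL position n - 17
    registry_len = 20 if schema_id == 1 else 0
    codes_end = n - 18 - registry_len  # exclusive end of the codes
    codes_start = codes_end - codes_length
    if codes_start < 0:
        return None  # malformed: declared codes do not fit in the calldata
    codes = data[codes_start:codes_end]
    # Trino's from_utf8 substitutes U+FFFD for invalid bytes; errors="replace" is similar.
    readable = codes.decode("utf-8", errors="replace")
    if schema_id == 0:
        schema_type = "Schema 0: Canonical Registry"
    elif schema_id == 1:
        schema_type = "Schema 1: Custom Registry"
    else:
        schema_type = "Unknown Schema"
    return Eip8021Data(
        original_tx_data=data[:codes_start],
        schema_type=schema_type,
        codes_hex=codes.hex().upper(),  # Trino's to_hex returns uppercase hex
        custom_registry_address=data[n - 38:n - 18] if schema_id == 1 else None,
        codes_readable=readable,
        codes_array=readable.split(","),  # like split(..., ','), keeps empty entries
        erc_8021_suffix=data[-16:],
    )


# Usage: a hypothetical Schema 0 sample (36 zero bytes + "cb_wallet").
sample = bytes(36) + b"cb_wallet" + bytes([9, 0]) + MAGIC
row = parse_8021(sample)
print(row.schema_type)  # Schema 0: Canonical Registry
print(row.codes_array)  # ['cb_wallet']
```

Returning None for an impossible codes_length is a design choice for the sketch; the SQL has no such guard, so malformed suffixes are worth spot-checking there.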

Quick Filter (to find EIP-8021 transactions)

-- Filter for transactions with EIP-8021 suffix
SELECT *
FROM base.transactions
WHERE bytearray_substring(data, bytearray_length(data) - 15, 16) = 0x80218021802180218021802180218021
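The filter only compares the last 16 bytes of data with the marker. A Python equivalent for local checks might look like the sketch below; the function is a hypothetical helper that borrows the name of the dbt has_eip_8021_suffix macro, not the macro itself.

```python
MAGIC = bytes.fromhex("80218021802180218021802180218021")  # 16-byte EIP-8021 marker


def has_eip_8021_suffix(data: bytes) -> bool:
    """True when the calldata ends with the 16-byte marker (same test as the WHERE clause)."""
    return len(data) >= len(MAGIC) and data[-len(MAGIC):] == MAGIC


print(has_eip_8021_suffix(bytes.fromhex("a9059cbb") + MAGIC))  # True
print(has_eip_8021_suffix(bytes.fromhex("a9059cbb")))          # False
```

Because the check only inspects the tail, it also passes for a malformed suffix; the parser still has to rely on codes_length being consistent.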

Access Parsed Fields

Once you have the eip8021_data ROW, access its fields by name, for example:

eip8021_data.codes_readable     -- e.g., 'cb_wallet'
eip8021_data.codes_array        -- e.g., ['cb_wallet']; a comma-separated 'cb,wallet' gives ['cb', 'wallet']
eip8021_data.schema_type        -- 'Schema 0: Canonical Registry', 'Schema 1: Custom Registry' or 'Unknown Schema'
eip8021_data.original_tx_data   -- Original calldata without the EIP-8021 suffix
eip8021_data.custom_registry_address  -- Registry address (Schema 1 only)

The magic bytes 0x80218021802180218021802180218021 are the 16-byte EIP-8021 identifier that marks the end of compliant calldata.

