Eip-8021 calldata parser #9196
Co-authored-by: florian <[email protected]>
- Remove newlines before CASE and after END in call_data_8021 macro to fix Trino syntax error
- Use CAST(NULL AS varbinary) for explicit null typing in tests
- Use IS NOT DISTINCT FROM for null-safe comparison in registry address test
…o output
Use {%- and -%} syntax to strip whitespace before/after macro tags,
preventing the CASE statement from appearing on a new line in the
compiled SQL.
…g whitespace
Put CASE directly after %} without any newline to ensure no whitespace is output before the SQL content.
The Jinja macro reference inside the SQL comment was being expanded by dbt, breaking the compiled SQL. Removed the curly braces from the comment to prevent expansion.
…eger
DuneSQL/Trino has bytearray_to_bigint function, not bytearray_to_integer. This was causing the macro to fail when extracting byte values for schema_id and codes_length.
Simplified the tests to only test the main example from the issue. This helps isolate whether the core logic works before testing additional edge cases.
The test calldata had an extra 6 bytes (802180218021) in the magic suffix, causing the macro to read incorrect positions for schema_id and codes_length. Correct structure: 36 bytes original + 9 bytes codes + 1 byte length + 1 byte schema + 16 bytes magic = 63 bytes
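The byte accounting in that commit message can be checked outside of Trino. The Python sketch below builds a synthetic calldata with the stated layout and confirms the 63-byte total. Only the field sizes and the magic value come from this PR; the all-zero 36-byte payload and the nine-byte code `cb_wallet` are assumptions chosen for illustration, not the real test vector.

```python
# Layout from the commit message:
# original (36) + codes (9) + codes_length (1) + schema_id (1) + magic (16) = 63 bytes
MAGIC = bytes.fromhex("8021" * 8)  # 16-byte EIP-8021 suffix used in the SQL

original = bytes(36)   # placeholder payload; contents are an assumption
codes = b"cb_wallet"   # 9 bytes; example value, not the real test data
calldata = original + codes + bytes([len(codes)]) + bytes([0]) + MAGIC

total = len(calldata)
schema_id = calldata[-17]     # byte immediately before the magic suffix
codes_length = calldata[-18]  # byte immediately before schema_id

print(total, schema_id, codes_length)
```

An extra 6 bytes in the suffix would shift both single-byte fields, which is why the macro read the wrong positions before the fix.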
@cursoragent I would like to invoke this outside of this dbt repo in the usual SQL environment. Is that possible somehow?
Yes, absolutely! The dbt macro is essentially a wrapper around pure Trino SQL. Here's the standalone SQL version you can use directly in Dune Analytics or any Trino SQL environment:

Standalone EIP-8021 Parser

-- EIP-8021 Calldata Parser (Standalone SQL)
-- The calldata column below is data; replace it with your own calldata column (e.g., txns.data)
SELECT
hash AS tx_hash,
CASE
-- Only process calldata that ends with the EIP-8021 magic bytes
WHEN bytearray_substring(data, bytearray_length(data) - 15, 16) = 0x80218021802180218021802180218021
THEN CAST(
ROW(
-- 1) original_tx_data: Extract original calldata (before EIP-8021 suffix)
bytearray_substring(
data,
1,
bytearray_length(data)
- 16 -- magic bytes (16 bytes)
- 1 -- schema_id (1 byte)
- 1 -- codes_length (1 byte)
- CASE
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1
THEN 20 -- registry address for Schema 1 (20 bytes)
ELSE 0
END
- bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
),
-- 2) schema_type: Decode schema ID
CASE
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 0
THEN 'Schema 0: Canonical Registry'
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1
THEN 'Schema 1: Custom Registry'
ELSE 'Unknown Schema'
END,
-- 3) codes_hex: Raw hex codes
to_hex(
bytearray_substring(
data,
bytearray_length(data)
- 16 - 1 - 1
- CASE
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1
THEN 20 ELSE 0
END
- bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
+ 1,
bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
)
),
-- 4) custom_registry_address: Registry address (Schema 1 only)
CASE
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1
THEN bytearray_substring(data, bytearray_length(data) - 37, 20)
ELSE NULL
END,
-- 5) codes_readable: Human-readable codes (UTF-8 decoded)
try(from_utf8(
bytearray_substring(
data,
bytearray_length(data)
- 16 - 1 - 1
- CASE
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1
THEN 20 ELSE 0
END
- bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
+ 1,
bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
)
)),
-- 6) codes_array: Codes split by comma
split(
try(from_utf8(
bytearray_substring(
data,
bytearray_length(data)
- 16 - 1 - 1
- CASE
WHEN bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 16, 1)) = 1
THEN 20 ELSE 0
END
- bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
+ 1,
bytearray_to_bigint(bytearray_substring(data, bytearray_length(data) - 17, 1))
)
)),
','
),
-- 7) erc_8021_suffix: The magic bytes
bytearray_substring(data, bytearray_length(data) - 15, 16)
)
AS ROW(
original_tx_data varbinary,
schema_type varchar,
codes_hex varchar,
custom_registry_address varbinary,
codes_readable varchar,
codes_array array(varchar),
erc_8021_suffix varbinary
)
)
ELSE NULL
END AS eip8021_data
FROM base.transactions
WHERE bytearray_substring(data, bytearray_length(data) - 15, 16) = 0x80218021802180218021802180218021
LIMIT 100

Quick Filter (to find EIP-8021 transactions)

-- Filter for transactions with EIP-8021 suffix
SELECT *
FROM base.transactions
WHERE bytearray_substring(data, bytearray_length(data) - 15, 16) = 0x80218021802180218021802180218021

Access Parsed Fields

Once you have the eip8021_data column, you can access individual fields:

eip8021_data.codes_readable -- e.g., 'cb_wallet'
eip8021_data.codes_array -- e.g., ['cb', 'wallet'] if comma-separated
eip8021_data.schema_type -- 'Schema 0: Canonical Registry' or 'Schema 1: Custom Registry'
eip8021_data.original_tx_data -- Original calldata without the EIP-8021 suffix
eip8021_data.custom_registry_address -- Registry address (Schema 1 only)

The magic bytes are 0x80218021802180218021802180218021 (16 bytes).
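To cross-check results from the standalone SQL above, the same offsets can be expressed in a few lines of Python. This is a minimal sketch that mirrors the query's byte arithmetic, not the EIP itself: `parse_eip8021` and `Eip8021Data` are hypothetical names, the minimum-length guard is an addition the SQL does not have, and uppercase hex output is an assumption about how Trino's `to_hex` renders bytes.

```python
from typing import List, NamedTuple, Optional

MAGIC = bytes.fromhex("8021" * 8)  # 16-byte suffix checked by the SQL filter


class Eip8021Data(NamedTuple):
    original_tx_data: bytes
    schema_type: str
    codes_hex: str
    custom_registry_address: Optional[bytes]
    codes_readable: str
    codes_array: List[str]
    erc_8021_suffix: bytes


def parse_eip8021(data: bytes) -> Optional[Eip8021Data]:
    """Mirror the SQL offsets; return None when the suffix is absent."""
    # Guard (not in the SQL): need at least codes_length + schema_id + magic.
    if len(data) < 18 or not data.endswith(MAGIC):
        return None
    schema_id = data[-17]      # SQL position: length - 16 (1-indexed)
    codes_length = data[-18]   # SQL position: length - 17 (1-indexed)
    registry_len = 20 if schema_id == 1 else 0
    codes_end = len(data) - 18 - registry_len
    codes_start = codes_end - codes_length
    if codes_start < 0:
        return None            # declared codes_length exceeds available bytes
    codes = data[codes_start:codes_end]
    if schema_id == 0:
        schema_type = "Schema 0: Canonical Registry"
    elif schema_id == 1:
        schema_type = "Schema 1: Custom Registry"
    else:
        schema_type = "Unknown Schema"
    # Invalid UTF-8 is replaced rather than raised, similar to from_utf8.
    readable = codes.decode("utf-8", errors="replace")
    return Eip8021Data(
        original_tx_data=data[:codes_start],
        schema_type=schema_type,
        codes_hex=codes.hex().upper(),  # assumption: to_hex emits uppercase
        custom_registry_address=data[-38:-18] if schema_id == 1 else None,
        codes_readable=readable,
        codes_array=readable.split(","),
        erc_8021_suffix=data[-16:],
    )


# Synthetic Schema 0 example: 36 zero bytes, code 'cb_wallet', length 9, schema 0.
sample = bytes(36) + b"cb_wallet" + bytes([9, 0]) + MAGIC
parsed = parse_eip8021(sample)
print(parsed.codes_readable, parsed.schema_type)
```

Running a handful of known transactions through both the query and this function is a quick way to catch off-by-one errors in the substring positions.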
Thank you for contributing to Spellbook 🪄
Please open the PR in draft and mark as ready when you want to request a review.
Description:
Addresses PLA-1561 by implementing a custom function to extract EIP-8021 builder codes from transaction calldata. This fulfills an urgent request from the Base Data Science Team, enabling them to parse builder identification labels embedded in transactions.
The solution provides:
- call_data_8021(calldata_field) that parses VARBINARY calldata and returns a structured ROW with original transaction data, schema type, hex codes, readable codes, custom registry address (if applicable), codes array, and the EIP-8021 suffix.
- has_eip_8021_suffix(calldata_field) to easily filter for relevant transactions.

This implementation leverages native Trino VARBINARY functions for optimal performance.

Quick links for more information:
Linear Issue: PLA-1561