feat(fill): implement address tag resolution for static test fillers #1781

Open: wants to merge 4 commits into base: main

fselmo (Collaborator) commented Jun 20, 2025

🗒️ Description

Hard-coded address conversion in .yml and .json static fillers:

  • Add convert_addresses.py script to automate tag conversion
  • The correct way to run this is with the CONVERT_COINBASE flag set to False, as this allows the same coinbase for all tests (just as the Python tests do). If we decide to handle the coinbase setting on the Python side, we can turn this flag on and hard-code it there, but the current approach seems correct.
  • Convert static test YAML/JSON files to use address tags
  • Fine-tune certain tests with selective script options:
    • DO_NOT_TAG_ADDRESSES: Don't tag certain addresses that are crucial for some tests. This applies only to addresses that were confirmed not to appear in any other test.
    • SHORT_NAME_FILLERS: Find and replace short names for addresses (e.g. 0x0000...000dead1 -> 0xdead1). This turns on a few more fillers that were not working.
    • NO_TAGS_IN_CODE: Don't tag matches inside bytecode sections for these tests: data, code, storage.
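
As a rough illustration of the SHORT_NAME_FILLERS option, the sketch below replaces a known address, and optionally its short form, with a tag. The function names and the `<contract:target>` tag string are hypothetical placeholders, not taken from convert_addresses.py, and the real script likely handles many more cases.

```python
import re

# Assumption: a match must not be the prefix of a longer hex string.
HEX_BOUNDARY = r"(?![0-9a-fA-F])"


def short_name(address: str) -> str:
    """Strip leading zeros: 0x0000...000dead1 -> 0xdead1 (hypothetical helper)."""
    digits = address.lower().removeprefix("0x").lstrip("0")
    return "0x" + (digits or "0")


def tag_addresses(text: str, tags: dict, short_names: bool = False) -> str:
    """Replace each known address (and optionally its short form) with its tag."""
    for address, tag in tags.items():
        patterns = [re.escape(address) + HEX_BOUNDARY]
        if short_names:
            patterns.append(re.escape(short_name(address)) + HEX_BOUNDARY)
        for pattern in patterns:
            text = re.sub(pattern, tag, text, flags=re.IGNORECASE)
    return text


# A 20-byte address whose short form is 0xdead1.
full = "0x" + "0" * 35 + "dead1"
filler = f"to: {full}\ncode: (CALL 0 0xdead1 0 0 0 0 0)"
tagged = tag_addresses(filler, {full: "<contract:target>"}, short_names=True)
```

With `short_names=False`, only the full 20-byte form is replaced and the `0xdead1` reference inside the code stays hard-coded, which is presumably why some fillers needed this option.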

Generate deterministic addresses from tags coming from static test fillers in python code:

  • Resolve tags to deterministic addresses in the same way python tests do - via pytest static filler plugin
  • Add BlockchainEngineXFixture support for pre-allocation groups

This enables static tests to use symbolic address tags instead of hardcoded addresses, so that tests sharing a pre-allocation group are less likely to muddy each other's context.
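
To make the idea of tag resolution concrete, here is a minimal sketch that maps each tag to a stable address. It assumes a `<kind:name>` tag syntax and derives the address by hashing the tag. Both are illustrative assumptions: the actual plugin resolves tags through the same pre-allocation machinery the Python tests use, not through this hash.

```python
import hashlib
import re

# Assumption: tags look like <contract:name> or <eoa:name>.
TAG_PATTERN = re.compile(r"<(?P<kind>contract|eoa):(?P<name>[^<>]+)>")


def resolve_tag(kind: str, name: str) -> str:
    """Derive a stable 20-byte address from a tag (hypothetical scheme)."""
    digest = hashlib.sha256(f"{kind}:{name}".encode()).digest()
    return "0x" + digest[-20:].hex()


def resolve_tags(text: str) -> str:
    """Replace every tag in the filler text with its resolved address."""
    return TAG_PATTERN.sub(lambda m: resolve_tag(m["kind"], m["name"]), text)


resolved = resolve_tags("to: <contract:target>, from: <eoa:sender>")
```

The property that matters is determinism: the same tag always resolves to the same address, while distinct tags resolve to distinct addresses.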


Note: This is a very large change with many nuances. This is a first pass at a generalized approach. I think we can add complexity and attempt to turn on more tests eventually.


🔗 Related Issues

related to #1750

✅ Checklist

  • All: Set appropriate labels for the changes.
  • All: Considered squashing commits to improve commit history.
  • All: Added an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • Tests: All converted JSON/YML tests from ethereum/tests/tests/static have been assigned @ported_from marker.
  • Tests: A PR with removal of converted JSON/YML blockchain tests from ethereum/tests has been opened.
  • Tests: Included the type and version of evm t8n tool used to locally execute test cases: e.g., ref with commit hash or geth 1.13.1-stable-3f40e65.
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.

@fselmo fselmo changed the title from "feat(static): implement address tag resolution for static test fillers" to "feat(fill): implement address tag resolution for static test fillers" Jun 21, 2025
danceratopz (Member) left a comment

Excited about this! Just a few minor comments after a fleeting review. I can try it out and have a closer look tomorrow!

Comment on lines +210 to +231
# Apply fixture format filtering for pre-allocation groups
generate_pre_alloc = self.config.getoption("generate_pre_alloc_groups")
use_pre_alloc = self.config.getoption("use_pre_alloc_groups")
if generate_pre_alloc or use_pre_alloc:
    # When pre-allocation group flags are set, only generate
    # BlockchainEngineXFixture
    filtered_formats = [
        format_item
        for format_item in fixture_formats
        if (
            format_item is BlockchainEngineXFixture
            or (
                isinstance(format_item, LabeledFixtureFormat)
                and format_item.format is BlockchainEngineXFixture
            )
        )
    ]
else:
    # Filter out BlockchainEngineXFixture if pre-allocation group
    # flags not set
    filtered_formats = [
        format_item
        for format_item in fixture_formats
        if not (
            format_item is BlockchainEngineXFixture
            or (
                isinstance(format_item, LabeledFixtureFormat)
                and format_item.format is BlockchainEngineXFixture
            )
        )
    ]

for format_with_or_without_label in filtered_formats:
Member

I think this was copied verbatim from filler.py. Perhaps we can add a helper function, but perhaps it's not worth it?

Collaborator Author

I can look into DRYing this up before we're ready to merge 👍🏼
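
One possible shape for such a helper, sketched with stand-in classes so it runs on its own. `is_engine_x` and `filter_fixture_formats` are hypothetical names, and the classes below are simplified placeholders for the real `BlockchainEngineXFixture` and `LabeledFixtureFormat`:

```python
from dataclasses import dataclass


class BlockchainEngineXFixture:  # stand-in for the real fixture class
    pass


class StateFixture:  # stand-in for any other fixture format
    pass


@dataclass(frozen=True)
class LabeledFixtureFormat:  # simplified stand-in for the real wrapper
    format: type
    label: str


def is_engine_x(format_item) -> bool:
    """True for BlockchainEngineXFixture, bare or wrapped in a label."""
    return format_item is BlockchainEngineXFixture or (
        isinstance(format_item, LabeledFixtureFormat)
        and format_item.format is BlockchainEngineXFixture
    )


def filter_fixture_formats(fixture_formats, pre_alloc_groups_enabled: bool) -> list:
    """Keep only engine-x formats when the flags are set, drop them otherwise."""
    return [f for f in fixture_formats if is_engine_x(f) == pre_alloc_groups_enabled]


formats = [
    StateFixture,
    BlockchainEngineXFixture,
    LabeledFixtureFormat(BlockchainEngineXFixture, "labeled_engine_x"),
    LabeledFixtureFormat(StateFixture, "labeled_state"),
]
with_groups = filter_fixture_formats(formats, pre_alloc_groups_enabled=True)
without_groups = filter_fixture_formats(formats, pre_alloc_groups_enabled=False)
```

Both call sites could then pass `generate_pre_alloc or use_pre_alloc` as the second argument instead of repeating the two list comprehensions.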

@fselmo fselmo force-pushed the var-mapping-static-tests branch 3 times, most recently from 77fbcde to b996b05 Compare June 30, 2025 11:27
@fselmo fselmo force-pushed the var-mapping-static-tests branch 3 times, most recently from 5a359c9 to 0af7ed8 Compare July 3, 2025 11:45
fselmo (Collaborator, Author) commented Jul 3, 2025

I've managed to fine-tune quite a few fillers that were not passing with the tagging system. There are still edge cases that I think can be cleaned up, and some of the stTransactionTest cases that I can still fine-tune. All tests pass when run with -k "not stTransactionTest"; I still think I can turn more of those on, but this is ready for another set of eyes. Commits will be cleaned up as well. These are the stats on tagged vs. untagged files, excluding /stTransactionTest/ (so for all passing tests):

Total filler files in tests/static (excluding stTransactionTest): 2,444
   Successfully tagged files: 1,751 (71.6%)
   Untagged files: 693 (28.4%)
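
For reference, the percentages follow directly from the two file counts. A small sketch (the function name is hypothetical):

```python
def tagging_stats(tagged: int, untagged: int) -> dict:
    """Compute the total and the tagged/untagged shares, rounded to 0.1%."""
    total = tagged + untagged
    return {
        "total": total,
        "tagged_pct": round(100 * tagged / total, 1),
        "untagged_pct": round(100 * untagged / total, 1),
    }


stats = tagging_stats(tagged=1751, untagged=693)
```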

My hope is that more tests can be turned on for tagging. My first attempt focused on examining all the tests and figuring out general approaches, and I wanted to get this to a good first pass since it has taken up a lot of dev time.

The list of currently incompatible tests is here. This could be converted to a list in the documentation somewhere once it's finalized for this first pass.

fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 4, 2025
- Use Prague.precompiles() for precompile addresses in convert_addresses.py
fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 4, 2025
…agging.

- Wrap up stTransaction tests fine-tuning
- [fine-tuning] Replace 0 address in CALL code if in pre
- Changes from comments on PR ethereum#1781:
  - Use Prague.precompiles() for precompile addresses in convert_addresses.py
- Revamp script, simplify Claude code over-engineering.
- fine tune by not tagging some addrs, add short name tag compat for selected tests
@fselmo fselmo force-pushed the var-mapping-static-tests branch from 0f8ad98 to a2c361d Compare July 4, 2025 13:21
fselmo (Collaborator, Author) commented Jul 4, 2025

OK, I made my last pass through the stTransactionTest tests and turned off all the ones that are incompatible. This is good to review for a first pass.

@fselmo fselmo marked this pull request as ready for review July 4, 2025 13:30
fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 8, 2025
@fselmo fselmo force-pushed the var-mapping-static-tests branch 4 times, most recently from 572ed90 to 9f56ecf Compare July 9, 2025 19:55
fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 9, 2025
…ts for tagging.

- Wrap up stTransaction tests fine-tuning
- [fine-tuning] Replace 0 address in CALL code if in pre
- Changes from comments on PR ethereum#1781:
  - Use Prague.precompiles() for precompile addresses in convert_addresses.py
- Revamp script, simplify Claude code over-engineering.
- fine tune by not tagging some addrs, add short name tag compat for selected tests
@fselmo fselmo force-pushed the var-mapping-static-tests branch 4 times, most recently from 57a0f2f to d165e53 Compare July 9, 2025 20:15
fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 9, 2025
fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 9, 2025
Hard-coded address conversion in yml and json fillers:
  - Add convert_addresses.py script to automate tag conversion
  - The correct way to run this is with the ``CONVERT_COINBASE`` flag set to ``False``
    as this allows the same coinbase for all tests (just as python tests do). If we
    decide we want to handle the coinbase setting on the python side, we can turn this
    flag on and hard-code on the python side... but the current approach seems correct.
  - Convert 1000+ static test YAML/JSON files to use address tags

(Python) Generate deterministic addresses from tags coming from static test fillers:
  - Resolve tags to deterministic addresses in the same way python tests do - via pytest static filler plugin
  - Add ``BlockchainEngineXFixture`` support for pre-allocation groups

This enables static tests to use symbolic address tags instead of hardcoded addresses,
minimizing muddied context across tests when running via pre alloc sharing.

----

fix(tests/static): Fine tune addr tag script, turn on more static tests for tagging.

- Wrap up stTransaction tests fine-tuning
- [fine-tuning] Replace 0 address in CALL code if in pre
- Changes from comments on PR ethereum#1781:
  - Use Prague.precompiles() for precompile addresses in convert_addresses.py
- Revamp script, simplify Claude code over-engineering.
- fine tune by not tagging some addrs, add short name tag compat for selected tests
@fselmo fselmo force-pushed the var-mapping-static-tests branch from c8d33ec to 1736983 Compare July 9, 2025 20:23
fselmo added a commit to fselmo/execution-spec-tests that referenced this pull request Jul 9, 2025
@fselmo fselmo force-pushed the var-mapping-static-tests branch from 1736983 to a39720e Compare July 9, 2025 20:27
fselmo (Collaborator, Author) commented Jul 10, 2025

Hmm, it looks like running with -m "not slow and tagged" --generate-pre-alloc-groups -n 20 produces a few failures, a handful of them read timeouts. I'm not sure if it's a domino effect or something else. Will need to investigate.


edit: I actually need to re-investigate the enginex flow altogether after the pydantic changes. It doesn't seem to be working as expected.

@fselmo fselmo force-pushed the var-mapping-static-tests branch from 4ec6097 to 87eb659 Compare July 10, 2025 20:58
fselmo (Collaborator, Author) commented Jul 10, 2025

Ok @danceratopz @marioevz, there are a few things to consider here.

  • The commit directly after Mario's fixes the issues that appeared after the pydantic refactor. This required turning off some more tests, which I expected, since they mostly relate to create tests (shouldnotexist in the post state) that I still had to account for even before the pydantic refactor.
  • The commit after that adds a pytest plugin so that the -m "tagged" or -m "untagged" markers can be used for collecting tagged static tests. This has been very useful for filling with --generate-pre-alloc-groups.
  • The very last commit hashes the pre and post states of a coinbase address and resolves its tag based on this hash, so that tests like stRandom, where the coinbase has code set in the pre, are able to use the same shared pre-alloc. Before this change, something like uv run fill --fill-static-tests --clean --generate-pre-alloc-groups tests/static -m "not slow and tagged" -k stRandom -n 20 would create something like 501 groups for 503 tests. After this change it goes to 11 groups for 503 tests.
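
The coinbase idea in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the `coinbase_tag` name, the `<coinbase:...>` tag format, and the use of SHA-256 over a JSON dump are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def coinbase_tag(pre: Optional[dict], post: Optional[dict]) -> str:
    """Build a tag from the coinbase account's pre and post state.

    Tests whose coinbase accounts have identical state get the same tag,
    so they can resolve to the same address and share a pre-alloc group.
    """
    payload = json.dumps({"pre": pre, "post": post}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return f"<coinbase:{digest[:16]}>"  # hypothetical tag format


# Same coinbase state (key order differs) -> same tag.
tag_a = coinbase_tag({"code": "0x6000", "balance": "0x00"}, None)
tag_b = coinbase_tag({"balance": "0x00", "code": "0x6000"}, None)
# Different coinbase code -> different tag.
tag_c = coinbase_tag({"code": "0x6001", "balance": "0x00"}, None)
```

Keying the tag on state rather than on the original hard-coded address is what would let many tests collapse into a handful of groups.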

As of now, even before the last commit for the shared coinbase logic, I am getting timeout failures when running the tests with the shared pre-alloc: uv run fill --fill-static-tests --clean --generate-pre-alloc-groups -m "not slow and tagged" tests/static -n 20. This is something to consider when reviewing; otherwise, the tests fill when no shared pre-alloc flag is passed. I'm not entirely sure where these failures are coming from, so any clarity would be good here. Thanks!

@fselmo fselmo self-assigned this Jul 11, 2025
@fselmo fselmo added the type:feat and scope:fill labels Jul 11, 2025
@fselmo fselmo force-pushed the var-mapping-static-tests branch from 87eb659 to 371032a Compare July 15, 2025 19:40
fselmo and others added 4 commits July 15, 2025 13:40
Hard-coded address conversion in yml and json fillers:
  - Add convert_addresses.py script to automate tag conversion
  - The correct way to run this is with the ``CONVERT_COINBASE`` flag set to ``False``
    as this allows the same coinbase for all tests (just as python tests do). If we
    decide we want to handle the coinbase setting on the python side, we can turn this
    flag on and hard-code on the python side... but the current approach seems correct.
  - Convert 1000+ static test YAML/JSON files to use address tags

(Python) Generate deterministic addresses from tags coming from static test fillers:
  - Resolve tags to deterministic addresses in the same way python tests do - via pytest static filler plugin
  - Add ``BlockchainEngineXFixture`` support for pre-allocation groups

This enables static tests to use symbolic address tags instead of hardcoded addresses,
minimizing muddied context across tests when running via pre alloc sharing.

----

fix(tests/static): Fine tune addr tag script, turn on more static tests for tagging.

- Wrap up stTransaction tests fine-tuning
- [fine-tuning] Replace 0 address in CALL code if in pre
- Changes from comments on PR ethereum#1781:
  - Use Prague.precompiles() for precompile addresses in convert_addresses.py
- Revamp script, simplify Claude code over-engineering.
- fine tune by not tagging some addrs, add short name tag compat for selected tests
* All pydantic simplifications
* refactor(tests/static): rename sender:key -> eoa:sender
* refactor: rename, use generics
* fix: consider empty accounts
* fix(tests): tests with empty accounts
* fix(tests): addressOpcodesFiller.yml
* feat: significantly improve test ids
* fix: bugs in tag resolution
* fix(tests): CREATE2_HighNonceDelegatecallFiller.yml
* fix: types
* Update src/ethereum_test_specs/static_state/account.py

---

Co-authored-by: felipe <[email protected]>

* fix: comment and generic tag regex
* fix: Code raw code tag substitution
fix: fix when tabs are found in lines with spaces; fix shortnames
fix: Resolve issues with label: / raw: parsing
Turn off more tests, mostly related to create / `creation` addresses
- Add marker for tagged + untagged tests to make them easier to identify
@fselmo fselmo force-pushed the var-mapping-static-tests branch from 371032a to a9842ec Compare July 15, 2025 19:41
Labels
scope:fill, type:feat
3 participants