feat(fill): add --gas-benchmark-values command to support single genesis file #1895

Open

LouisTsai-Csie wants to merge 8 commits into main from fill-benchmark-command

Conversation

@LouisTsai-Csie (Collaborator) commented Jul 11, 2025

🗒️ Description

This PR introduces a new fill option, --gas-benchmark-values. Supply a comma-separated list of gas amounts (in millions) to set the values used during benchmarking.

The PR also adds two example tests in tests/benchmark/test_worst_blocks.py. To generate their fixtures, run:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_data \
  --fork Prague \
  --gas-benchmark-values 1,10,30,60,90,120 \
  --generate-pre-alloc-groups \
  --clean

The --generate-pre-alloc-groups flag is required for the enginex fixture format.

The command creates two directories:

  • fixtures/blockchain_tests_engine_x/benchmark/worst_blocks
  • fixtures/blockchain_tests_engine_x/pre_alloc

Because only one preAllocGroup is produced, this process generates a single genesis file.

To generate the genesis file, please follow the documentation to run hive locally and then run the extract_config command.

For example: uv run extract_config --fixture fixtures/blockchain_tests_engine_x/pre_alloc/0x10763c36b27696c5.json

I would prefer to refactor the benchmark tests in a separate PR; this task has been added to the issue.

I’ve reviewed the Filling Test section, and I see that the command and flag descriptions are generated by this script. However, I’m happy to contribute additional documentation if needed.

For the pytest plugin test cases, I added three cases, which you can run with the following command:
Case 1: Verify that the --gas-benchmark-values flag is added.
Case 2: Verify that the flag works as expected when provided.
Case 3: Verify that non-benchmark tests are not affected.

python -m pytest src/pytest_plugins/filler/tests/test_benchmarking.py -v
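
A minimal, hypothetical sketch of how Case 1 might be written using pytest's pytester fixture (the plugin module path and test name below are assumptions for illustration, not the PR's actual code):

pytest_plugins = ["pytester"]


def test_gas_benchmark_values_flag_registered(pytester):
    # Case 1: load the plugin that registers the option and check that --help lists it.
    result = pytester.runpytest("-p", "pytest_plugins.filler.filler", "--help")
    result.stdout.fnmatch_lines(["*--gas-benchmark-values*"])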

🔗 Related Issues or PRs

Issue #1891

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@LouisTsai-Csie self-assigned this Jul 11, 2025
@LouisTsai-Csie force-pushed the fill-benchmark-command branch from 8675c6c to d0413c8 on July 11, 2025 15:22

@marioevz (Member) left a comment

Looks great! I think this is a great change for maintainability, and it makes it easier for us to generate more vectors as they are required.

A small downside is that we have to remove Environment().gas_limit from most of the tests, but I would say the earlier we do it, the better.

cc @jsign for some feedback on my comments.

Thanks!

if "gas_benchmark_value" in metafunc.fixturenames:
    gas_benchmark_values = metafunc.config.getoption("gas_benchmark_value")
    if gas_benchmark_values:
        gas_values = [int(x) for x in gas_benchmark_values.replace(" ", "").split(",")]

Member

Suggested change
gas_values = [int(x) for x in gas_benchmark_values.replace(" ", "").split(",")]
gas_values = [int(x.strip()) for x in gas_benchmark_values.split(",")]

Not sure if necessary but maybe in case there's a white space between commas?

Collaborator Author

It is necessary to handle inputs like "5, 10, 20,30" (a string with whitespace between values), but the current .replace(" ", "") approach already removes that whitespace.
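
For illustration, both approaches parse such an input identically:

values = "5, 10, 20,30"

# Remove every space, then split on commas:
print([int(x) for x in values.replace(" ", "").split(",")])  # [5, 10, 20, 30]

# Split on commas, then strip whitespace around each element:
print([int(x.strip()) for x in values.split(",")])           # [5, 10, 20, 30]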

Member

I'd prefer strip(); as it's very widely used :)

Comment on lines 340 to 345
if config.getoption("gas_benchmark_value"):
    benchmark_values = [
        int(x) for x in config.getoption("gas_benchmark_value").replace(" ", "").split(",")
    ]
    EnvironmentDefaults.gas_limit = max(benchmark_values) * 1_000_000

Member

I would propose that we leave this out and instead create a fixture in the benchmark folder's conftest.py that generates a default gas value for the setups:

@pytest.fixture
def env() -> Environment:  # noqa: D103
    return Environment(gas_limit=1_000_000_000)

The 1 billion default seems reasonable for most benchmark setups that deploy a ton of contracts for the actual benchmark test.

The tests would then change from using Environment().gas_limit to taking the env parameter in the test function and using env.gas_limit to determine the maximum gas for the setup block, while still respecting gas_benchmark_value for the attack block.
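
A rough sketch of what that might look like in a benchmark test (the test name, import path, and signature are assumptions for illustration, not the PR's actual code):

from ethereum_test_tools import Environment


def test_worst_case_example(env: Environment, gas_benchmark_value: int):
    # The setup block budget now comes from the env fixture instead of Environment().gas_limit,
    # while the attack block is sized by the parametrized gas_benchmark_value.
    setup_gas = env.gas_limit         # e.g. 1_000_000_000 from the proposed fixture
    attack_gas = gas_benchmark_value  # value supplied via --gas-benchmark-values
    ...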

Collaborator

Nice, sounds good to me.

Collaborator Author

Looks good, but this seems to be part of the general refactoring. Should I do it in this PR, or add it to the issue and do it later?

Member

1_000_000_000 sounds good. Gigagas! 😄

But I'd prefer to avoid too much customization at the ./tests/benchmark/ level and to keep it in a central place instead.

You did give me an idea though... to create a benchmarking plugin (described in the top-level comment of this review).

In that case, the fixture that Mario suggests goes into the benchmarking plugin and returns gigagas in benchmarking mode and the default Environment() otherwise.

@LouisTsai-Csie I think it's ok to do it in the scope of this PR.

action="store",
dest="gas_benchmark_value",
type=str,
default=None,

Member

Suggested change
default=None,
default='5',

This would default to running the benchmark tests with 5 million gas, for sanity checking for example. What do you think?

@LouisTsai-Csie (Collaborator Author) commented Jul 14, 2025

We need to be careful with non-benchmark tests: if a test is not a benchmark test, it should be unaffected. According to the issue description, non-benchmark tests shouldn't be impacted, since this would change their gas limit.

If we set the default value of --gas-benchmark-values to "5", non-benchmark tests would automatically apply a benchmark fixture with a 5M gas setting. I am not sure that is what we want.

Based on my experiment, setting the default to "5" doesn’t apply when the flag is explicitly included without a value, like this:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_data \
  --fork Prague \
  --gas-benchmark-values \
  --generate-pre-alloc-groups \
  --clean
# This leads to an error

However, when the flag is omitted entirely, the default value is applied, as seen in this command:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_data \
  --fork Prague \
  --generate-pre-alloc-groups \
  --clean
# The generated test id will include fork_Prague-blockchain_test_engine_x_from_state_test-benchmark-gas-value_5M-zero_byte_False

Member

Thanks for the clear explanation @LouisTsai-Csie. I don't think we should add a default value here; let's leave it as-is:

  • Set a value on the command-line -> benchmarking mode.
  • No value (default) -> regular consensus test mode.

@jsign (Collaborator) commented Jul 11, 2025

Nice! @LouisTsai-Csie @marioevz, is this compatible with supporting all the test formats too? (i.e. #1778).

Mostly asking since I think this is coming from the fact of simplifying the single genesis for perfnets, but wondering if it should still be fine for the other formats that we need for zkVMs.

@marioevz (Member)

Nice! @LouisTsai-Csie @marioevz, is this compatible with supporting all the test formats too? (i.e. #1778).

Mostly asking since I think this is coming from the fact of simplifying the single genesis for perfnets, but wondering if it should still be fine for the other formats that we need for zkVMs.

Should be compatible out of the box, but I'll give that a look again and raise it if there are any concerns.

@danceratopz (Member) left a comment

Thanks, this looks great @LouisTsai-Csie!

Shame that this didn't occur to me up front in #1891, but I'd suggest that we move this code to a new plugin that gets activated with fill by default. This should work well due to the composability of pytest plugins.

This means we:

  1. Add these changes (and other benchmarking-related pytest config, if any) to a separate pytest plugin; I'd suggest src/pytest_plugins/filler/benchmarking.py.
  2. Enable this plugin using -p via the fill command's pytest ini:
    addopts =
    -p pytest_plugins.concurrency
    -p pytest_plugins.filler.pre_alloc
    -p pytest_plugins.filler.filler
    -p pytest_plugins.filler.ported_tests
    -p pytest_plugins.filler.static_filler
    -p pytest_plugins.shared.execute_fill
    -p pytest_plugins.forks.forks
    -p pytest_plugins.eels_resolver
    -p pytest_plugins.help.help
    --tb short
    --ignore tests/cancun/eip4844_blobs/point_evaluation_vectors/

All benchmarking-related plugin customizations (e.g. pytest_addoption, pytest_generate_tests, etc.) currently in filler/filler.py can be moved directly to filler/benchmarking.py. This keeps the benchmarking logic self-contained. Pytest hooks from both modules should compose as expected.
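
For instance, a minimal sketch of what the moved parametrization hook could look like in filler/benchmarking.py, assuming the option parsing shown in the diff above and the test-id format used by the PR (an illustration, not the PR's actual code):

def pytest_generate_tests(metafunc):
    """Parametrize benchmark tests with the values from --gas-benchmark-values."""
    if "gas_benchmark_value" in metafunc.fixturenames:
        raw_values = metafunc.config.getoption("gas_benchmark_value")
        if raw_values:
            gas_values = [int(x.strip()) for x in raw_values.split(",")]
            metafunc.parametrize(
                "gas_benchmark_value",
                gas_values,
                ids=[f"benchmark-gas-value_{value}M" for value in gas_values],
            )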

To cleanly handle options/values that are specific to benchmarking, I'd suggest the following approach; if you agree/like it, feel free to go for it!

1. Define a filling mode enum in filler/filler.py:

from enum import StrEnum, unique

@unique
class FillMode(StrEnum):
    CONSENSUS = "consensus"
    BENCHMARKING = "benchmarking"

2. In the filler plugin (filler.py), set the default:

from _pytest.config import Config
from .filler import FillMode

def pytest_configure(config: Config) -> None:
    if not hasattr(config, "fill_mode"):
        config.fill_mode = FillMode.CONSENSUS

3. In the benchmarking plugin (filler/benchmarking.py), override only if --gas-benchmark-values is set:

from _pytest.config import Config
from .filler import FillMode

def pytest_configure(config: Config) -> None:
    if config.getoption("--gas-benchmark-values") is not None:
        config.fill_mode = FillMode.BENCHMARKING

4. Example usage in filler logic, wrapped in a fixture:

import pytest
from .filler import FillMode

GIGA_GAS = 1_000_000_000

@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    if request.config.fill_mode == FillMode.BENCHMARKING:
        return Environment(gas_limit=GIGA_GAS)
    else:
        return Environment()


@LouisTsai-Csie force-pushed the fill-benchmark-command branch from d0413c8 to 946de75 on July 15, 2025 09:09


@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment: # noqa: D103

@LouisTsai-Csie (Collaborator Author) commented Jul 15, 2025

Based on the discussion with @danceratopz, we want to avoid over-customizing the benchmark test setup (see the related comment). Therefore, I added the configuration to tests/conftest.py, the parent directory for all test cases, so the rule applies consistently across all test files.
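
A rough sketch of what such a tests/conftest.py fixture might look like (assumed for illustration; the actual implementation in the PR may differ):

import pytest

from ethereum_test_tools import Environment


@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    # If --gas-benchmark-values was supplied, give the setup block a large budget.
    if request.config.getoption("gas_benchmark_value", default=None):
        return Environment(gas_limit=1_000_000_000)
    return Environment()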

Member

I would suggest also putting this in the filler/benchmarking.py plugin to keep this config in one place.

To ensure that fill works correctly (and that the env pytest fixture is available) even if the benchmarking plugin isn't registered, we could add an analogous pytest fixture there:

@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    return Environment()

When the benchmarking plugin is used with fill, its implementation of env will override fill's implementation.

@danceratopz (Member) left a comment

Thanks! This looks great to me!

One comment, on the env fixture (see the thread above).




@LouisTsai-Csie (Collaborator Author)

@danceratopz Thank you for the review; I am wondering about the following:

  • Should I add test cases under src/cli/tests/? I noticed your recent PR included tests there.
  • Should I add documentation for the new flag? I’m happy to do that, but I ran into some issues building the docs locally with mkdocs (related to cairosvg and missing libcairo on macOS). Let me know if there’s a preferred workaround or if I should just update the markdown and let CI verify the build (not a good idea).

@danceratopz (Member)

@danceratopz Thank you for review, but I am wondering the following:

* Should I add test cases under `src/cli/tests/`? I noticed your recent [PR](https://github.com/ethereum/execution-spec-tests/pull/1855/files#diff-5c3633f8cbee135e20eb35f9537277edaf7ff69714db9f5c0993431a312ca5f5) included tests there.

I don't think it's strictly necessary for the PR, but some sanity check that the flag works is nice, of course. Recently, I've been pointing Claude at unit testing tasks.

* Should I add documentation for the new flag? I’m happy to do that, but I ran into some [issues](https://github.com/ethereum/execution-spec-tests/issues/1908) building the docs locally with `mkdocs` (related to cairosvg and missing `libcairo` on `macOS`). Let me know if there’s a preferred workaround or if I should just update the markdown and let CI verify the build (not a good idea).

Does this work?

uvx --with=tox-uv tox -e mkdocs

If so, it's because of the macOS trick found in these lines (you can then set the env var locally):

# Required for `cairosvg` so tox can find `libcairo-2`.
# https://squidfunk.github.io/mkdocs-material/plugins/requirements/image-processing/?h=cairo#cairo-library-was-not-found
DYLD_FALLBACK_LIBRARY_PATH = /opt/homebrew/lib
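
For example, something like this should work locally (assuming Homebrew's default library path on Apple Silicon, as in the tox config above):

export DYLD_FALLBACK_LIBRARY_PATH=/opt/homebrew/lib
uv run mkdocs serve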

@marioevz (Member) left a comment

Looks great! I applied my suggestions locally and execute is working with the new flag! 🎉

@@ -12,6 +12,7 @@ addopts =
-p pytest_plugins.concurrency
-p pytest_plugins.filler.pre_alloc
-p pytest_plugins.filler.filler
-p pytest_plugins.filler.benchmarking

Member

This line should also go into src/cli/pytest_commands/pytest_ini_files/pytest-execute.ini and src/cli/pytest_commands/pytest_ini_files/pytest-execute-hive.ini.

Member

Considering the previous comment, it would be better to move this to src/pytest_plugins/shared, since it's used by both fill and execute.

Member

This occurred to me in the call today 🙂

Comment on lines +347 to +348
if not hasattr(config, "fill_mode"):
    config.fill_mode = FillMode.CONSENSUS

Member

This should be an else branch of pytest_configure in src/pytest_plugins/filler/benchmarking.py, because otherwise we would need to import filler.py during execute.
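
A rough sketch of that restructuring in src/pytest_plugins/filler/benchmarking.py (illustrative only, not the PR's actual code):

from pytest_plugins.filler.filler import FillMode  # or wherever the enum ends up living


def pytest_configure(config):
    if config.getoption("gas_benchmark_value", default=None) is not None:
        config.fill_mode = FillMode.BENCHMARKING
    else:
        config.fill_mode = FillMode.CONSENSUS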

@@ -557,6 +569,11 @@ def pytest_html_report_title(report):
report.title = "Fill Test Report"


@pytest.fixture

Member

Can be moved to src/pytest_plugins/shared/execute_fill.py in order for this all to work with execute too.
