feat(fill): add --gas-benchmark-values command to support single genesis file #1895

Open

LouisTsai-Csie wants to merge 8 commits into main from fill-benchmark-command

Conversation

@LouisTsai-Csie (Collaborator) commented Jul 11, 2025

🗒️ Description

This PR introduces a new fill option, --gas-benchmark-values. Supply a comma-separated list of gas amounts (in millions) to set the values used during benchmarking.

The PR also adds two example tests in tests/benchmark/test_worst_blocks.py. To generate their fixtures, run:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_data \
  --fork Prague \
  --gas-benchmark-values 1,10,30,60,90,120 \
  --generate-pre-alloc-groups \
  --clean

The --generate-pre-alloc-groups flag is required for the enginex fixture format.

The command creates two directories:

  • fixtures/blockchain_tests_engine_x/benchmark/worst_blocks
  • fixtures/blockchain_tests_engine_x/pre_alloc

Because only one preAllocGroup is produced, this process generates a single genesis file.

To generate the genesis file, please follow the documentation to run hive locally and then run the extract_config command.

For example: uv run extract_config --fixture fixtures/blockchain_tests_engine_x/pre_alloc/0x10763c36b27696c5.json

I would prefer to refactor the benchmark tests in a separate PR; this task has been added to the issue.

I’ve reviewed the Filling Test section, and I see that the command and flag descriptions are generated by this script. However, I’m happy to contribute additional documentation if needed.

For the pytest plugin test cases, I added three cases, which you can run with the following command:
Case 1: Verify that the --gas-benchmark-values flag is added.
Case 2: Verify that the flag works as expected when provided.
Case 3: Verify that non-benchmark tests are not affected.

python -m pytest src/pytest_plugins/filler/tests/test_benchmarking.py -v
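
A minimal, hypothetical sketch of how Case 1 might be written using pytest's pytester fixture (the plugin module path and test name below are assumptions for illustration, not the PR's actual code):

pytest_plugins = ["pytester"]


def test_gas_benchmark_values_flag_registered(pytester):
    # Case 1: load the plugin that registers the option and check that --help lists it.
    result = pytester.runpytest("-p", "pytest_plugins.filler.filler", "--help")
    result.stdout.fnmatch_lines(["*--gas-benchmark-values*"])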

🔗 Related Issues or PRs

Issue #1891

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@LouisTsai-Csie self-assigned this Jul 11, 2025
@LouisTsai-Csie force-pushed the fill-benchmark-command branch from 8675c6c to d0413c8 on July 11, 2025 15:22

@marioevz (Member) left a comment

Looks great! I think this is a great change for maintainability, and it makes it easier for us to generate more vectors as they are required.

A small downside is that we have to remove Environment().gas_limit from most of the tests, but I would say the earlier we do it, the better.

cc @jsign for some feedback on my comments.

Thanks!

if "gas_benchmark_value" in metafunc.fixturenames:
    gas_benchmark_values = metafunc.config.getoption("gas_benchmark_value")
    if gas_benchmark_values:
        gas_values = [int(x) for x in gas_benchmark_values.replace(" ", "").split(",")]

Member

Suggested change
gas_values = [int(x) for x in gas_benchmark_values.replace(" ", "").split(",")]
gas_values = [int(x.strip()) for x in gas_benchmark_values.split(",")]

Not sure if necessary but maybe in case there's a white space between commas?

Collaborator Author

It is necessary to handle inputs like "5, 10, 20,30" (a string with whitespace between values), but the current .replace(" ", "") approach already removes that whitespace.
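
For illustration, both approaches parse such an input identically:

values = "5, 10, 20,30"

# Remove every space, then split on commas:
print([int(x) for x in values.replace(" ", "").split(",")])  # [5, 10, 20, 30]

# Split on commas, then strip whitespace around each element:
print([int(x.strip()) for x in values.split(",")])           # [5, 10, 20, 30]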

Member

I'd prefer strip(); as it's very widely used :)

Comment on lines 340 to 345
if config.getoption("gas_benchmark_value"):
    benchmark_values = [
        int(x) for x in config.getoption("gas_benchmark_value").replace(" ", "").split(",")
    ]
    EnvironmentDefaults.gas_limit = max(benchmark_values) * 1_000_000

Member

I would propose that we leave this out and instead create a fixture in the benchmark folder's conftest.py that generates a default gas value for the setups:

@pytest.fixture
def env() -> Environment:  # noqa: D103
    return Environment(gas_limit=1_000_000_000)

The 1 billion default seems reasonable for most benchmark setups that deploy a ton of contracts for the actual benchmark test.

The tests would then change from using Environment().gas_limit to taking the env parameter in the test function and using env.gas_limit to determine the maximum gas for the setup block, while still respecting gas_benchmark_value for the attack block.
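
A rough sketch of what that might look like in a benchmark test (the test name, import path, and signature are assumptions for illustration, not the PR's actual code):

from ethereum_test_tools import Environment


def test_worst_case_example(env: Environment, gas_benchmark_value: int):
    # The setup block budget now comes from the env fixture instead of Environment().gas_limit,
    # while the attack block is sized by the parametrized gas_benchmark_value.
    setup_gas = env.gas_limit         # e.g. 1_000_000_000 from the proposed fixture
    attack_gas = gas_benchmark_value  # value supplied via --gas-benchmark-values
    ...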

Collaborator

Nice, sounds good to me.

Collaborator Author

Looks good, but this seems to be part of the general refactoring. Should I do it in this PR, or add it to the issue and do it later?

Member

1_000_000_000 sounds good. Gigagas! 😄

But I'd prefer to avoid too much customization at the ./tests/benchmark/ level and to keep it in a central place instead.

You did give me an idea though... to create a benchmarking plugin (described in the top-level comment of this review).

In that case, the fixture that Mario suggests goes into the benchmarking plugin and returns gigagas in benchmarking mode and the default Environment() otherwise.

@LouisTsai-Csie I think it's ok to do it in the scope of this PR.

action="store",
dest="gas_benchmark_value",
type=str,
default=None,

Member

Suggested change
default=None,
default='5',

This would default to running the benchmark tests with 5 million gas, for sanity checking for example. What do you think?

@LouisTsai-Csie (Collaborator Author) commented Jul 14, 2025

We need to be careful with non-benchmark tests: if a test is not a benchmark test, it should be unaffected. According to the issue description, non-benchmark tests shouldn't be impacted, since this would change their gas limit.

If we set the default value of --gas-benchmark-values to "5", non-benchmark tests would automatically apply a benchmark fixture with a 5M gas setting. I am not sure that is what we want.

Based on my experiment, setting the default to "5" doesn’t apply when the flag is explicitly included without a value, like this:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_data \
  --fork Prague \
  --gas-benchmark-values \
  --generate-pre-alloc-groups \
  --clean
# This leads to an error

However, when the flag is omitted entirely, the default value is applied, as seen in this command:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_data \
  --fork Prague \
  --generate-pre-alloc-groups \
  --clean
# The generated test id will include fork_Prague-blockchain_test_engine_x_from_state_test-benchmark-gas-value_5M-zero_byte_False

Member

Thanks for the clear explanation @LouisTsai-Csie. I don't think we should add a default value here; let's leave it as-is:

  • Set a value on the command-line -> benchmarking mode.
  • No value (default) -> regular consensus test mode.

@jsign (Collaborator) commented Jul 11, 2025

Nice! @LouisTsai-Csie @marioevz, is this compatible with supporting all the test formats too? (i.e. #1778).

Mostly asking since I think this is coming from the fact of simplifying the single genesis for perfnets, but wondering if it should still be fine for the other formats that we need for zkVMs.

@marioevz (Member)

Nice! @LouisTsai-Csie @marioevz, is this compatible with supporting all the test formats too? (i.e. #1778).

Mostly asking since I think this is coming from the fact of simplifying the single genesis for perfnets, but wondering if it should still be fine for the other formats that we need for zkVMs.

Should be compatible out of the box, but I'll give that a look again and raise it if there are any concerns.

@danceratopz (Member) left a comment

Thanks, this looks great @LouisTsai-Csie!

Shame that this didn't occur to me up front in #1891, but I'd suggest that we move this code to a new plugin that gets activated with fill by default. This should work well due to the composability of pytest plugins.

This means we:

  1. Add these changes (and other benchmarking-related pytest config, if any) to a separate pytest plugin; I'd suggest src/pytest_plugins/filler/benchmarking.py.
  2. Enable this plugin using -p via the fill command's pytest ini:
    addopts =
    -p pytest_plugins.concurrency
    -p pytest_plugins.filler.pre_alloc
    -p pytest_plugins.filler.filler
    -p pytest_plugins.filler.ported_tests
    -p pytest_plugins.filler.static_filler
    -p pytest_plugins.shared.execute_fill
    -p pytest_plugins.forks.forks
    -p pytest_plugins.eels_resolver
    -p pytest_plugins.help.help
    --tb short
    --ignore tests/cancun/eip4844_blobs/point_evaluation_vectors/

All benchmarking-related plugin customizations (e.g. pytest_addoption, pytest_generate_tests, etc.) currently in filler/filler.py can be moved directly to filler/benchmarking.py. This keeps the benchmarking logic self-contained. Pytest hooks from both modules should compose as expected.
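
For instance, a minimal sketch of what the moved parametrization hook could look like in filler/benchmarking.py, assuming the option parsing shown in the diff above and the test-id format used by the PR (an illustration, not the PR's actual code):

def pytest_generate_tests(metafunc):
    """Parametrize benchmark tests with the values from --gas-benchmark-values."""
    if "gas_benchmark_value" in metafunc.fixturenames:
        raw_values = metafunc.config.getoption("gas_benchmark_value")
        if raw_values:
            gas_values = [int(x.strip()) for x in raw_values.split(",")]
            metafunc.parametrize(
                "gas_benchmark_value",
                gas_values,
                ids=[f"benchmark-gas-value_{value}M" for value in gas_values],
            )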

To cleanly handle options/values that are specific to benchmarking, I'd suggest the following approach; if you agree/like it, feel free to go for it!

1. Define a filling mode enum in filler/filler.py:

from enum import StrEnum, unique

@unique
class FillMode(StrEnum):
    CONSENSUS = "consensus"
    BENCHMARKING = "benchmarking"

2. In the filler plugin (filler.py), set the default:

from _pytest.config import Config
from .filler import FillMode

def pytest_configure(config: Config) -> None:
    if not hasattr(config, "fill_mode"):
        config.fill_mode = FillMode.CONSENSUS

3. In the benchmarking plugin (filler/benchmarking.py), override only if --gas-benchmark-values is set:

from _pytest.config import Config
from .filler import FillMode

def pytest_configure(config: Config) -> None:
    if config.getoption("--gas-benchmark-values") is not None:
        config.fill_mode = FillMode.BENCHMARKING

4. Example usage in filler logic, wrapped in a fixture:

import pytest
from .filler import FillMode

GIGA_GAS = 1_000_000_000

@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    if request.config.fill_mode == FillMode.BENCHMARKING:
        return Environment(gas_limit=GIGA_GAS)
    else:
        return Environment()


@LouisTsai-Csie force-pushed the fill-benchmark-command branch from d0413c8 to 946de75 on July 15, 2025 09:09


@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment: # noqa: D103

@LouisTsai-Csie (Collaborator Author) commented Jul 15, 2025

Based on the discussion with @danceratopz, we want to avoid over-customizing the benchmark test setup (see the related comment). Therefore, I added the configuration to tests/conftest.py, the parent directory for all test cases, so the rule applies consistently across all test files.
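
A rough sketch of what such a tests/conftest.py fixture might look like (assumed for illustration; the actual implementation in the PR may differ):

import pytest

from ethereum_test_tools import Environment


@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    # If --gas-benchmark-values was supplied, give the setup block a large budget.
    if request.config.getoption("gas_benchmark_value", default=None):
        return Environment(gas_limit=1_000_000_000)
    return Environment()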

Member

I would suggest also putting this in the filler/benchmarking.py plugin to keep this config in one place.

To ensure that fill works correctly (and that the env pytest fixture is available) even if the benchmarking plugin isn't registered, we could add an analogous pytest fixture there:

@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    return Environment()

When the benchmarking plugin is used with fill, its implementation of env will override fill's implementation.

@danceratopz (Member) left a comment

Thanks! This looks great to me!

One comment, on the env fixture (see the thread above).




@LouisTsai-Csie (Collaborator Author)

@danceratopz Thank you for the review; I am wondering about the following:

  • Should I add test cases under src/cli/tests/? I noticed your recent PR included tests there.
  • Should I add documentation for the new flag? I’m happy to do that, but I ran into some issues building the docs locally with mkdocs (related to cairosvg and missing libcairo on macOS). Let me know if there’s a preferred workaround or if I should just update the markdown and let CI verify the build (not a good idea).

@danceratopz (Member)

@danceratopz Thank you for review, but I am wondering the following:

* Should I add test cases under `src/cli/tests/`? I noticed your recent [PR](https://github.com/ethereum/execution-spec-tests/pull/1855/files#diff-5c3633f8cbee135e20eb35f9537277edaf7ff69714db9f5c0993431a312ca5f5) included tests there.

I don't think it's strictly necessary for the PR, but some sanity check that the flag works is nice, of course. Recently, I've been pointing Claude at unit testing tasks.

* Should I add documentation for the new flag? I’m happy to do that, but I ran into some [issues](https://github.com/ethereum/execution-spec-tests/issues/1908) building the docs locally with `mkdocs` (related to cairosvg and missing `libcairo` on `macOS`). Let me know if there’s a preferred workaround or if I should just update the markdown and let CI verify the build (not a good idea).

Does this work?

uvx --with=tox-uv tox -e mkdocs

If so, it's because of the macOS trick found in these lines (you can then set the env var locally):

# Required for `cairosvg` so tox can find `libcairo-2`.
# https://squidfunk.github.io/mkdocs-material/plugins/requirements/image-processing/?h=cairo#cairo-library-was-not-found
DYLD_FALLBACK_LIBRARY_PATH = /opt/homebrew/lib
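
For example, something like this should work locally (assuming Homebrew's default library path on Apple Silicon, as in the tox config above):

export DYLD_FALLBACK_LIBRARY_PATH=/opt/homebrew/lib
uv run mkdocs serve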

@marioevz (Member) left a comment

Looks great! I applied my suggestions locally and execute is working with the new flag! 🎉

@@ -12,6 +12,7 @@ addopts =
-p pytest_plugins.concurrency
-p pytest_plugins.filler.pre_alloc
-p pytest_plugins.filler.filler
-p pytest_plugins.filler.benchmarking

Member

This line should also go into src/cli/pytest_commands/pytest_ini_files/pytest-execute.ini and src/cli/pytest_commands/pytest_ini_files/pytest-execute-hive.ini.

Member

Considering the previous comment, it would be better to move this to src/pytest_plugins/shared, since it's used by both fill and execute.

Member

This occurred to me in the call today 🙂

Comment on lines +347 to +348
if not hasattr(config, "fill_mode"):
    config.fill_mode = FillMode.CONSENSUS

Member

This should be an else branch of pytest_configure in src/pytest_plugins/filler/benchmarking.py, because otherwise we would need to import filler.py during execute.
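
A rough sketch of that restructuring in src/pytest_plugins/filler/benchmarking.py (illustrative only, not the PR's actual code):

from pytest_plugins.filler.filler import FillMode  # or wherever the enum ends up living


def pytest_configure(config):
    if config.getoption("gas_benchmark_value", default=None) is not None:
        config.fill_mode = FillMode.BENCHMARKING
    else:
        config.fill_mode = FillMode.CONSENSUS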

@@ -557,6 +569,11 @@ def pytest_html_report_title(report):
report.title = "Fill Test Report"


@pytest.fixture

Member

Can be moved to src/pytest_plugins/shared/execute_fill.py in order for this all to work with execute too.
