hwbench: Adding fio engine

ErwanAliasr1 · ErwanAliasr1 · commit 8d2ffe3a967f · 2024-11-05T14:53:06.000+01:00
This commit adds a first engine_module, cmdline, which executes a
single fio command line.

This code has been tested with fio-3.19, which is therefore the minimal
supported release.

To enable this mode, engine_module must be set to "cmdline".
The expected command line to forward to fio must be provided in the engine_module_parameter_base.

The command line will be tweaked by hwbench to ensure:
- runtime consistency with other engines: --time_based and --runtime are added
- output consistency: --output-format=json+ is added
- job naming: --name is adjusted to match hwbench's job name
- logs: the --write_*_log options are enabled, averaged over 20 seconds (--log_avg_msec=20000)
- cache invalidation: each benchmark invalidates the cache (--invalidate=1) to ensure
  out-of-cache testing

Please note that:
- fio's runtime is automatically inherited from hwbench's runtime value.
- --numjobs is fed from 'stressor_range', which makes it possible to
  study the scalability of a device with minimal configuration.

If any of these options is already present in
engine_module_parameter_base, hwbench replaces it with the value
computed from the benchmark description.
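
The override rule described above can be sketched as follows. This is a
simplified, standalone illustration and not the code added by this commit:
the helper name enforce_fio_args and the dict-based description of enforced
options are hypothetical, and options are matched on their exact name.

```python
def enforce_fio_args(user_args, enforced):
    """Return user_args with every enforced option overriding any user value.

    `enforced` maps an option name to its value; None means a bare flag.
    (Hypothetical helper, for illustration only.)
    """
    # Keep only the user arguments that hwbench does not control.
    result = [arg for arg in user_args if arg.split("=", 1)[0] not in enforced]
    # Append the enforced options with the values computed by hwbench.
    for option, value in enforced.items():
        result.append(option if value is None else f"{option}={value}")
    # Sort to get a stable string representation.
    return sorted(result)


user = "--filename=/dev/nvme0n1 --rw=randread --runtime=30 --numjobs=10 --name=plop".split()
enforced = {
    "--runtime": 40,
    "--time_based": None,
    "--output-format": "json+",
    "--numjobs": 4,
    "--name": "randread_cmdline_0",
    "--invalidate": 1,
}
final_args = enforce_fio_args(user, enforced)
print(" ".join(final_args))
```

Here the user-provided --runtime=30, --numjobs=10 and --name=plop are
dropped in favor of the values computed by hwbench.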

A sample configuration file (configs/fio.conf) is provided as an example. It will:
- test /dev/nvme0n1 in a randread 4k profile
- automatically create two benchmarks as per the stressor_range
  value ("4,6"):
-- one with numjobs=4
-- one with numjobs=6
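
As a rough sketch of this expansion (the function name expand_benchmarks is
hypothetical, and only a comma-separated list of values is assumed here):

```python
def expand_benchmarks(job_name, stressor_range):
    """Return one (benchmark_name, numjobs) pair per stressor_range value.

    Hypothetical helper: assumes stressor_range is a comma-separated list.
    """
    values = [int(value) for value in stressor_range.split(",")]
    # Each benchmark is named after the job, suffixed by its position.
    return [(f"{job_name}_{position}", numjobs) for position, numjobs in enumerate(values)]


benchmarks = expand_benchmarks("randread_cmdline", "4,6")
for name, numjobs in benchmarks:
    print(f"{name}: --numjobs={numjobs}")
```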

The test suite is extended to check the configuration parsing and the benchmark job creation.

Documentation is also added to describe the behavior of this first
implementation.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
diff --git a/README.md b/README.md
@@ -26,6 +26,7 @@ The current version of hwbench supports 3 different engines.
 - [stress-ng](https://github.com/ColinIanKing/stress-ng): no need to present this very popular low-level benchmarking tool
 - spike: a custom engine used to make fans spike. Very useful to study the cooling strategy of a server.
 - sleep: a stupid sleep call used to observe how the system is behaving in idle mode
+- [fio](https://github.com/axboe/fio): a flexible storage benchmarking tool, see [documentation](./documentation/fio.md)
 
 Benchmark performance metrics are extracted and saved for later analysis.
 
diff --git a/configs/fio.conf b/configs/fio.conf
@@ -0,0 +1,15 @@
+# This configuration will :
+# - test /dev/nvme0n1 in 4k randread for 40 seconds
+# -- first with 4 stressors
+# -- then with 6 stressors
+[global]
+runtime=40
+monitor=all
+
+[randread_cmdline]
+engine=fio
+engine_module=cmdline
+engine_module_parameter_base=--filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --group_reporting --readonly
+hosting_cpu_cores=all
+hosting_cpu_cores_scaling=none
+stressor_range=4,6
diff --git a/documentation/fio.md b/documentation/fio.md
@@ -0,0 +1,53 @@
+# FIO
+
+hwbench can use [fio](https://github.com/axboe/fio) to perform storage benchmarking.
+The current implementation requires fio >= 3.19.
+
+# Concept
+Fio can be operated in three different modes, selected with the `engine_module` directive. Only the command line mode is implemented so far.
+
+
+## Command line
+
+When `engine_module=cmdline` is used, the content of `engine_module_parameter_base` is passed to fio after hwbench has overridden the options listed below.
+
+The following fio options are automatically defined, or replaced if present, by hwbench:
+
+- `--runtime`: set to match the exact duration of the current hwbench benchmark.
+- `--time_based`: mandatory so that the benchmark lasts exactly `runtime` seconds.
+- `--output-format`: hwbench needs the output to be in `json+` format for easy integration.
+- `--name`: hwbench uses the current job name to ensure it is unique across runs.
+- `--numjobs`: defined by `stressor_range`, which can be a single value or a list of values. Each value generates a new benchmark.
+- `--write_{bw|lat|hist|iops}_log`: hwbench automatically collects the performance logs, averaged over 20 seconds (`--log_avg_msec=20000`), to let hwgraph draw time-based graphs.
+- `--invalidate`: hwbench ensures that every benchmark runs out of cache.
+
+### Sample configuration file
+
+The following job defines two benchmarks on the same device (nvme0n1).
+
+The `randread_cmdline` job will create:
+- a `randread_cmdline_0` benchmark with `numjobs=4`, taken from the `stressor_range` list
+- a `randread_cmdline_1` benchmark with `numjobs=6`, taken from the `stressor_range` list
+
+```
+[randread_cmdline]
+runtime=600
+engine=fio
+engine_module=cmdline
+engine_module_parameter_base=--filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --group_reporting --readonly
+hosting_cpu_cores=all
+hosting_cpu_cores_scaling=none
+stressor_range=4,6
+```
+
+Please note that `hosting_cpu_cores` only selects the set of cores fio is pinned to. One possible use is to combine a list of cores with `hosting_cpu_cores_scaling` to study the performance of the same storage device from different NUMA domains.
+
+## External file execution
+hwbench executes an already existing fio job file.
+
+Not yet implemented.
+
+## Automatic job definition
+hwbench automatically creates jobs based on hardware detection and profiles.
+
+Not yet implemented.
diff --git a/hwbench/bench/benchmark.py b/hwbench/bench/benchmark.py
@@ -123,9 +123,9 @@ def pre_run(self):
         cpu_location = ""
         if p.get_pinned_cpu():
             if isinstance(p.get_pinned_cpu(), (int, str)):
-                cpu_location = " on CPU {:3d}".format(p.get_pinned_cpu())
+                cpu_location = " pinned on CPU {:3d}".format(p.get_pinned_cpu())
             elif isinstance(p.get_pinned_cpu(), list):
-                cpu_location = " on CPU [{}]".format(
+                cpu_location = " pinned on CPU [{}]".format(
                     h.cpu_list_to_range(p.get_pinned_cpu())
                 )
             else:
diff --git a/hwbench/bench/test_fio.py b/hwbench/bench/test_fio.py
@@ -0,0 +1,46 @@
+from . import test_benchmarks_common as tbc
+
+
+class TestFio(tbc.TestCommon):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.load_mocked_hardware(
+            cpucores="./hwbench/tests/parsing/cpu_cores/v2321",
+            cpuinfo="./hwbench/tests/parsing/cpu_info/v2321",
+            numa="./hwbench/tests/parsing/numa/8domainsllc",
+        )
+        self.load_benches("./hwbench/config/fio.conf")
+        self.parse_jobs_config()
+        self.QUADRANT0 = list(range(0, 16)) + list(range(64, 80))
+        self.QUADRANT1 = list(range(16, 32)) + list(range(80, 96))
+        self.ALL = list(range(0, 128))
+
+    def test_fio(self):
+        """Check fio syntax."""
+        assert self.benches.count_benchmarks() == 2
+        assert self.benches.count_jobs() == 1
+        assert self.benches.runtime() == 80
+
+        for bench in self.benches.benchs:
+            self.assertIsNone(bench.validate_parameters())
+            assert bench.get_parameters().get_name() == "randread_cmdline"
+
+        bench_0 = self.get_bench_parameters(0)
+        assert (
+            bench_0.get_engine_module_parameter_base()
+            == "--bs=4k --direct=1 --filename=/dev/nvme0n1 --group_reporting \
+--invalidate=1 --iodepth=256 --ioengine=libaio --log_avg_msec=20000 --name=randread_cmdline_0 \
+--numjobs=4 --output-format=json+ --readonly --runtime=40 --rw=randread --time_based \
+--write_bw_log=fio/randread_cmdline_0_bw.log --write_hist_log=fio/randread_cmdline_0_hist.log \
+--write_iops_log=fio/randread_cmdline_0_iops.log --write_lat_log=fio/randread_cmdline_0_lat.log"
+        )
+
+        bench_1 = self.get_bench_parameters(1)
+        assert (
+            bench_1.get_engine_module_parameter_base()
+            == "--bs=4k --direct=1 --filename=/dev/nvme0n1 --group_reporting \
+--invalidate=1 --iodepth=256 --ioengine=libaio --log_avg_msec=20000 --name=randread_cmdline_1 \
+--numjobs=6 --output-format=json+ --readonly --runtime=40 --rw=randread --time_based \
+--write_bw_log=fio/randread_cmdline_1_bw.log --write_hist_log=fio/randread_cmdline_1_hist.log \
+--write_iops_log=fio/randread_cmdline_1_iops.log --write_lat_log=fio/randread_cmdline_1_lat.log"
+        )
diff --git a/hwbench/config/fio.conf b/hwbench/config/fio.conf
@@ -0,0 +1,18 @@
+# This configuration will :
+# - test /dev/nvme0n1 in 4k randread for 40 seconds
+# -- first with 4 stressors
+# -- then with 6 stressors
+#
+# As --runtime=30, --numjobs=10 and --name=plop are set by the user, they should be replaced by the values computed by hwbench (e.g. runtime=40)
+[global]
+runtime=40
+monitor=all
+
+[randread_cmdline]
+engine=fio
+engine_module=cmdline
+engine_module_parameter_base=--filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --group_reporting --readonly --runtime=30 --numjobs=10 --name=plop
+hosting_cpu_cores=all
+hosting_cpu_cores_scaling=none
+stressor_range=4,6
+
diff --git a/hwbench/config/test_parse_fio.py b/hwbench/config/test_parse_fio.py
@@ -0,0 +1,26 @@
+from unittest.mock import patch
+from ..environment.mock import MockHardware
+from ..bench import test_benchmarks_common as tbc
+
+
+class TestParseConfig(tbc.TestCommon):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.hw = MockHardware()
+        self.load_benches("./hwbench/config/fio.conf")
+
+    def test_sections_name(self):
+        """Check if sections names are properly detected."""
+        sections = self.get_jobs_config().get_sections()
+        assert sections == [
+            "randread_cmdline",
+        ]
+
+    def test_keywords(self):
+        """Check if all keywords are valid."""
+        try:
+            with patch("hwbench.utils.helpers.is_binary_available") as iba:
+                iba.return_value = True
+                self.get_jobs_config().validate_sections()
+        except Exception as exc:
+            assert False, f"'validate_sections' detected a syntax error {exc}"
diff --git a/hwbench/engines/fio.py b/hwbench/engines/fio.py
@@ -0,0 +1,213 @@
+import json
+import pathlib
+from typing import Any
+
+
+from ..bench.parameters import BenchmarkParameters
+from ..bench.engine import EngineBase, EngineModuleBase
+from ..bench.benchmark import ExternalBench
+from ..utils.helpers import fatal
+
+
+class EngineModuleCmdline(EngineModuleBase):
+    """This class implements the EngineModuleBase for fio"""
+
+    def __init__(self, engine: EngineBase, engine_module_name: str, fake_stdout=None):
+        super().__init__(engine, engine_module_name)
+        self.engine_module_name = engine_module_name
+        self.load_module_parameter(fake_stdout)
+
+    def load_module_parameter(self, fake_stdout=None):
+        # if needed add module parameters to your module
+        self.add_module_parameter("cmdline")
+
+    def validate_module_parameters(self, p: BenchmarkParameters):
+        msg = super().validate_module_parameters(p)
+        FioCmdLine(self, p).parse_parameters()
+        return msg
+
+    def run_cmd(self, p: BenchmarkParameters):
+        return FioCmdLine(self, p).run_cmd()
+
+    def run(self, p: BenchmarkParameters):
+        return FioCmdLine(self, p).run()
+
+    def fully_skipped_job(self, p) -> bool:
+        return FioCmdLine(self, p).fully_skipped_job()
+
+
+class Engine(EngineBase):
+    """The main fio class."""
+
+    def __init__(self, fake_stdout=None):
+        super().__init__("fio", "fio")
+        self.add_module(EngineModuleCmdline(self, "cmdline", fake_stdout))
+
+    def run_cmd_version(self) -> list[str]:
+        return [
+            self.get_binary(),
+            "--version",
+        ]
+
+    def run_cmd(self) -> list[str]:
+        return []
+
+    def parse_version(self, stdout: bytes, _stderr: bytes) -> bytes:
+        self.version = stdout.split(b"-")[1].strip()
+        return self.version
+
+    def version_major(self) -> int:
+        if self.version:
+            return int(self.version.split(b".")[0])
+        return 0
+
+    def version_minor(self) -> int:
+        if self.version:
+            return int(self.version.split(b".")[1])
+        return 0
+
+    def parse_cmd(self, stdout: bytes, stderr: bytes):
+        return {}
+
+
+class Fio(ExternalBench):
+    """The Fio stressor."""
+
+    def __init__(
+        self, engine_module: EngineModuleBase, parameters: BenchmarkParameters
+    ):
+        ExternalBench.__init__(self, engine_module, parameters)
+        self.parameters = parameters
+        self.engine_module = engine_module
+        self.log_avg_msec = 20000  # write_*_log are averaged at 20sec
+        self._parse_parameters()
+        # Tests can skip this part
+        if isinstance(parameters.out_dir, pathlib.PosixPath):
+            parameters.out_dir.joinpath("fio").mkdir(parents=True, exist_ok=True)
+
+    def version_compatible(self) -> bool:
+        engine = self.engine_module.get_engine()
+        return (engine.version_major(), engine.version_minor()) >= (3, 19)
+
+    def _parse_parameters(self):
+        self.runtime = self.parameters.runtime
+        if self.runtime * 1000 < self.log_avg_msec:
+            fatal(
+                f"Fio runtime cannot be lower than the average log time ({self.log_avg_msec})."
+            )
+
+    def need_skip_because_version(self):
+        if self.skip:
+            # we already skipped this benchmark, we can't know the reason anymore
+            # because we might not have run the version command.
+            return ["echo", "skipped benchmark"]
+        if not self.version_compatible():
+            print(f"WARNING: skipping benchmark {self.name}, needs fio >= 3.19")
+            self.skip = True
+            return ["echo", "skipped benchmark"]
+        return None
+
+    def run_cmd(self) -> list[str]:
+        skip = self.need_skip_because_version()
+        if skip:
+            return skip
+
+        # Let's build the command line to run the tool
+        args = [
+            self.engine_module.get_engine().get_binary(),
+        ]
+
+        return self.get_taskset(args)
+
+    def get_default_fio_command_line(self, args: list) -> list:
+        """Return the default fio arguments"""
+
+        def remove_arg(args, item) -> list:
+            if isinstance(item, str):
+                return [arg for arg in args if not arg.startswith(item)]
+            else:
+                # Value-based items must carry the value computed by hwbench: drop any
+                # user-defined value that differs. Iterate over a copy as args is mutated.
+                for arg in list(args):
+                    if arg.startswith(item[0]):
+                        if arg != f"{item[0]}={item[1]}":
+                            print(
+                                f"{self.parameters.get_name_with_position()}: Fio parameter {item[0]} is now set to {item[1]}"
+                            )
+                            args.remove(arg)
+
+                return args
+
+        name = self.parameters.get_name_with_position()
+        enforced_items = [
+            ["--runtime", f"{self.parameters.get_runtime()}"],
+            "--time_based",
+            ["--output-format", "json+"],
+            ["--numjobs", self.parameters.get_engine_instances_count()],
+            ["--name", name],
+            ["--invalidate", 1],
+            ["--log_avg_msec", self.log_avg_msec],
+        ]
+        for log_type in ["bw", "lat", "hist", "iops"]:
+            enforced_items.append(f"--write_{log_type}_log=fio/{name}_{log_type}.log")
+
+        for enforced_item in enforced_items:
+            args = remove_arg(args, enforced_item)
+            if isinstance(enforced_item, str):
+                args.append(enforced_item)
+            else:
+                args.append(f"{enforced_item[0]}={enforced_item[1]}")
+
+        return args
+
+    def parse_cmd(self, stdout: bytes, stderr: bytes) -> dict[str, Any]:
+        if self.skip:
+            return self.parameters.get_result_format() | self.empty_result()
+        try:
+            ret = json.loads(stdout)
+        except json.decoder.JSONDecodeError:
+            print(
+                f"{self.parameters.get_name_with_position()}: Cannot load fio's JSON output"
+            )
+            return self.parameters.get_result_format() | self.empty_result()
+
+        return {"fio_results": ret} | self.parameters.get_result_format()
+
+    @property
+    def name(self) -> str:
+        return self.engine_module.get_engine().get_name()
+
+    def run_cmd_version(self) -> list[str]:
+        return self.engine_module.get_engine().run_cmd_version()
+
+    def parse_version(self, stdout: bytes, _stderr: bytes) -> bytes:
+        return self.engine_module.get_engine().parse_version(stdout, _stderr)
+
+    def empty_result(self):
+        """Default empty results for fio"""
+        return {
+            "effective_runtime": 0,
+            "skipped": self.skip,
+            "fio_results": {"jobs": []},
+        }
+
+
+class FioCmdLine(Fio):
+    def parse_parameters(self):
+        """Removing fio arguments set by the engine"""
+        # We need to ensure we have a proper fio command line
+        # Let's remove duplicates and enforce the mandatory options
+        args = self.parameters.get_engine_module_parameter_base().split()
+
+        # Overriding empb to represent the real executed command.
+        # The list is deduplicated and sorted to ensure a constant string representation.
+        self.parameters.engine_module_parameter_base = " ".join(
+            sorted(list(set(self.get_default_fio_command_line(args))))
+        )
+
+    def run_cmd(self) -> list[str]:
+        # Let's build the command line to run the tool
+        return (
+            super().run_cmd()
+            + self.parameters.get_engine_module_parameter_base().split()
+        )
diff --git a/hwbench/engines/test_parse_fio.py b/hwbench/engines/test_parse_fio.py
diff --git a/hwbench/tests/parsing/fio/v319/version b/hwbench/tests/parsing/fio/v319/version
diff --git a/hwbench/tests/parsing/fio/v319/version-stderr b/hwbench/tests/parsing/fio/v319/version-stderr
diff --git a/hwbench/tests/parsing/fio/v319/version-stdout b/hwbench/tests/parsing/fio/v319/version-stdout