
@maurycy (Contributor) commented Oct 15, 2025

The issue gh-140149 provides more details.

This makes concatenation of bytes literals in the parser about 3x faster (2.36x to 3.40x in the benchmark below, 2.84x geometric mean), for syntax like:

x = (b'meow')
y = (b'meow' b'cow')
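Adjacent bytes literals are joined while parsing, so the compiler only ever sees one constant per expression; that joining step is what this PR speeds up. A quick way to observe it with the standard ast module (a sketch; the two statements are the ones above):

```python
import ast

# Parse the two example statements.
tree = ast.parse("x = (b'meow')\ny = (b'meow' b'cow')")

# Each right-hand side is a single ast.Constant: the parentheses leave no
# node behind, and the adjacent bytes literals were already merged.
for stmt in tree.body:
    value = stmt.value
    print(stmt.targets[0].id, type(value).__name__, value.value)
```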

Benchmark

The script:
from __future__ import annotations

import ast
import pyperf


def make_src(n, chunk_len, per_line: int = 64):
    assert n > 0 and chunk_len >= 0 and per_line > 0
    chunk = "b'" + ("x" * chunk_len) + "'"
    parts = [chunk] * n
    lines = ["x = ("]
    while parts:
        group = " ".join(parts[:per_line])
        parts = parts[per_line:]
        lines.append(f"    {group}")
    lines.append(")")
    return "\n".join(lines)


def bench_compile(loops, n, chunk_len, per_line: int = 64):
    src = make_src(n, chunk_len, per_line)
    t0 = pyperf.perf_counter()
    for _ in range(loops):
        compile(src, "<bench>", "exec")
    return pyperf.perf_counter() - t0


def bench_ast_parse(loops, n, chunk_len, per_line: int = 64):
    src = make_src(n, chunk_len, per_line)
    t0 = pyperf.perf_counter()
    for _ in range(loops):
        ast.parse(src, filename="<bench>", mode="exec")
    return pyperf.perf_counter() - t0


def main():
    runner = pyperf.Runner()

    for n in (1, 2, 4, 8, 16, 32, 64, 128, 1024):
        runner.bench_time_func(
            f"compile_bytes_concat_n{n}_chunk1",
            bench_compile,
            n,
            1,
        )
        runner.bench_time_func(
            f"parse_bytes_concat_n{n}_chunk1",
            bench_ast_parse,
            n,
            1,
        )

    for n, chunk in ((256, 4), (4, 128), (4, 256)):
        runner.bench_time_func(
            f"compile_bytes_concat_n{n}_chunk{chunk}",
            bench_compile,
            n,
            chunk,
        )
        runner.bench_time_func(
            f"parse_bytes_concat_n{n}_chunk{chunk}",
            bench_ast_parse,
            n,
            chunk,
        )


if __name__ == "__main__":
    main()

The results (pyperf with --rigorous, on commit 9955759):

| Benchmark | main | peg-pybytes-bytes-concat-single-alloc |
|---|---|---|
| compile_bytes_concat_n1_chunk1 | 9.89 us | 3.68 us: 2.68x faster |
| parse_bytes_concat_n1_chunk1 | 6.97 us | 2.55 us: 2.73x faster |
| compile_bytes_concat_n2_chunk1 | 10.4 us | 3.72 us: 2.79x faster |
| parse_bytes_concat_n2_chunk1 | 7.33 us | 2.63 us: 2.78x faster |
| compile_bytes_concat_n4_chunk1 | 10.8 us | 3.90 us: 2.78x faster |
| parse_bytes_concat_n4_chunk1 | 7.88 us | 2.84 us: 2.77x faster |
| compile_bytes_concat_n8_chunk1 | 11.5 us | 4.14 us: 2.78x faster |
| parse_bytes_concat_n8_chunk1 | 8.51 us | 3.04 us: 2.80x faster |
| compile_bytes_concat_n16_chunk1 | 13.0 us | 4.71 us: 2.76x faster |
| parse_bytes_concat_n16_chunk1 | 9.89 us | 3.57 us: 2.77x faster |
| compile_bytes_concat_n32_chunk1 | 15.4 us | 5.59 us: 2.75x faster |
| parse_bytes_concat_n32_chunk1 | 12.6 us | 4.43 us: 2.85x faster |
| compile_bytes_concat_n64_chunk1 | 20.6 us | 7.15 us: 2.88x faster |
| parse_bytes_concat_n64_chunk1 | 17.6 us | 5.99 us: 2.94x faster |
| compile_bytes_concat_n128_chunk1 | 30.6 us | 10.0 us: 3.05x faster |
| parse_bytes_concat_n128_chunk1 | 27.6 us | 8.84 us: 3.12x faster |
| compile_bytes_concat_n1024_chunk1 | 165 us | 48.9 us: 3.38x faster |
| parse_bytes_concat_n1024_chunk1 | 162 us | 47.8 us: 3.40x faster |
| compile_bytes_concat_n256_chunk4 | 60.2 us | 18.4 us: 3.27x faster |
| parse_bytes_concat_n256_chunk4 | 57.2 us | 16.9 us: 3.38x faster |
| compile_bytes_concat_n4_chunk128 | 12.7 us | 5.11 us: 2.47x faster |
| parse_bytes_concat_n4_chunk128 | 9.64 us | 3.83 us: 2.51x faster |
| compile_bytes_concat_n4_chunk256 | 14.1 us | 5.96 us: 2.36x faster |
| parse_bytes_concat_n4_chunk256 | 10.9 us | 4.57 us: 2.38x faster |
| Geometric mean | (ref) | 2.84x faster |
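The summary row is the geometric mean of the per-benchmark speedups. Recomputing it from the rounded ratios in the table is a rough sanity check only: pyperf works from the unrounded timings, so the last digit can differ slightly.

```python
import statistics

# Speedup ratios (main / branch) copied from the table, in table order,
# already rounded to two decimals.
ratios = [
    2.68, 2.73, 2.79, 2.78, 2.78, 2.77, 2.78, 2.80,
    2.76, 2.77, 2.75, 2.85, 2.88, 2.94, 3.05, 3.12,
    3.38, 3.40, 3.27, 3.38, 2.47, 2.51, 2.36, 2.38,
]

# Geometric mean = exp(mean(log(r))); statistics does this for us.
geo_mean = statistics.geometric_mean(ratios)
print(f"geometric mean: {geo_mean:.2f}x faster")
print(f"range: {min(ratios):.2f}x to {max(ratios):.2f}x")
```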

The environment:

% ./python -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"
'--enable-optimizations' '--with-lto'

The system was tuned beforehand with: sudo ./python -m pyperf system tune

@maurycy (Contributor, Author) commented Oct 15, 2025

cc @vstinner @cmaloney

@maurycy maurycy changed the title gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x speed up for bytes concat in the parser gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser Oct 15, 2025
Review thread on action_helpers.c:

-    PyBytes_Concat(&res, elem->v.Constant.value);
+    Py_ssize_t part = PyBytes_GET_SIZE(elem->v.Constant.value);
+    if (part > 0) {
+        memcpy(out, PyBytes_AS_STRING(elem->v.Constant.value), part);
Member:

Why not use PyBytesWriter_WriteBytes() here?

Member:

PyBytesWriter_WriteBytes() grows the buffer when needed. That isn't necessary here, since the code already computes the total size in advance.
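The reply rests on a two-pass pattern: sum the part sizes first, allocate one buffer of exactly that size, then fill it with plain copies, so no growth check is ever needed. A Python sketch contrasting it with per-part growth (the function names are hypothetical, and the bytearray only stands in for the C writer's preallocated buffer):

```python
def concat_incremental(parts: list[bytes]) -> bytes:
    # Roughly models the old loop: the result is grown (or rebuilt)
    # once per part.
    res = b""
    for part in parts:
        res = res + part
    return res


def concat_single_alloc(parts: list[bytes]) -> bytes:
    # Pass 1: the exact total size is known before allocating.
    total = sum(len(part) for part in parts)
    # One zero-filled allocation of exactly that size.
    buf = bytearray(total)
    # Pass 2: copy each part at its offset, mirroring the memcpy() in the
    # diff; empty parts are skipped, as with the `part > 0` check.
    offset = 0
    for part in parts:
        size = len(part)
        if size > 0:
            buf[offset:offset + size] = part
            offset += size
    # bytes() copies once more here; the C writer can hand its buffer
    # over without that extra copy.
    return bytes(buf)
```

Both functions return the same value; the second allocates once no matter how many literals are joined.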


@vstinner (Member) added the skip news label

This optimization is good to have, but I don't think that users will notice since the parser is only run once at Python startup. I don't think that it's worth it to document this optimization.

@pablogsal (Member) commented:

> vstinner added the skip news label
>
> This optimization is good to have, but I don't think that users will notice since the parser is only run once at Python startup. I don't think that it's worth it to document this optimization.

I concur, but on the other hand, it doesn't hurt.

@vstinner (Member) left a comment:

LGTM
