gh-140135: use `PyBytesWriter` in `io.RawIOBase.readall`; 3.98x faster #140139

maurycy · 2025-10-14T23:40:40Z

The issue #140135 provides more details.
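As a rough illustration of the idea in the title (not the actual C change in `Modules/_io/iobase.c`), the Python sketch below contrasts two ways of accumulating chunks. It assumes the previous code collected chunks in a list and joined them at the end, while a writer-style approach appends into one growable buffer. `ChunkedReader`, `readall_join`, `readall_writer`, and the 8192-byte default are hypothetical names and values chosen for this example.

```python
import io


class ChunkedReader:
    """Hypothetical stream that returns at most `chunk_size` bytes per read."""

    def __init__(self, data, chunk_size):
        self._stream = io.BytesIO(data)
        self._chunk_size = chunk_size

    def read(self, n=-1):
        limit = self._chunk_size if n is None or n < 0 else min(n, self._chunk_size)
        return self._stream.read(limit)


def readall_join(raw, bufsize=8192):
    """Collect every chunk in a list, then join once at the end."""
    chunks = []
    while True:
        data = raw.read(bufsize)
        if not data:  # an empty result ends the loop in this sketch
            break
        chunks.append(data)
    return b"".join(chunks)


def readall_writer(raw, bufsize=8192):
    """Append each chunk to one growable buffer (a loose analogue of a bytes writer)."""
    buf = bytearray()
    while True:
        data = raw.read(bufsize)
        if not data:
            break
        buf += data
    return bytes(buf)


if __name__ == "__main__":
    payload = b"abcdefghijklmnopqrstuvwxyz0123456789" * 1000
    assert readall_join(ChunkedReader(payload, 4096)) == payload
    assert readall_writer(ChunkedReader(payload, 4096)) == payload
```

Both functions return the same bytes. The Python analogue still copies once in `bytes(buf)`, so it shows the shape of the loop rather than the performance characteristics of the C implementation.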

Benchmark

The script:

import io
import pyperf

CHUNK_SIZE = 4096
SIZES = [1, 4, 8, 16, 32, 64, 128]


class ChunkedRaw(io.RawIOBase):
    def __init__(self, data, chunk_size):
        self._buf = memoryview(data)
        self._pos = 0
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def read(self, n: int = -1):
        if self._pos >= len(self._buf):
            return b""

        to_read = (
            self._chunk_size
            if (n is None or n < 0)
            else min(n, self._chunk_size)
        )

        end = min(self._pos + to_read, len(self._buf))
        out = bytes(self._buf[self._pos : end])
        self._pos = end

        return out


def generate_bytes(total):
    block = b"abcdefghijklmnopqrstuvwxyz0123456789" * 128
    return (block * (total // len(block) + 1))[:total]


def _bench_readall(data, chunk_size):
    r = ChunkedRaw(data, chunk_size)
    out = r.readall()
    if len(out) != len(data):
        raise RuntimeError("what is going on...???")


def main():
    runner = pyperf.Runner()
    for size_mib in SIZES:
        total_bytes = size_mib * 1024 * 1024
        data = generate_bytes(total_bytes)
        name = f"rawiobase_readall_{size_mib}MB_chunk{CHUNK_SIZE}"
        runner.bench_func(name, _bench_readall, data, CHUNK_SIZE)


if __name__ == "__main__":
    main()

The results (with --rigorous):

| Benchmark | main | pybytes-iobase-readall |
|---|---|---|
| rawiobase_readall_1MB_chunk4096 | 673 us | 111 us: 6.04x faster |
| rawiobase_readall_4MB_chunk4096 | 2.49 ms | 434 us: 5.74x faster |
| rawiobase_readall_8MB_chunk4096 | 5.21 ms | 907 us: 5.75x faster |
| rawiobase_readall_16MB_chunk4096 | 12.1 ms | 2.45 ms: 4.95x faster |
| rawiobase_readall_32MB_chunk4096 | 25.1 ms | 10.2 ms: 2.45x faster |
| rawiobase_readall_64MB_chunk4096 | 51.0 ms | 20.4 ms: 2.50x faster |
| rawiobase_readall_128MB_chunk4096 | 101 ms | 38.9 ms: 2.60x faster |
| Geometric mean | (ref) | 3.98x faster |
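The summary row can be cross-checked from the per-benchmark speedups in the table above. This is a minimal sketch that uses the rounded speedups as printed, so it only approximates what pyperf computes from the unrounded timings; the variable names are illustrative.

```python
from statistics import geometric_mean

# Per-benchmark speedups copied from the table (rounded as reported).
speedups = [6.04, 5.74, 5.75, 4.95, 2.45, 2.50, 2.60]

# The geometric mean is the n-th root of the product of the ratios,
# which is the usual way to average multiplicative speedups.
overall = geometric_mean(speedups)
print(f"{overall:.2f}x faster")
```

With these inputs the result comes out at about 3.98, matching the figure in the title.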

The environment:

% ./python -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"
'--enable-optimizations' '--with-lto'

`sudo ./python -m pyperf system tune` was run before benchmarking.

Issue: Use PyBytesWriter API in io.RawIOBase.readall #140135

vstinner

Please add a NEWS entry, IMO this speedup is significant enough to be documented!

vstinner · 2025-10-15T00:03:51Z

Impressive speedup!

cmaloney

Overall looking good

Modules/_io/iobase.c

maurycy · 2025-10-15T00:31:46Z

@vstinner

> Impressive speedup!

Thank you!

My biggest worry is that the benchmark is wrong. I'm more than happy to run a completely different benchmark on the same machine!

> Please add a NEWS entry, IMO this speedup is significant enough to be documented!

Done: f0ae824

maurycy · 2025-10-15T00:57:54Z

The CI failed. I'm not exactly sure why. The same jobs succeeded before.

There have been no meaningful changes since the last run: b8a7f89 is just style, and f0ae824 just adds a NEWS entry.

From what I can see, some flakiness has been reported:

https://discuss.python.org/t/is-the-test-threading-on-python-pr-easy-to-failed/104359/4

I'm re-running the CI by merging the main branch (not recommended, but I couldn't figure out any other way).

vstinner

LGTM. Nice optimization!

emmatyping · 2025-10-15T01:10:10Z

I think this looks good, the only other thing is I would verify this doesn't regress performance for sizes <1MB (e.g. 1K and 4K maybe).

I expect the performance differences to be much smaller there if at all present, but that's OK, we just want to make sure they aren't regressing.

maurycy · 2025-10-15T01:17:32Z

@emmatyping

> I think this looks good, the only other thing is I would verify this doesn't regress performance for sizes <1MB (e.g. 1K and 4K maybe).
>
> I expect the performance differences to be much smaller there if at all present, but that's OK, we just want to make sure they aren't regressing.

Great point.

The benchmark (the same script, with `SIZES_KB` in place of `SIZES` and `CHUNK_SIZE` lowered to 1024):

import io
import pyperf

CHUNK_SIZE = 1024
SIZES_KB = [1, 4, 8]


class ChunkedRaw(io.RawIOBase):
    def __init__(self, data, chunk_size):
        self._buf = memoryview(data)
        self._pos = 0
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def read(self, n: int = -1):
        if self._pos >= len(self._buf):
            return b""

        to_read = (
            self._chunk_size
            if (n is None or n < 0)
            else min(n, self._chunk_size)
        )

        end = min(self._pos + to_read, len(self._buf))
        out = bytes(self._buf[self._pos : end])
        self._pos = end

        return out


def generate_bytes(total):
    block = b"abcdefghijklmnopqrstuvwxyz0123456789" * 128
    return (block * (total // len(block) + 1))[:total]


def _bench_readall(data, chunk_size):
    r = ChunkedRaw(data, chunk_size)
    out = r.readall()
    if len(out) != len(data):
        raise RuntimeError("what is going on...???")


def main():
    runner = pyperf.Runner()
    for size_kib in SIZES_KB:
        total_bytes = size_kib * 1024
        data = generate_bytes(total_bytes)
        name = f"rawiobase_readall_{size_kib}KB_chunk{CHUNK_SIZE}"
        runner.bench_func(name, _bench_readall, data, CHUNK_SIZE)


if __name__ == "__main__":
    main()

The result:

| Benchmark | main | pybytes-iobase-readall |
|---|---|---|
| rawiobase_readall_1KB_chunk1024 | 2.51 us | 901 ns: 2.79x faster |
| rawiobase_readall_4KB_chunk1024 | 5.87 us | 2.19 us: 2.68x faster |
| rawiobase_readall_8KB_chunk1024 | 9.94 us | 3.80 us: 2.62x faster |
| Geometric mean | (ref) | 2.69x faster |

emmatyping

Great, thank you!

use PyBytesWriter in _io__RawIOBase_readall_impl

f669add

bedevere-app bot added the awaiting review label Oct 14, 2025

bedevere-app bot mentioned this pull request Oct 14, 2025

Use PyBytesWriter API in io.RawIOBase.readall #140135

Open

maurycy changed the title ~~gh-140135: use PyBytesWriter in io.RawIOBase.readall, 2.34x faster~~ gh-140135: use PyBytesWriter in io.RawIOBase.readall Oct 14, 2025

unused var

eab9742

maurycy changed the title ~~gh-140135: use PyBytesWriter in io.RawIOBase.readall~~ gh-140135: use PyBytesWriter in io.RawIOBase.readall; 3.98x faster Oct 15, 2025

vstinner reviewed Oct 15, 2025

cmaloney reviewed Oct 15, 2025

maurycy added 2 commits October 15, 2025 02:11

be curly

b8a7f89

NEWS

f0ae824

Merge branch 'main' into pybytes-iobase-readall

8630390

vstinner approved these changes Oct 15, 2025

bedevere-app bot added awaiting merge and removed awaiting review labels Oct 15, 2025

emmatyping approved these changes Oct 15, 2025

cmaloney approved these changes Oct 15, 2025

maurycy mentioned this pull request Oct 15, 2025

Use PyBytesWriter API in PEG parser's _build_concatenated_bytes, avoid quadratic memory allocations #140149

Open

Remove empty line

1b9d655

vstinner enabled auto-merge (squash) October 15, 2025 13:38

vstinner merged commit d301587 into python:main Oct 15, 2025
45 checks passed

bedevere-app bot removed the awaiting merge label Oct 15, 2025

maurycy deleted the pybytes-iobase-readall branch October 15, 2025 14:13
