Contributor

@maurycy maurycy commented Oct 14, 2025

The issue #140135 provides more details.

Benchmark

The script:
import io
import pyperf

CHUNK_SIZE = 4096
SIZES = [1, 4, 8, 16, 32, 64, 128]


class ChunkedRaw(io.RawIOBase):
    def __init__(self, data, chunk_size):
        self._buf = memoryview(data)
        self._pos = 0
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def read(self, n: int = -1):
        if self._pos >= len(self._buf):
            return b""

        to_read = (
            self._chunk_size
            if (n is None or n < 0)
            else min(n, self._chunk_size)
        )

        end = min(self._pos + to_read, len(self._buf))
        out = bytes(self._buf[self._pos : end])
        self._pos = end

        return out


def generate_bytes(total):
    block = b"abcdefghijklmnopqrstuvwxyz0123456789" * 128
    return (block * (total // len(block) + 1))[:total]


def _bench_readall(data, chunk_size):
    r = ChunkedRaw(data, chunk_size)
    out = r.readall()
    if len(out) != len(data):
        raise RuntimeError("what is going on...???")


def main():
    runner = pyperf.Runner()
    for size_mib in SIZES:
        total_bytes = size_mib * 1024 * 1024
        data = generate_bytes(total_bytes)
        name = f"rawiobase_readall_{size_mib}MB_chunk{CHUNK_SIZE}"
        runner.bench_func(name, _bench_readall, data, CHUNK_SIZE)


if __name__ == "__main__":
    main()
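
As a sanity check on what the script measures, the sketch below counts how often `readall()` calls `read()` and compares the full content rather than only the length. `CountingChunkedRaw` and its `read_calls` attribute are hypothetical names introduced for illustration; they are not part of the PR. The sketch assumes `io.DEFAULT_BUFFER_SIZE` is at least 4096, so every `read()` call returns one full 4096-byte chunk until EOF.

```python
import io


class CountingChunkedRaw(io.RawIOBase):
    """Hypothetical variant of ChunkedRaw that counts read() calls."""

    def __init__(self, data, chunk_size):
        self._buf = memoryview(data)
        self._pos = 0
        self._chunk_size = chunk_size
        self.read_calls = 0

    def readable(self):
        return True

    def read(self, n=-1):
        self.read_calls += 1
        if self._pos >= len(self._buf):
            return b""  # empty bytes signals EOF to readall()

        # Never hand back more than one chunk per call.
        to_read = (
            self._chunk_size
            if (n is None or n < 0)
            else min(n, self._chunk_size)
        )
        end = min(self._pos + to_read, len(self._buf))
        out = bytes(self._buf[self._pos:end])
        self._pos = end
        return out


data = bytes(range(256)) * 4096  # 1 MiB of non-constant data
raw = CountingChunkedRaw(data, 4096)
result = raw.readall()

# 1 MiB / 4096 bytes = 256 data chunks, plus one final read() that hits EOF.
print(raw.read_calls, result == data)
```

For the 1 MiB case this means the benchmark is dominated by a few hundred small chunks being accumulated into one `bytes` object, which is the step the PR title says now goes through `PyBytesWriter`.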

The results (with --rigorous):

| Benchmark | main | pybytes-iobase-readall |
|---|---|---|
| rawiobase_readall_1MB_chunk4096 | 673 us | 111 us: 6.04x faster |
| rawiobase_readall_4MB_chunk4096 | 2.49 ms | 434 us: 5.74x faster |
| rawiobase_readall_8MB_chunk4096 | 5.21 ms | 907 us: 5.75x faster |
| rawiobase_readall_16MB_chunk4096 | 12.1 ms | 2.45 ms: 4.95x faster |
| rawiobase_readall_32MB_chunk4096 | 25.1 ms | 10.2 ms: 2.45x faster |
| rawiobase_readall_64MB_chunk4096 | 51.0 ms | 20.4 ms: 2.50x faster |
| rawiobase_readall_128MB_chunk4096 | 101 ms | 38.9 ms: 2.60x faster |
| Geometric mean | (ref) | 3.98x faster |
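
The geometric-mean row can be cross-checked from the per-benchmark speedups. The sketch below uses the rounded ratios as printed in the results, so it is only an approximation of what pyperf computes from the unrounded timings; the variable names are illustrative.

```python
from statistics import geometric_mean

# Per-benchmark speedups as printed in the results (rounded to two decimals).
speedups = [6.04, 5.74, 5.75, 4.95, 2.45, 2.50, 2.60]

gm = geometric_mean(speedups)
print(f"{gm:.2f}x faster")
```

The value agrees with the reported 3.98x to within rounding. The geometric mean is the natural summary here because the inputs are ratios: it is dominated neither by the roughly 6x results at small sizes nor by the roughly 2.5x results at large sizes.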

The environment:

% ./python -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"
'--enable-optimizations' '--with-lto'

The system was tuned beforehand with `sudo ./python -m pyperf system tune`.

@maurycy maurycy changed the title gh-140135: use PyBytesWriter in io.RawIOBase.readall, 2.34x faster gh-140135: use PyBytesWriter in io.RawIOBase.readall Oct 14, 2025
@maurycy maurycy changed the title gh-140135: use PyBytesWriter in io.RawIOBase.readall gh-140135: use PyBytesWriter in io.RawIOBase.readall; 3.98x faster Oct 15, 2025
Member

@vstinner vstinner left a comment

Please add a NEWS entry, IMO this speedup is significant enough to be documented!

@vstinner
Member

Impressive speedup!

Contributor

@cmaloney cmaloney left a comment

Overall looking good

@maurycy
Contributor Author

maurycy commented Oct 15, 2025

@vstinner

> Impressive speedup!

Thank you!

My biggest worry is that the benchmark is wrong. I'm more than happy to run a completely different benchmark on the same machine!
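
One possible shape for such an independent cross-check is sketched below, using only the standard library. It is not a benchmark that was run in this PR. `ReadintoRaw` and `read_everything` are hypothetical names; the stream implements only `readinto()`, so `readall()` reaches it through the default `RawIOBase.read()` rather than through an overridden `read()`. `timeit` is much less rigorous than pyperf, so the number is only a rough indication.

```python
import io
import timeit


class ReadintoRaw(io.RawIOBase):
    """Hypothetical raw stream that implements only readinto()."""

    def __init__(self, data):
        self._buf = memoryview(data)
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as much as fits into the caller's buffer; returning 0 means EOF.
        n = min(len(b), len(self._buf) - self._pos)
        b[:n] = self._buf[self._pos:self._pos + n]
        self._pos += n
        return n


def read_everything(data):
    return ReadintoRaw(data).readall()


data = bytes(range(256)) * 4096  # 1 MiB

# Check correctness once, outside the timed region.
assert read_everything(data) == data

runs = 20
timings = timeit.repeat(lambda: read_everything(data), number=runs, repeat=5)
best_ms = min(timings) / runs * 1000
print(f"best of 5: {best_ms:.3f} ms per readall()")
```

Running the same file with the interpreter built from `main` and from the branch gives a second, independent data point for the 1 MiB case.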

> Please add a NEWS entry, IMO this speedup is significant enough to be documented!

Done: f0ae824

@maurycy
Contributor Author

maurycy commented Oct 15, 2025

The CI failed. I'm not exactly sure why. The same jobs succeeded before:

No meaningful changes since the last run (b8a7f89 is just style, and f0ae824 just adds a NEWS entry.)

From what I see, some flakiness was reported:

I'm rerunning the CI by merging the main branch (not recommended, but I couldn't figure out any other way).

Member

@vstinner vstinner left a comment

LGTM. Nice optimization!

@emmatyping
Member

I think this looks good, the only other thing is I would verify this doesn't regress performance for sizes <1MB (e.g. 1K and 4K maybe).

I expect the performance differences to be much smaller there if at all present, but that's OK, we just want to make sure they aren't regressing.

@maurycy
Contributor Author

maurycy commented Oct 15, 2025

@emmatyping

> I think this looks good, the only other thing is I would verify this doesn't regress performance for sizes <1MB (e.g. 1K and 4K maybe).

> I expect the performance differences to be much smaller there if at all present, but that's OK, we just want to make sure they aren't regressing.

Great point.

The benchmark (the same script, with `SIZES` replaced by `SIZES_KB` and `CHUNK_SIZE` lowered to 1024):
import io
import pyperf

CHUNK_SIZE = 1024
SIZES_KB = [1, 4, 8]


class ChunkedRaw(io.RawIOBase):
    def __init__(self, data, chunk_size):
        self._buf = memoryview(data)
        self._pos = 0
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def read(self, n: int = -1):
        if self._pos >= len(self._buf):
            return b""

        to_read = (
            self._chunk_size
            if (n is None or n < 0)
            else min(n, self._chunk_size)
        )

        end = min(self._pos + to_read, len(self._buf))
        out = bytes(self._buf[self._pos : end])
        self._pos = end

        return out


def generate_bytes(total):
    block = b"abcdefghijklmnopqrstuvwxyz0123456789" * 128
    return (block * (total // len(block) + 1))[:total]


def _bench_readall(data, chunk_size):
    r = ChunkedRaw(data, chunk_size)
    out = r.readall()
    if len(out) != len(data):
        raise RuntimeError("what is going on...???")


def main():
    runner = pyperf.Runner()
    for size_kib in SIZES_KB:
        total_bytes = size_kib * 1024
        data = generate_bytes(total_bytes)
        name = f"rawiobase_readall_{size_kib}KB_chunk{CHUNK_SIZE}"
        runner.bench_func(name, _bench_readall, data, CHUNK_SIZE)


if __name__ == "__main__":
    main()

The result:

| Benchmark | main | pybytes-iobase-readall |
|---|---|---|
| rawiobase_readall_1KB_chunk1024 | 2.51 us | 901 ns: 2.79x faster |
| rawiobase_readall_4KB_chunk1024 | 5.87 us | 2.19 us: 2.68x faster |
| rawiobase_readall_8KB_chunk1024 | 9.94 us | 3.80 us: 2.62x faster |
| Geometric mean | (ref) | 2.69x faster |

Member

@emmatyping emmatyping left a comment

Great, thank you!

@vstinner vstinner enabled auto-merge (squash) October 15, 2025 13:38
@vstinner vstinner merged commit d301587 into python:main Oct 15, 2025
45 checks passed
@maurycy maurycy deleted the pybytes-iobase-readall branch October 15, 2025 14:13
4 participants