Streaming base64 encode/decode #8622

ThePseudo · 2025-09-12T08:14:06Z

On the main branch, the encode and decode operations look at the file ahead-of-time to gather information about padding. However, padding only appears at the end, and the rest of the file can be encoded and decoded disregarding the padding.

The main issue with reading the file ahead of time is that the entire file must be available from the beginning. This conflicts with streaming use cases: imagine a web socket where the sender sends base64-encoded data, but the receiver can only decode it once the stream has ended, making real-time communication impossible.

Moreover, reading the entire file up front means that it has to stay in RAM the whole time. For smaller files this is not a problem, but when encoding a few gigabytes to base64 it can easily saturate main memory.
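The fixed-memory idea can be sketched as follows. This is a minimal, std-only illustration and not the code in this PR: `encode_stream`, `encode_group` and the 3 KiB `CHUNK` size are hypothetical names and values chosen for the example. Because the buffer size is a multiple of 3, only the final flush can produce padding.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode 1 to 3 input bytes into 4 output characters, padding with '='.
fn encode_group(group: &[u8], out: &mut Vec<u8>) {
    let b0 = group[0];
    let b1 = *group.get(1).unwrap_or(&0);
    let b2 = *group.get(2).unwrap_or(&0);
    out.push(ALPHABET[(b0 >> 2) as usize]);
    out.push(ALPHABET[(((b0 & 0x03) << 4) | (b1 >> 4)) as usize]);
    out.push(if group.len() > 1 {
        ALPHABET[(((b1 & 0x0f) << 2) | (b2 >> 6)) as usize]
    } else {
        b'='
    });
    out.push(if group.len() > 2 {
        ALPHABET[(b2 & 0x3f) as usize]
    } else {
        b'='
    });
}

/// Hypothetical helper: stream-encode `input` into `output` with a fixed buffer.
/// CHUNK is a multiple of 3, so every full buffer encodes without padding.
fn encode_stream<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    const CHUNK: usize = 3 * 1024;
    let mut buf = [0u8; CHUNK];
    let mut filled = 0;
    let mut encoded = Vec::with_capacity(CHUNK / 3 * 4);
    loop {
        let n = input.read(&mut buf[filled..])?;
        filled += n;
        // Flush when the buffer is full or the input is exhausted.
        if n == 0 || filled == CHUNK {
            encoded.clear();
            for group in buf[..filled].chunks(3) {
                encode_group(group, &mut encoded);
            }
            output.write_all(&encoded)?;
            filled = 0;
        }
        if n == 0 {
            return Ok(());
        }
    }
}
```

Memory use is bounded by `CHUNK` regardless of the input size. A latency-sensitive stream would probably flush on every complete 3-byte group instead of waiting for a full buffer.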

This patch aims to solve the ahead-of-time reading issue. First, we do not check for padding up front, but let the decoder do the work for us: as said earlier, most of the encoded file does not have padding, and there is a 1/3 probability that there is no padding at the end either. The STANDARD_NO_PAD base64 decoder produces an error if padding is present; if so, we fall back to the STANDARD base64 decoder. This removes the need to know about padding ahead of time.
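The try-then-fall-back logic could look roughly like the sketch below. It is std-only: `decode_no_pad` and `decode_padded` are hypothetical stand-ins for the STANDARD_NO_PAD and STANDARD decoders, written by hand so the example compiles on its own, and `decode_chunk` is an illustrative name. The sketch does not check for non-canonical trailing bits.

```rust
/// Map one base64 character to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Stand-in for a strict no-padding decoder: any '=' is an error.
fn decode_no_pad(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len() / 4 * 3 + 2);
    for group in input.chunks(4) {
        if group.len() == 1 {
            return Err("truncated input".to_string());
        }
        let mut acc = 0u32;
        for &c in group {
            let v = sextet(c).ok_or_else(|| format!("invalid byte {}", c))?;
            acc = (acc << 6) | v;
        }
        // Left-align a short final group, then keep only its whole bytes.
        acc <<= 6 * (4 - group.len() as u32);
        let bytes = [(acc >> 16) as u8, (acc >> 8) as u8, acc as u8];
        out.extend_from_slice(&bytes[..group.len() - 1]);
    }
    Ok(out)
}

/// Stand-in for a padded decoder: the length must be a multiple of 4 and
/// up to two trailing '=' are stripped before decoding.
fn decode_padded(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.len() % 4 != 0 {
        return Err("padded input length must be a multiple of 4".to_string());
    }
    let pad = input.iter().rev().take(2).take_while(|&&c| c == b'=').count();
    decode_no_pad(&input[..input.len() - pad])
}

/// Fast path first; fall back only when the chunk turns out to carry padding.
fn decode_chunk(input: &[u8]) -> Result<Vec<u8>, String> {
    decode_no_pad(input).or_else(|_| decode_padded(input))
}
```

With this shape, every chunk except possibly the last takes the no-padding path, so the fallback cost is paid at most once per stream.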

Also, please notice that the encoder does not need any ahead-of-time knowledge of padding, since it is the encoder itself that generates it.
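As a small worked example of why the encoder needs no lookahead: the padding follows directly from the number of input bytes seen so far. The helper names below are hypothetical and only illustrate the arithmetic of standard base64.

```rust
/// Number of '=' characters standard base64 appends for `len` input bytes.
fn padding_len(len: usize) -> usize {
    (3 - len % 3) % 3
}

/// Total encoded length in characters, padding included.
fn encoded_len(len: usize) -> usize {
    (len + 2) / 3 * 4
}
```

Padding is zero exactly when the input length is a multiple of 3, which is where the 1/3 figure for unpadded output comes from.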

For the benchmarking:
coreutils base64 refers to this PR version
coreutils_main_branch base64 refers to the version that is on the main branch
base64 refers to GNU Coreutils base64

As this is partially also a performance-related patch, I will paste the hyperfine analysis:

For encoding:

Benchmark 1: ./coreutils base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      2.423 s ±  0.039 s    [User: 0.997 s, System: 1.424 s]
  Range (min … max):    2.393 s …  2.524 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.111 s ±  0.035 s    [User: 1.172 s, System: 2.937 s]
  Range (min … max):    4.052 s …  4.158 s    10 runs
 
Benchmark 3: base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.000 s ±  0.016 s    [User: 3.054 s, System: 0.941 s]
  Range (min … max):    3.976 s …  4.033 s    10 runs
 
Summary
  ./coreutils base64 model-00001-of-000163.safetensors ran
    1.65 ± 0.03 times faster than base64 model-00001-of-000163.safetensors
    1.70 ± 0.03 times faster than ./coreutils_main_branch base64 model-00001-of-000163.safetensors
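
The ratios in the summary can be reproduced from the means and standard deviations above. The sketch below assumes first-order error propagation for a quotient; that assumption matches the printed figures but is not taken from the hyperfine source. `speedup` is a hypothetical helper name.

```rust
/// Given (mean, sigma) for the fast and the slow run, return the speedup
/// ratio and its uncertainty under first-order error propagation.
fn speedup(fast: (f64, f64), slow: (f64, f64)) -> (f64, f64) {
    let ratio = slow.0 / fast.0;
    let rel = ((fast.1 / fast.0).powi(2) + (slow.1 / slow.0).powi(2)).sqrt();
    (ratio, ratio * rel)
}
```

For example, `speedup((2.423, 0.039), (4.000, 0.016))` gives roughly 1.65 with an uncertainty of about 0.03.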

For decoding:

Benchmark 1: ./coreutils base64 -d base64.txt
  Time (mean ± σ):      9.442 s ±  0.060 s    [User: 7.622 s, System: 1.814 s]
  Range (min … max):    9.373 s …  9.580 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 -d base64.txt
  Time (mean ± σ):      9.504 s ±  0.201 s    [User: 5.766 s, System: 3.727 s]
  Range (min … max):    9.309 s …  9.882 s    10 runs
 
Benchmark 3: base64 -d base64.txt
  Time (mean ± σ):      8.362 s ±  0.140 s    [User: 6.750 s, System: 1.605 s]
  Range (min … max):    8.155 s …  8.527 s    10 runs
 
Summary
  base64 -d base64.txt ran
    1.13 ± 0.02 times faster than ./coreutils base64 -d base64.txt
    1.14 ± 0.03 times faster than ./coreutils_main_branch base64 -d base64.txt

For memory consumption, I used ps and grep on the 3 implementation variants working on the same file, and I put the 3 values next to each other for comparison. I report the entire line, since it contains no sensitive information.

This approach is feasible because the memory footprint remains stable during program execution: after the file is loaded and memory is allocated, no more large allocations take place (except, maybe, inside the fast_encoder/decoder in the base64_simd crate, as shown by the flamegraph tool, which also generates an SVG to explore; image at the end of this PR).

For encoding:

andrea    167746  100  0.0  15880  6616 pts/6    R+   10:08   0:01 ./coreutils base64 model-00001-of-000163.safetensors
andrea    168813  102  6.1 5127348 1894336 pts/6 R+   10:10   0:00 ./coreutils_main_branch base64 model-00001-of-000163.safetensors
andrea    169415  100  0.0   8392  2272 pts/6    R+   10:11   0:02 base64 model-00001-of-000163.safetensors

For decoding:

andrea    164864  100  0.0  15876  6288 pts/6    R+   10:01   0:01 ./coreutils base64 -d base64.txt
andrea    165735  125  0.7 6920844 233384 pts/6  R+   10:03   0:00 ./coreutils_main_branch base64 -d base64.txt
andrea    166374  100  0.0   8388  2208 pts/6    R+   10:05   0:03 base64 -d base64.txt

The issue we still have is that memory usage is double that of the GNU Coreutils implementation, but it does not increase with the size of the file.

Malloc inside base64_simd:

github-actions · 2025-09-12T08:35:19Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

sylvestre · 2025-09-12T12:33:30Z

Could you please share your example file? I don't get the same results

ThePseudo · 2025-09-12T13:23:36Z

Uhm it is almost 5 GB large... maybe I can try with a smaller one? What do you suggest?

Never mind, I found it online again: it is one of the DeepSeek models, which are available here: https://huggingface.co/deepseek-ai/DeepSeek-V3/tree/main

Probably a good option is selecting this one: https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main/model-00001-of-000163.safetensors?download=true

It is roughly the same size

github-actions · 2025-09-15T07:39:39Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

ThePseudo · 2025-09-15T10:01:56Z

I re-ran the tests with the file linked above:

For encoding:

Benchmark 1: ./coreutils base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      2.152 s ±  0.066 s    [User: 0.952 s, System: 1.199 s]
  Range (min … max):    2.092 s …  2.301 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      3.759 s ±  0.119 s    [User: 1.140 s, System: 2.619 s]
  Range (min … max):    3.616 s …  3.976 s    10 runs
 
Benchmark 3: base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      3.723 s ±  0.032 s    [User: 3.044 s, System: 0.679 s]
  Range (min … max):    3.687 s …  3.783 s    10 runs
 
Summary
  ./coreutils base64 model-00001-of-000163.safetensors ran
    1.73 ± 0.05 times faster than base64 model-00001-of-000163.safetensors
    1.75 ± 0.08 times faster than ./coreutils_main_branch base64 model-00001-of-000163.safetensors

For decoding:

Benchmark 1: ./coreutils base64 -d base64.txt
  Time (mean ± σ):      9.167 s ±  0.101 s    [User: 7.637 s, System: 1.499 s]
  Range (min … max):    9.063 s …  9.347 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 -d base64.txt
  Time (mean ± σ):      9.329 s ±  0.020 s    [User: 5.620 s, System: 3.669 s]
  Range (min … max):    9.301 s …  9.380 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: base64 -d base64.txt
  Time (mean ± σ):      8.038 s ±  0.037 s    [User: 6.471 s, System: 1.536 s]
  Range (min … max):    7.991 s …  8.104 s    10 runs
 
Summary
  base64 -d base64.txt ran
    1.14 ± 0.01 times faster than ./coreutils base64 -d base64.txt
    1.16 ± 0.01 times faster than ./coreutils_main_branch base64 -d base64.txt

The system is also under some load, so it might be slower than usual, but the results stay more or less consistent with what was reported before. Please let me know if you see any difference.

github-actions · 2025-09-16T06:28:53Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions · 2025-09-17T07:33:20Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions · 2025-09-17T09:44:26Z

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions · 2025-09-18T07:48:15Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

github-actions · 2025-09-19T14:06:05Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

codspeed-hq · 2025-09-20T18:29:49Z

CodSpeed Performance Report

Merging #8622 will not alter performance

Comparing ThePseudo:streamline_b64_decode (76cb7e6) with main (0258583)

Summary

✅ 106 untouched
⏩ 73 skipped¹

¹ 73 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

github-actions · 2025-09-20T18:44:50Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

src/uu/base32/src/base_common.rs

github-actions · 2025-09-22T06:57:19Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions · 2025-09-22T08:23:47Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions · 2025-09-22T12:55:00Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions · 2025-09-23T06:28:42Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

github-actions · 2025-09-24T07:44:22Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/tail/overlay-headers is no longer failing!

github-actions · 2025-09-24T14:00:47Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

ThePseudo · 2025-09-26T07:32:43Z

@Nekrolm do you think it is ok to be merged?

Nekrolm · 2025-09-26T09:48:51Z

I'm ok with these changes. But I'm not a maintainer

ThePseudo · 2025-09-26T11:55:22Z

@sylvestre then, what do you think?

aduskett · 2025-09-29T07:15:08Z

Any news on getting this merged? It looks great!

sylvestre · 2025-09-29T07:26:40Z

I would like to see a benchmark integrated in the repo before this is merged.

Is someone interested in doing that (in a different PR)?
We have examples here:
ls -d src/uu/*/benches

ThePseudo · 2025-09-29T11:56:55Z

I could do it in a different PR!

github-actions · 2025-09-30T10:18:09Z

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

ThePseudo · 2025-10-01T13:13:32Z

@sylvestre I guess now it should work, hope it looks good! :D

github-actions · 2025-10-13T07:17:38Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions · 2025-10-13T12:00:06Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions · 2025-10-16T06:41:36Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

This should remove the dependency we have on knowing whether the final message has padding or not. This is the first step towards not loading the entire message ahead of time to encode/decode, and towards allowing streaming. Signed-off-by: Andrea Calabrese <[email protected]>

As per the title, this is the main feature of this patch set. First, by not looking for the final padding, the tool can read streaming data before the stream has finished producing it. This also lets the tool work with much less memory, essentially a fixed amount instead of one depending on the file size. Signed-off-by: Andrea Calabrese <[email protected]>

We read linearly, so we do not need to seek within a file Signed-off-by: Andrea Calabrese <[email protected]>
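
What dropping the `Seek` bound buys can be shown with a small std-only sketch. `NoSeek` and `count_bytes` are hypothetical names for illustration, not items from `base_common.rs`; the point is only that a consumer bounded by `Read` accepts inputs that cannot rewind, such as pipes and sockets.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that exposes `Read` but deliberately not `Seek`,
/// standing in for a pipe or socket.
struct NoSeek<R>(R);

impl<R: Read> Read for NoSeek<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

/// Hypothetical consumer bounded by `Read` only: it makes a single forward
/// pass over the input with a fixed buffer and never rewinds.
fn count_bytes<R: Read>(mut input: R) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        match input.read(&mut buf)? {
            0 => return Ok(total),
            n => total += n as u64,
        }
    }
}
```

Had the bound been `R: Read + Seek`, passing a `NoSeek` value would fail to compile, which is the situation a streaming source was in before this change.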

github-actions · 2025-10-17T10:05:40Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

ThePseudo marked this pull request as ready for review September 12, 2025 08:17

ThePseudo mentioned this pull request Sep 12, 2025

Base64 does not support streaming data #8625

Open

ThePseudo force-pushed the streamline_b64_decode branch from 8f5d7b1 to c38288b Compare September 15, 2025 07:18

ThePseudo force-pushed the streamline_b64_decode branch from c38288b to 8e0969e Compare September 16, 2025 06:07

ThePseudo force-pushed the streamline_b64_decode branch from 8e0969e to 44147d1 Compare September 17, 2025 07:11

ThePseudo force-pushed the streamline_b64_decode branch from 44147d1 to 05d7d9f Compare September 17, 2025 09:24

ThePseudo force-pushed the streamline_b64_decode branch from 05d7d9f to 1854b91 Compare September 18, 2025 07:27

Nekrolm reviewed Sep 21, 2025

src/uu/base32/src/base_common.rs (resolved)

ThePseudo force-pushed the streamline_b64_decode branch 2 times, most recently from bcd2ec4 to 1854b91 Compare September 22, 2025 08:01

ThePseudo force-pushed the streamline_b64_decode branch 2 times, most recently from 23bc39f to 1bc46e6 Compare September 22, 2025 12:35

ThePseudo force-pushed the streamline_b64_decode branch from e9882d5 to f60b1b9 Compare September 24, 2025 13:36

sylvestre force-pushed the streamline_b64_decode branch from f60b1b9 to 4e08bb6 Compare September 30, 2025 09:57

ThePseudo force-pushed the streamline_b64_decode branch from 4e08bb6 to c91947a Compare October 1, 2025 06:15

ThePseudo force-pushed the streamline_b64_decode branch from c91947a to 88e3b2b Compare October 13, 2025 06:57

ThePseudo force-pushed the streamline_b64_decode branch from 88e3b2b to 9bdac31 Compare October 13, 2025 11:38

ThePseudo force-pushed the streamline_b64_decode branch from 9bdac31 to 87978e0 Compare October 16, 2025 06:16

Andrea Calabrese added 3 commits October 17, 2025 11:44

Remove Seek from required traits

76cb7e6

We read linearly, so we do not need to seek within a file Signed-off-by: Andrea Calabrese <[email protected]>

ThePseudo force-pushed the streamline_b64_decode branch from 87978e0 to 76cb7e6 Compare October 17, 2025 09:45
