@kehrazy kehrazy commented Sep 17, 2025

Rationale

  • Previously, reading a record meant growing the buffer until a separator was found. With extremely long records (e.g., multi‑GB lines), this could cause excessive memory usage and stall before any output was produced (see #8583, "sort does not finish for large one line file"). This change keeps memory bounded while ensuring forward progress by spilling oversized records to temporary runs, in line with how external sort implementations behave.

Overview of changes

  • Reader signaling and buffer capping
    • Introduce ReadProgress with SentChunk | NeedSpill | NoChunk | Finished to make read outcomes explicit.
    • grow_buffer now respects a hard cap.
    • When the buffer hits the cap without a separator, read_to_buffer returns NeedSpill instead of growing further.
  • External sort read/write loop
    • Maintain up to two in‑flight reads to keep the sorter fed.
    • On NeedSpill, stream the current oversized record into a temporary run file (spill_long_record), appending a single separator to match write_lines semantics and pushing post‑separator remainder into carry_over.
    • Preserve the existing ≤2‑chunk in‑memory fast path for small inputs.
    • If reading produces exactly one temporary run and the run is uncompressed and --unique is not used, stream that run directly to the final output (avoids re‑reading giant records during merge).
  • Merge path adjustments
    • Use a bounded buffer in the merge reader, similar to the external sort reader.
    • If a spill signal is encountered during merge (rare), perform a one‑off unbounded read to finish that record and preserve correctness. This keeps the change focused while allowing a future streaming comparator to remove the fallback.
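
The reader signaling and buffer cap described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the variant names come from the list above, but the function name `read_to_buffer_capped`, its parameters, the 8 KiB growth step, and the treatment of a final unterminated chunk are all assumptions. The thread does not spell out when `NoChunk` is returned, so the sketch never produces it.

```rust
use std::io::{self, Read};

/// Outcome of one read attempt. Variant names are taken from the PR
/// description; the meanings in the comments are this sketch's assumptions.
#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum ReadProgress {
    SentChunk, // the buffer holds at least one complete record
    NeedSpill, // the buffer reached the cap without seeing a separator
    NoChunk,   // not produced by this sketch (semantics not given in the thread)
    Finished,  // the input is exhausted and nothing is buffered
}

/// Hypothetical capped reader: fill `buf` from `input` until it contains a
/// separator, reaches `cap` bytes, or the input ends.
fn read_to_buffer_capped<R: Read>(
    input: &mut R,
    buf: &mut Vec<u8>,
    cap: usize,
    separator: u8,
) -> io::Result<ReadProgress> {
    let mut scanned = 0; // bytes already searched for a separator
    loop {
        if buf[scanned..].contains(&separator) {
            return Ok(ReadProgress::SentChunk);
        }
        scanned = buf.len();
        if buf.len() >= cap {
            // Refuse to grow past the cap; the caller must spill instead.
            return Ok(ReadProgress::NeedSpill);
        }
        // Grow by at most 8 KiB per step (arbitrary), never past the cap.
        let want = (cap - buf.len()).min(8192);
        let old_len = buf.len();
        buf.resize(old_len + want, 0);
        let n = input.read(&mut buf[old_len..])?;
        buf.truncate(old_len + n);
        if n == 0 {
            // End of input: hand over a final unterminated record, if any.
            return Ok(if buf.is_empty() {
                ReadProgress::Finished
            } else {
                ReadProgress::SentChunk
            });
        }
    }
}
```

A caller would match on the returned value and switch to the spill path when it sees `NeedSpill`, rather than calling the function again with a larger buffer.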

@kehrazy kehrazy marked this pull request as ready for review September 17, 2025 07:52
@kehrazy kehrazy marked this pull request as draft September 17, 2025 07:57
@kehrazy kehrazy force-pushed the sort-fix branch 5 times, most recently from ff0b106 to 7ce1ffe Compare September 17, 2025 08:10
@kehrazy kehrazy marked this pull request as ready for review September 17, 2025 08:10
@kehrazy
Author

kehrazy commented Sep 17, 2025

image

@sylvestre
Contributor

Many tasks are failing

BTW, could you run the benchmark without your change?
And please paste the output directly. Screenshots aren't great :)

@kehrazy
Author

kehrazy commented Sep 17, 2025

Many tasks are failing

I'm not sure how to fix this: the tests are passing and the linters are happy, but the spellcheck complains about "memrchr", and the l10n tests have just timed out. I don't think I touched l10n enough to cause that. Which direction should I look in?

BTW, could you run the benchmark without your change? And please paste the output directly. Screenshots aren't great :)

Sure thing!

We're getting a sample input using

dd if=/dev/zero bs=1M count=4096 status=progress | tr '\0' 'A' |
head -c 4294967295 > oneline_4G.txt && echo >> oneline_4G.txt

We can't run this exact benchmark using main (as of aaf742d), because the Rust version.. doesn't finish. So, after putting a reasonable timeout (and ignoring exit codes with -i):

hyperfine -i "timeout 15s ./target/release/coreutils sort oneline_4G.txt" "timeout 15s sort oneline_4G.txt" --export-markdown report.md
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `timeout 15s ./target/release/coreutils sort oneline_4G.txt` | 15.003 ± 0.001 | 15.002 | 15.005 | 6.85 ± 0.49 |
| `timeout 15s sort oneline_4G.txt` | 2.191 ± 0.158 | 1.757 | 2.317 | 1.00 |

..and with changes to readers:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `timeout 15s ./target/release/coreutils sort oneline_4G.txt` | 2.808 ± 0.168 | 2.574 | 3.039 | 1.23 ± 0.08 |
| `timeout 15s sort oneline_4G.txt` | 2.275 ± 0.034 | 2.242 | 2.345 | 1.00 |

@kimono-koans
Contributor

kimono-koans commented Sep 17, 2025

  • When the buffer hits the cap without a separator, read_to_buffer returns NeedSpill instead of growing further.

This may be the most advantageous way to fix this problem, for now, but wouldn't it make more sense, in the future, to just store the range of file read (if it is a file, not stdin!) without a newline/separator, instead of writing to a new tmp file, with these extremely large buffers? Wouldn't that avoid writes and be more file cache friendly?
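
The alternative raised here, remembering where an oversized record lives in a seekable input instead of copying it to a temporary file, could look roughly like the sketch below. It is not part of the patch; `RecordRange` and `copy_range` are hypothetical names, and the approach only applies when the input is a regular file rather than stdin.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical handle to an oversized record that stays in the input file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RecordRange {
    offset: u64, // byte offset of the record's first byte
    len: u64,    // record length in bytes, separator excluded
}

/// Stream the referenced bytes from `file` to `out` without loading the
/// whole record into memory. Returns the number of bytes copied.
fn copy_range<F: Read + Seek, W: Write>(
    file: &mut F,
    range: RecordRange,
    out: &mut W,
) -> io::Result<u64> {
    file.seek(SeekFrom::Start(range.offset))?;
    // `take` limits the reader to the record; `io::copy` moves the data
    // through a fixed-size internal buffer.
    io::copy(&mut file.by_ref().take(range.len), out)
}
```

With this shape, the sorter would keep only the `(offset, len)` pair for the oversized record and read the bytes back when the record is written out, which avoids one extra write of the record at the cost of a seek on the input.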


GNU testsuite comparison:

GNU test failed: tests/chmod/usage. tests/chmod/usage is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/chroot/chroot-credentials. tests/chroot/chroot-credentials is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/csplit/csplit-suppress-matched. tests/csplit/csplit-suppress-matched is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/basic. tests/du/basic is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/exclude. tests/du/exclude is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/hard-link. tests/du/hard-link is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inacc-dest. tests/du/inacc-dest is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inodes. tests/du/inodes is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/threshold. tests/du/threshold is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/env/env. tests/env/env is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/ls/abmon-align. tests/ls/abmon-align is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/ls/hyperlink. tests/ls/hyperlink is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/read-errors. tests/misc/read-errors is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/od/od-x8. tests/od/od-x8 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/r-2. tests/rm/r-2 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/rm3. tests/rm/rm3 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf. tests/shuf/shuf is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf-reservoir. tests/shuf/shuf-reservoir is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-NaN-infloop. tests/sort/sort-NaN-infloop is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-benchmark-random. tests/sort/sort-benchmark-random is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress. tests/sort/sort-compress is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-hang. tests/sort/sort-compress-hang is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-proc. tests/sort/sort-compress-proc is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-discrim. tests/sort/sort-discrim is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-files0-from. tests/sort/sort-files0-from is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-merge. tests/sort/sort-merge is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-rand. tests/sort/sort-rand is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-spinlock-abuse. tests/sort/sort-spinlock-abuse is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-u-FMR. tests/sort/sort-u-FMR is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique. tests/sort/sort-unique is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique-segv. tests/sort/sort-unique-segv is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-version. tests/sort/sort-version is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/test/test-N. tests/test/test-N is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@Qelxiros
Contributor

You can fix the cspell errors by adding

// spell-checker:ignore memrchr

to chunks.rs

@kehrazy kehrazy force-pushed the sort-fix branch 2 times, most recently from bd1d34a to b3c9ad9 Compare September 21, 2025 06:28

codspeed-hq bot commented Sep 21, 2025

CodSpeed Performance Report

Merging #8652 will degrade performances by 3.43%

Comparing kehrazy:sort-fix (e1318fe) with main (0258583)

Summary

❌ 2 regressions
✅ 83 untouched
⏩ 94 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|:---|---:|---:|---:|
| `du_balanced_tree[(5, 4, 10)]` | 9.1 ms | 9.3 ms | -2.09% |
| `du_human_balanced_tree[(5, 4, 10)]` | 10.1 ms | 10.5 ms | -3.43% |

Footnotes

  1. 94 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


GNU testsuite comparison:

GNU test failed: tests/chmod/usage. tests/chmod/usage is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/chroot/chroot-credentials. tests/chroot/chroot-credentials is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/basic. tests/du/basic is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/exclude. tests/du/exclude is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/hard-link. tests/du/hard-link is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inacc-dest. tests/du/inacc-dest is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inodes. tests/du/inodes is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/threshold. tests/du/threshold is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/env/env. tests/env/env is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/ls/abmon-align. tests/ls/abmon-align is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/ls/hyperlink. tests/ls/hyperlink is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/read-errors. tests/misc/read-errors is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/od/od-x8. tests/od/od-x8 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/r-2. tests/rm/r-2 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/rm3. tests/rm/rm3 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf. tests/shuf/shuf is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf-reservoir. tests/shuf/shuf-reservoir is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-NaN-infloop. tests/sort/sort-NaN-infloop is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-benchmark-random. tests/sort/sort-benchmark-random is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress. tests/sort/sort-compress is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-hang. tests/sort/sort-compress-hang is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-proc. tests/sort/sort-compress-proc is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-discrim. tests/sort/sort-discrim is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-files0-from. tests/sort/sort-files0-from is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-merge. tests/sort/sort-merge is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-rand. tests/sort/sort-rand is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-spinlock-abuse. tests/sort/sort-spinlock-abuse is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-u-FMR. tests/sort/sort-u-FMR is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique. tests/sort/sort-unique is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique-segv. tests/sort/sort-unique-segv is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-version. tests/sort/sort-version is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/test/test-N. tests/test/test-N is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/du/2g is now being skipped but was previously passing.

@kehrazy
Author

kehrazy commented Sep 21, 2025

What can I do about

Error: No space left on device

in CI? I don't think I made any changes that would break CI in such a way?

@sylvestre
Contributor

let me rebase it

@sylvestre
Contributor

It is a huge patch. Is it possible to make it smaller for review?


GNU testsuite comparison:

GNU test failed: tests/chmod/usage. tests/chmod/usage is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/chroot/chroot-credentials. tests/chroot/chroot-credentials is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/basic. tests/du/basic is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/exclude. tests/du/exclude is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/hard-link. tests/du/hard-link is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inacc-dest. tests/du/inacc-dest is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inodes. tests/du/inodes is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/threshold. tests/du/threshold is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/env/env. tests/env/env is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/ls/abmon-align. tests/ls/abmon-align is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/read-errors. tests/misc/read-errors is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/od/od-x8. tests/od/od-x8 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/r-2. tests/rm/r-2 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/rm3. tests/rm/rm3 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf. tests/shuf/shuf is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf-reservoir. tests/shuf/shuf-reservoir is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-NaN-infloop. tests/sort/sort-NaN-infloop is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-benchmark-random. tests/sort/sort-benchmark-random is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress. tests/sort/sort-compress is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-hang. tests/sort/sort-compress-hang is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-proc. tests/sort/sort-compress-proc is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-discrim. tests/sort/sort-discrim is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-files0-from. tests/sort/sort-files0-from is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-merge. tests/sort/sort-merge is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-rand. tests/sort/sort-rand is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-spinlock-abuse. tests/sort/sort-spinlock-abuse is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-u-FMR. tests/sort/sort-u-FMR is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique. tests/sort/sort-unique is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique-segv. tests/sort/sort-unique-segv is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-version. tests/sort/sort-version is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/test/test-N. tests/test/test-N is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)

@kehrazy
Author

kehrazy commented Sep 21, 2025

It is a huge patch. Is it possible to make it smaller for review?

I mean, I can try? but, eh, we touch a lot of parts that were static before - do we really wanna split these out into multiple merges?

the scope of the MR may be reduced, sure (e.g. the reallocation stuff), though!

@sylvestre
Contributor

most of the jobs are failing, are you going to work on it? thanks

@kehrazy
Author

kehrazy commented Oct 11, 2025

most of the jobs are failing, are you going to work on it? thanks

sure, i will

- Introduce ReadProgress with SentChunk | NeedSpill | NoChunk | Finished
- Cap grow_buffer; read_to_buffer returns NeedSpill on cap without separator
- Maintain up to two in-flight reads
- On NeedSpill, stream oversized record to temp run (spill_long_record)
- Append single separator; push remainder into carry_over
- Preserve ≤2-chunk in-memory fast path
- Use bounded buffer in merge reader
- Fallback unbounded read when spill encountered to preserve correctness
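
The spill step named in this list can be sketched as below. This is an illustration under assumptions rather than the patch's actual `spill_long_record`: the signature, the `chunk_size` parameter, and returning the carry-over as a `Vec<u8>` are all hypothetical. It assumes `prefix` is the capped buffer content that triggered `NeedSpill`, so it contains no separator.

```rust
use std::io::{self, Read, Write};

/// Stream one oversized record into `run` (standing in for a temporary run
/// file). Writes `prefix`, then keeps copying from `input` until the
/// separator is found. Returns the bytes read past the separator so the
/// caller can keep them as carry-over.
fn spill_long_record<R: Read, W: Write>(
    prefix: &[u8],
    input: &mut R,
    run: &mut W,
    separator: u8,
    chunk_size: usize,
) -> io::Result<Vec<u8>> {
    run.write_all(prefix)?;
    // Fixed-size scratch buffer, so memory stays bounded however long the record is.
    let mut chunk = vec![0u8; chunk_size.max(1)];
    loop {
        let n = input.read(&mut chunk)?;
        if n == 0 {
            // Input ended without a separator: append exactly one.
            run.write_all(&[separator])?;
            return Ok(Vec::new());
        }
        match chunk[..n].iter().position(|&b| b == separator) {
            Some(pos) => {
                // Write the record's tail including its single separator.
                run.write_all(&chunk[..=pos])?;
                // Whatever follows belongs to the next record.
                return Ok(chunk[pos + 1..n].to_vec());
            }
            None => run.write_all(&chunk[..n])?,
        }
    }
}
```

Either way the run ends with exactly one separator, and only the tail of the last chunk is returned as carry-over; the rest of the input is left unread for the normal reader.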

GNU testsuite comparison:

GNU test failed: tests/chmod/usage. tests/chmod/usage is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/chroot/chroot-credentials. tests/chroot/chroot-credentials is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/basic. tests/du/basic is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/exclude. tests/du/exclude is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/hard-link. tests/du/hard-link is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inacc-dest. tests/du/inacc-dest is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/inodes. tests/du/inodes is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/du/threshold. tests/du/threshold is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/env/env. tests/env/env is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/ls/hyperlink. tests/ls/hyperlink is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/read-errors. tests/misc/read-errors is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/od/od-x8. tests/od/od-x8 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/r-2. tests/rm/r-2 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/rm/rm3. tests/rm/rm3 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf. tests/shuf/shuf is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/shuf/shuf-reservoir. tests/shuf/shuf-reservoir is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-NaN-infloop. tests/sort/sort-NaN-infloop is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-benchmark-random. tests/sort/sort-benchmark-random is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress. tests/sort/sort-compress is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-hang. tests/sort/sort-compress-hang is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-compress-proc. tests/sort/sort-compress-proc is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-discrim. tests/sort/sort-discrim is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-files0-from. tests/sort/sort-files0-from is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-merge. tests/sort/sort-merge is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-rand. tests/sort/sort-rand is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-spinlock-abuse. tests/sort/sort-spinlock-abuse is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-u-FMR. tests/sort/sort-u-FMR is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique. tests/sort/sort-unique is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-unique-segv. tests/sort/sort-unique-segv is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-version. tests/sort/sort-version is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/test/test-N. tests/test/test-N is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/rm/rm1 (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
