Data.ByteString.Lazy.dropEnd: Use two-pointers technique #629
Conversation
This can be seen as using the input itself as an implicit queue; we formerly copied its chunks into an explicit queue. By writing the key logic as a polymorphic `splitAtEndFold`, it was easy to re-use it for `takeEnd`; the latter function should now operate in constant stack space and can release initial chunks of a very long input string sooner.
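To make the idea concrete, here is a minimal, self-contained sketch of the two-pointers scheme on a plain list of strict chunks. The names (`dropEndChunks`, `takeChunks`, the `gap` counter) are illustrative only and this is not the library's actual `splitAtEndFold` machinery; it just shows why no explicit queue is needed.

```haskell
import qualified Data.ByteString as S
import Data.Int (Int64)

-- Drop the final n bytes of the byte sequence represented by the chunk
-- list, releasing safe leading chunks as soon as possible.
dropEndChunks :: Int64 -> [S.ByteString] -> [S.ByteString]
dropEndChunks n input
  | n <= 0    = input                 -- nothing to drop
  | otherwise = go 0 input input
  where
    chunkLen = fromIntegral . S.length

    -- 'gap' counts the bytes between the split point (start of toSplit)
    -- and the scan point (start of toScan); toScan is always a suffix
    -- of toSplit, which is itself a suffix of the original input.
    go :: Int64 -> [S.ByteString] -> [S.ByteString] -> [S.ByteString]
    go gap toSplit (c : toScan) = release (gap + chunkLen c) toSplit toScan
    go gap toSplit []           =
      -- End of input reached: only the first (gap - n) bytes of toSplit
      -- survive; the rest is exactly the dropped suffix.
      takeChunks (max 0 (gap - n)) toSplit

    -- A leading chunk of toSplit can be emitted once at least n
    -- already-scanned bytes follow it, because then it cannot overlap
    -- the final n bytes of the input.
    release gap (d : ds) toScan
      | gap - chunkLen d >= n = d : release (gap - chunkLen d) ds toScan
    release gap toSplit toScan = go gap toSplit toScan

    -- Keep only the first m bytes of a chunk list.
    takeChunks m (d : ds)
      | m <= 0          = []
      | m >= chunkLen d = d : takeChunks (m - chunkLen d) ds
      | otherwise       = [S.take (fromIntegral m) d]
    takeChunks _ [] = []
```

With OverloadedStrings, `S.concat (dropEndChunks 3 ["ab", "cd", "ef"])` evaluates to `"abc"`: the first chunk is released as soon as at least three already-scanned bytes follow it, without ever computing the total length up front.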
Fix a very silly bug, and strengthen tests, so that a plain 'cabal test' finds the bug almost every try instead of finding it only every few dozen tries.
Actually work around the poison instance. (Some re-compilation check somewhere set a trap for me.) This also replaces fromIntegral with intToIndexTy in a few places.
Data/ByteString/Lazy.hs
Outdated
where
  -- Idea: Keep two references into the input ByteString:
  -- "bsL" tracks the current split point,
  -- "bsR" tracks the yet-unprocessed tail.
Is it true that bsR is a substring of bsL? Maybe calling them tortoise and hare as in Floyd's algorithm would be helpful.
Yes. bsR is always a suffix of bsL, which is always a suffix of the original input. I'm not sure I like the suggested tortoise/hare naming here, but agree bsL/bsR isn't great, either.
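For the record, that invariant is easy to state as a property. The names below (`pointerInvariant`, and the `(bsL, bsR)` pair standing for an intermediate state of the traversal) are hypothetical, since the intermediate states aren't something the library exposes, but `isSuffixOf` is the real Data.ByteString.Lazy function:

```haskell
import qualified Data.ByteString.Lazy as L

-- Invariant claimed above for every intermediate state: the
-- yet-unprocessed tail bsR is a suffix of bsL, which is in turn a
-- suffix of the original input.
pointerInvariant :: L.ByteString -> (L.ByteString, L.ByteString) -> Bool
pointerInvariant input (bsL, bsR) =
  bsR `L.isSuffixOf` bsL && bsL `L.isSuffixOf` input
```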
bsL / bsR are quite misleading IMO: if I were to guess, I'd say that they are "left bytestring" and "right bytestring" with regard to a certain split point, such that bsL <> bsR equals the input string. Even something very stupid like bs, subBs and subSubBs would be better.
Naming aside, looks great to me! If you come up with more descriptive names - cool, if not - feel free to merge as is.
I tried renaming to toSplit/toScan. I'd like to verify with benchmarks that this performs as expected before I go ahead and merge it.
Yes, it's better, thanks. Feel free to merge when you are happy with benchmarks.
Baseline (df039bd):
After (071a062):
* Data.ByteString.Lazy.dropEnd: Use two-pointers technique
  This can be seen as using the input itself as an implicit queue; we formerly copied its chunks into an explicit queue. By writing the key logic as a polymorphic `splitAtEndFold`, it was easy to re-use it for `takeEnd`; the latter function should now operate in constant stack space and can release initial chunks of a very long input string sooner.
* Fix a very silly bug, and strengthen tests
  ...so that a plain 'cabal test' finds the bug almost every try instead of finding it only every few dozen tries
* Actually work around the poison instance
  (Some re-compilation check somewhere set a trap for me.) This also replaces fromIntegral with intToIndexTy in a few places.
* Rewrite the poison instance using TypeError
* Rename "bsL" -> "toSplit" and "bsR" -> "toScan"
* Add basic benchmarks for lazy takeEnd/splitEnd
  According to these benchmarks, the new implementation for takeEnd is somewhat faster and the new implementation for dropEnd is roughly 3.5x to 4x as quick as its predecessor.

(cherry picked from commit 2bbc97e)
This can be seen as using the input itself as an implicit queue; we formerly copied its chunks into an explicit queue. By writing the key logic as a polymorphic `splitAtEndFold`, it was easy to re-use it for `takeEnd`; the latter function should now operate in constant stack space and can release initial chunks of a very long input string sooner. (I haven't attempted any benchmarking yet.)