Optimize IntSet.Bin #998

meooow25 · 2024-04-06T07:10:59Z

Replace the separate Prefix and Mask Int fields in the Bin constructor with a single Int field that contains both, merged together. This reduces the memory required by a Bin from 5 words to 4, at the cost of a few extra computations (cheap bitwise ops) in certain operations. This follows a similar change made for IntMap.Bin (Optimize IntMap.Bin #995).
Benchmarks show that runtimes for most operations remain unchanged or decrease by a small amount (<10%). As expected, allocations are consistently lower by 11-16% for all set operations that have to make O(log n) allocations.
The functions and types used by both IntSet and IntMap have been moved into an IntTreeCommons module.
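
To make the merged field concrete, here is a minimal, self-contained sketch of one way a prefix and a mask bit can share a single word. It is an illustration, not the code from this PR: the `Prefix` wrapper and the names `nomatch` and `left` mirror identifiers that appear in the diff, but the definitions below, the `Word` key type, and the helpers `highestBit`, `maskBit`, and `branchPrefix` are assumptions made for the example.

```haskell
import Data.Bits (complement, countLeadingZeros, finiteBitSize, shiftL, xor, (.&.), (.|.))

-- Assumed encoding: the shared high bits, then a single 1 at the branching
-- (mask) position, then zeros.
newtype Prefix = Prefix Word deriving (Eq, Show)

-- Highest set bit of a nonzero word.
highestBit :: Word -> Word
highestBit w = 1 `shiftL` (finiteBitSize w - 1 - countLeadingZeros w)

-- The mask bit is the lowest set bit of the merged field.
maskBit :: Prefix -> Word
maskBit (Prefix p) = p .&. negate p

-- Merged prefix for two distinct keys: keep the bits above the first
-- differing bit and set that bit.
branchPrefix :: Word -> Word -> Prefix
branchPrefix a b = Prefix ((a .&. complement (m .|. (m - 1))) .|. m)
  where
    m = highestBit (a `xor` b)

-- True if the key disagrees with the shared bits above the mask bit.
nomatch :: Word -> Prefix -> Bool
nomatch k (Prefix p) = (k `xor` p) .&. (p `xor` negate p) /= 0

-- For a key that matches the prefix: True if its mask bit is unset,
-- i.e. the key belongs in the left subtree.
left :: Word -> Prefix -> Bool
left k (Prefix p) = k < p
```

For keys 10 and 12, `branchPrefix 10 12` is `Prefix 12` (binary 1100): the mask bit is 4, `left 10 (Prefix 12)` holds, and `left 12 (Prefix 12)` does not.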

Fixes #991.

Memory

Concretely, this reduces the memory required by an IntSet by ~12.5%.

Calculations: For a tree with n Tips, each Tip costs 3 words, and there are n-1 Bins, each costing 5 words before this change and 4 words after. So we save about 1 out of every 8 words.
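
The arithmetic can be spelled out as a small sketch. The function names are made up for the example; the word counts are the ones stated above.

```haskell
-- Words used by a tree with n Tips (3 words each) and n-1 Bins.
treeWords :: Int -> Int -> Int
treeWords binWords n = 3 * n + binWords * (n - 1)

-- Fraction of memory saved when a Bin shrinks from 5 words to 4.
savedFraction :: Int -> Double
savedFraction n = fromIntegral (before - after) / fromIntegral before
  where
    before = treeWords 5 n  -- 8n - 5
    after  = treeWords 4 n  -- 7n - 4
```

The saved fraction is (n-1)/(8n-5), which approaches 1/8 = 12.5% as n grows.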

Compatibility

This PR makes breaking changes to IntSet internals, which are reflected in the exports of the internal module Data.IntSet.Internal. Due to the introduction of IntTreeCommons, some exports of Data.IntMap.Internal have moved to IntTreeCommons. The exports of all other modules are unchanged.

Benchmarks

Benchmarks done with GHC 9.6.3.
Last updated: 080dde1

Benchmark command: cabal run <target> -- --csv <csv> +RTS -T

intset-benchmarks

Name                       Time - - - - - - - -    Allocated - - - - -     Copied - - - - - - -
                                A       B     %         A       B     %         A       B     %
delete                      66 μs   65 μs   -1%    729 KB  601 KB  -17%    127 B   104 B   -18%
deleteMax                   17 ns   16 ns   -5%    175 B   159 B    -9%      0 B     0 B
deleteMin                   57 ns   51 ns  -11%    764 B   647 B   -15%      0 B     0 B
difference                 721 ns  683 ns   -5%    4.0 KB  3.5 KB  -12%      1 B     2 B   +100%
disjoint:false              33 ns   34 ns   +3%     23 B    23 B    +0%      0 B     0 B
disjoint:true              443 ns  409 ns   -7%      0 B    12 B             0 B     0 B
filter                      21 μs   21 μs   +0%     68 KB   67 KB   -1%     43 B    39 B    -9%
findMax                    6.8 ns  7.1 ns   +4%     39 B    39 B    +0%      0 B     0 B
findMin                    9.1 ns  9.0 ns   +0%     39 B    39 B    +0%      0 B     0 B
fold                       154 ns  156 ns   +0%    2.7 KB  2.7 KB   +0%      1 B     1 B    +0%
fromAscList                 17 μs   17 μs   +0%    4.0 KB  2.8 KB  -29%      2 B     2 B    +0%
fromDistinctAscList         17 μs   17 μs   +0%    4.0 KB  2.8 KB  -29%      3 B     2 B   -33%
fromList                    52 μs   46 μs  -11%    576 KB  479 KB  -16%    392 B   297 B   -24%
fromRange                  347 ns  332 ns   -4%    4.0 KB  3.6 KB  -12%      2 B     1 B   -50%
fromRange:small            9.2 ns   11 ns  +19%    119 B   111 B    -6%      0 B     0 B
insert                      52 μs   45 μs  -12%    576 KB  479 KB  -16%    409 B   308 B   -24%
instanceOrd:dense          454 ms  454 ms   +0%    2.2 GB  2.2 GB   +0%    142 MB  141 MB   +0%
instanceOrd:sparse         826 ms  818 ms   +0%    2.4 GB  2.4 GB   +0%    241 MB  229 MB   -4%
intersection               710 ns  702 ns   -1%    4.0 KB  3.5 KB  -13%      2 B     1 B   -50%
map                         85 μs   81 μs   -5%    1.0 MB  960 KB   -9%    1.1 KB  912 B   -16%
member                      50 μs   48 μs   -2%      0 B     0 B             0 B     0 B
null.intersection:false    706 ns  699 ns   +0%    4.0 KB  3.5 KB  -13%      2 B     1 B   -50%
null.intersection:true     425 ns  420 ns   -1%      0 B    12 B             0 B     0 B
partition                   24 μs   22 μs   -4%     72 KB   71 KB   -1%     79 B    69 B   -12%
spanAntitone:dense         118 ns  112 ns   -4%    581 B   527 B    -9%      0 B     0 B
spanAntitone:sparse        137 ns  120 ns  -12%    772 B   677 B   -12%      0 B     0 B
split:dense                 36 ns   32 ns  -10%    375 B   319 B   -14%      0 B     0 B
split:sparse                49 ns   48 ns   -1%    486 B   399 B   -17%      0 B     0 B
splitMember:dense           38 ns   35 ns   -7%    383 B   326 B   -14%      0 B     0 B
splitMember:sparse          56 ns   44 ns  -20%    494 B   406 B   -17%      0 B     0 B
union                      691 ns  662 ns   -4%    4.0 KB  3.5 KB  -12%      2 B     1 B   -50%
unions                     694 ns  677 ns   -2%    4.1 KB  3.6 KB  -12%      2 B     1 B   -50%

set-operations-intset

Name                           Time - - - - - - - -    Allocated - - - - -     Copied - - - - - - -
                                    A       B     %         A       B     %         A       B     %
difference-block_nn             12 μs   11 μs   -1%     65 KB   54 KB  -16%    530 B   388 B   -26%
difference-block_nn_swap        12 μs   12 μs   +0%     65 KB   54 KB  -16%    552 B   394 B   -28%
difference-block_ns            1.6 μs  1.5 μs   -3%     11 KB  8.8 KB  -17%     17 B    11 B   -35%
difference-block_sn_swap       1.3 μs  1.3 μs   +1%    6.5 KB  5.5 KB  -15%      7 B     5 B   -28%
difference-common_nn            18 μs   17 μs   -3%     98 KB   85 KB  -12%    1.2 KB  925 B   -23%
difference-common_nn_swap       11 μs  9.7 μs  -15%      0 B     0 B             0 B     0 B
difference-common_ns            18 μs   18 μs   -3%     97 KB   85 KB  -12%    1.2 KB  932 B   -23%
difference-common_nt           6.4 μs  6.0 μs   -6%     47 KB   39 KB  -17%    291 B   202 B   -30%
difference-common_sn_swap       11 μs  9.7 μs  -15%      0 B     0 B             0 B     0 B
difference-common_tn_swap      3.2 μs  2.8 μs  -11%      0 B     0 B             0 B     0 B
difference-disj_nn              47 ns   46 ns   -2%    255 B   214 B   -16%      0 B     0 B
difference-disj_nn_swap         51 ns   49 ns   -3%    334 B   278 B   -16%      0 B     0 B
difference-disj_ns              43 ns   41 ns   -4%    255 B   214 B   -16%      0 B     0 B
difference-disj_nt              38 ns   35 ns   -8%    255 B   214 B   -16%      0 B     0 B
difference-disj_sn_swap         44 ns   43 ns   -2%    255 B   214 B   -16%      0 B     0 B
difference-disj_tn_swap         30 ns   34 ns  +14%    135 B   119 B   -11%      0 B     0 B
difference-mix_nn               38 μs   36 μs   -5%    193 KB  169 KB  -12%    4.9 KB  3.7 KB  -23%
difference-mix_nn_swap          38 μs   36 μs   -4%    193 KB  169 KB  -12%    4.8 KB  3.7 KB  -22%
difference-mix_ns               20 μs   19 μs   -4%    107 KB   94 KB  -12%    1.4 KB  1.1 KB  -22%
difference-mix_nt              6.5 μs  6.1 μs   -6%     47 KB   39 KB  -17%    290 B   204 B   -29%
difference-mix_sn_swap          20 μs   19 μs   -5%    107 KB   94 KB  -12%    1.4 KB  1.1 KB  -23%
difference-mix_tn_swap         4.4 μs  4.1 μs   -7%     20 KB   17 KB  -12%     54 B    39 B   -27%
intersection-block_nn          7.2 μs  6.9 μs   -5%      0 B     0 B             0 B     0 B
intersection-block_nn_swap     7.0 μs  6.9 μs   -1%      0 B     0 B             0 B     0 B
intersection-block_ns          891 ns  879 ns   -1%     25 B    25 B    +0%      0 B     0 B
intersection-block_sn_swap     962 ns  864 ns  -10%      0 B    25 B             0 B     0 B
intersection-common_nn          18 μs   17 μs   -2%     97 KB   85 KB  -12%    1.2 KB  904 B   -24%
intersection-common_nn_swap     18 μs   18 μs   +0%     97 KB   85 KB  -11%    1.2 KB  931 B   -21%
intersection-common_ns          18 μs   18 μs   +0%     97 KB   85 KB  -12%    1.1 KB  920 B   -21%
intersection-common_nt         4.6 μs  4.4 μs   -2%     19 KB   17 KB  -11%     50 B    38 B   -24%
intersection-common_sn_swap     18 μs   17 μs   -1%     97 KB   85 KB  -12%    1.2 KB  898 B   -24%
intersection-common_tn_swap    4.3 μs  4.3 μs   +0%     19 KB   17 KB  -11%     50 B    40 B   -19%
intersection-disj_nn            33 ns   30 ns   -9%     31 B    31 B    +0%      0 B     0 B
intersection-disj_nn_swap       32 ns   32 ns   +0%     31 B    31 B    +0%      0 B     0 B
intersection-disj_ns            30 ns   26 ns  -13%     31 B    31 B    +0%      0 B     0 B
intersection-disj_nt            26 ns   23 ns   -8%     31 B    31 B    +0%      0 B     0 B
intersection-disj_sn_swap       29 ns   28 ns   -3%     31 B    31 B    +0%      0 B     0 B
intersection-disj_tn_swap       25 ns   24 ns   -5%     31 B    31 B    +0%      0 B     0 B
intersection-mix_nn             22 μs   21 μs   -2%      0 B     0 B             0 B     0 B
intersection-mix_nn_swap        22 μs   21 μs   -1%      0 B     0 B             0 B     0 B
intersection-mix_ns             12 μs   12 μs   +0%      0 B     0 B             0 B     0 B
intersection-mix_nt            3.2 μs  3.3 μs   +1%      0 B     0 B             0 B     0 B
intersection-mix_sn_swap        12 μs   12 μs   +0%      0 B     0 B             0 B     0 B
intersection-mix_tn_swap       3.1 μs  3.0 μs   -3%      0 B     0 B             0 B     0 B
union-block_nn                  14 μs   13 μs   -4%     93 KB   77 KB  -16%    1.1 KB  776 B   -30%
union-block_nn_swap             14 μs   13 μs   -6%     93 KB   77 KB  -16%    1.1 KB  777 B   -29%
union-block_ns                 1.8 μs  1.7 μs   -7%     13 KB   11 KB  -17%     27 B    18 B   -33%
union-block_sn_swap            1.9 μs  1.7 μs   -9%     13 KB   11 KB  -17%     27 B    18 B   -33%
union-common_nn                 17 μs   17 μs   +1%     97 KB   85 KB  -12%    1.2 KB  965 B   -20%
union-common_nn_swap            17 μs   17 μs   +0%     97 KB   85 KB  -12%    1.2 KB  934 B   -24%
union-common_ns                 17 μs   18 μs   +1%     97 KB   85 KB  -11%    1.2 KB  969 B   -20%
union-common_nt                6.4 μs  5.7 μs   -9%     47 KB   39 KB  -16%    309 B   203 B   -34%
union-common_sn_swap            17 μs   17 μs   +0%     97 KB   85 KB  -12%    1.2 KB  942 B   -22%
union-common_tn_swap           5.9 μs  5.3 μs   -9%     47 KB   39 KB  -16%    301 B   206 B   -31%
union-disj_nn                   62 ns   47 ns  -24%    534 B   438 B   -17%      0 B     0 B
union-disj_nn_swap              61 ns   54 ns  -10%    534 B   438 B   -17%      0 B     0 B
union-disj_ns                   56 ns   41 ns  -26%    454 B   375 B   -17%      0 B     0 B
union-disj_nt                   44 ns   32 ns  -28%    334 B   279 B   -16%      0 B     0 B
union-disj_sn_swap              56 ns   48 ns  -14%    454 B   375 B   -17%      0 B     0 B
union-disj_tn_swap              47 ns   38 ns  -20%    334 B   279 B   -16%      0 B     0 B
union-mix_nn                    36 μs   37 μs   +1%    193 KB  171 KB  -11%    4.7 KB  3.7 KB  -21%
union-mix_nn_swap               36 μs   37 μs   +3%    193 KB  171 KB  -11%    4.7 KB  3.7 KB  -21%
union-mix_ns                    19 μs   20 μs   +2%    107 KB   93 KB  -12%    1.4 KB  1.1 KB  -21%
union-mix_nt                   6.4 μs  5.8 μs   -9%     47 KB   39 KB  -16%    290 B   205 B   -29%
union-mix_sn_swap               19 μs   19 μs   +1%    107 KB   94 KB  -12%    1.4 KB  1.1 KB  -23%
union-mix_tn_swap              5.9 μs  5.5 μs   -6%     47 KB   39 KB  -16%    289 B   204 B   -29%

treeowl · 2024-04-06T11:18:41Z

This isn't related to your changes, but it looks like the IntSet validity test doesn't ensure that each Tip contains at least one element.

treeowl · 2024-04-06T11:24:23Z

containers/src/Data/IntSet/Internal.hs

@@ -334,7 +363,7 @@ null _   = False
 size :: IntSet -> Int
 size = go 0
  where
-    go !acc (Bin _ _ l r) = go (go acc l) r
+    go !acc (Bin _ l r) = go (go acc l) r
    go acc (Tip _ bm) = acc + bitcount 0 bm
    go acc Nil = acc


Why does bitcount take two arguments? Should we just use popCount directly?

Why does bitcount take two arguments?

This seems to be left over from when bitcount was a loop and the first parameter was the accumulator: 4952822#diff-06e022e2fe36764ea9baec24a03a8186d708cc561677a2e6583f08fe180ca073L73-L76

What I don't understand is why it is acc + bitcount 0 bm rather than bitcount acc bm 😄

Should we just use popCount directly?

Probably. Might be better in a different PR though.

I think it would be a totally reasonable "by the way" for this one, if you're so inclined. It strictly improves readability.

Also, we should get rid of the note citing a source for the highest bit algorithm that we no longer use.

Since we were considering removing the Nat stuff in #995, I think it would be a better fit with those changes. Same for the obsolete note.
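
For reference, a small sketch of the equivalence being discussed. It assumes `bitcount` is just an accumulator added to `popCount`; the definition here is written for illustration and is not copied from the library, and `Word` stands in for the Tip's bitmap type.

```haskell
import Data.Bits (popCount)

-- Assumed shape of the two-argument helper: accumulator plus population count.
bitcount :: Int -> Word -> Int
bitcount acc bm = acc + popCount bm

-- The three spellings mentioned in the thread, for a Tip with bitmap bm.
viaZero, viaAcc, direct :: Int -> Word -> Int
viaZero acc bm = acc + bitcount 0 bm
viaAcc  acc bm = bitcount acc bm
direct  acc bm = acc + popCount bm
```

Under that assumption all three agree, so using `popCount` directly only changes readability.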

treeowl · 2024-04-08T17:46:17Z

containers-tests/tests/IntSetValidity.hs

+-- * All keys in a Bin start with the Bin's shared prefix.
+-- * All keys in the Bin's left child have the Prefix's mask bit unset.
+-- * All keys in the Bin's right child have the Prefix's mask bit set.
+prefixOk :: IntSet -> Property


I don't see anything checking the prefixes of Bins other than the root. Shouldn't this be called prefixesOk, and perform a recursive test?

For test performance reasons, I suspect we want to consider validity of the prefix of a Bin relative to that of its parent when making the test recursive, treating the (peculiarly simple) case of the root specially.

Fixed 🤦

For test performance reasons, I suspect we want to consider validity of the prefix of a Bin relative to that of its parent when making the test recursive, treating the (peculiarly simple) case of the root specially.

Is the trouble really worth it?

$ cabal run intset-properties ... All 60 tests passed (0.21s)

For the 0.21s, no. But why does it run for such a short time? Can we make it run more tests? Larger tests? Ideally I want to use more efficient tests and also get that testing time up.

Separately, I suspect that a relative test is likely to help give more useful information in case of failure, though I don't know that for sure. Regardless, I won't block up this PR for that.

Can we make it run more tests?

This would be the easiest, we can increase the number of quickcheck runs.

Ideally I want to use more efficient tests and also get that testing time up.

Well, if it helps catch errors. Don't know how one would judge that, except maybe coverage. Otherwise it is a waste.

For Data.Sequence, that's helpful for dealing with "special casing" in some parts of the code. Maybe it's overkill here.
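
A sketch of what a recursive prefix check could look like, on a deliberately simplified tree. Everything here is an assumption for illustration: the `Tree` type stores one key per `Tip` (a real IntSet Tip holds a prefix and a bitmap), keys are `Word`, and `nomatch` and `left` are re-implemented locally. It re-collects the keys under every Bin, which is the simple approach rather than the parent-relative one suggested in this thread.

```haskell
import Data.Bits (xor, (.&.))

-- Assumed merged encoding: shared high bits, a 1 at the mask position, zeros.
newtype Prefix = Prefix Word

-- Simplified tree: one key per Tip.
data Tree = Bin Prefix Tree Tree | Tip Word | Nil

nomatch :: Word -> Prefix -> Bool
nomatch k (Prefix p) = (k `xor` p) .&. (p `xor` negate p) /= 0

left :: Word -> Prefix -> Bool
left k (Prefix p) = k < p

keys :: Tree -> [Word]
keys (Bin _ l r) = keys l ++ keys r
keys (Tip k)     = [k]
keys Nil         = []

-- Every Bin, not just the root, must agree with all keys below it.
prefixesOk :: Tree -> Bool
prefixesOk (Bin p l r) =
     all (\k -> not (nomatch k p) && left k p) (keys l)
  && all (\k -> not (nomatch k p) && not (left k p)) (keys r)
  && prefixesOk l
  && prefixesOk r
prefixesOk _ = True
```

A tree whose root is fine but whose inner Bin has its children swapped, such as `Bin (Prefix 8) (Tip 3) (Bin (Prefix 12) (Tip 12) (Tip 10))`, is rejected only because the check recurses.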

treeowl · 2024-04-08T18:00:43Z

containers-tests/tests/intset-properties.hs

    t ->
      valid t .&&.
-      toAscList t === List.sort (nub ((List.intersect) (xs)  (ys)))
+      toAscList t === (toAscList xs List.\\ toAscList ys)


Data.List.Ordered.minus would be faster than \\.

Do you mean from data-ordlist? Surely this is not worth a new dependency?

Yes, from that package. I imagine we can copy what we like, though I haven't checked its license to be sure. My only real concern about the dependency is that the package doesn't seem to be actively maintained, so it might run into annoying issues with base changes.

Since the tests are quite fast (as mentioned in the other comment), this seems like something that shouldn't cause too much worry.
Besides, a new function/dependency is another chance to introduce a bug. Data.List is a little more trustworthy in that regard, I'd say.

treeowl · 2024-04-08T18:03:26Z

containers-tests/tests/intset-properties.hs

+  case intersection xs ys of
+    t ->
+      valid t .&&.
+      toAscList t === (toAscList xs `List.intersect` toAscList ys)


Similarly, Data.List.Ordered.isect should be faster than List.intersect.
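
For comparison, a linear-time merge over ascending lists is short enough to write by hand. The sketch below is not the data-ordlist code; `minusAsc` and `isectAsc` are hypothetical names, and both assume strictly ascending inputs such as the output of `toAscList`.

```haskell
-- Difference of two strictly ascending lists in one pass.
minusAsc :: Ord a => [a] -> [a] -> [a]
minusAsc [] _ = []
minusAsc xs [] = xs
minusAsc (x:xs) (y:ys) = case compare x y of
  LT -> x : minusAsc xs (y:ys)  -- x cannot occur later in ys
  EQ -> minusAsc xs ys          -- x is removed
  GT -> minusAsc (x:xs) ys      -- y is not in xs, skip it

-- Intersection of two strictly ascending lists in one pass.
isectAsc :: Ord a => [a] -> [a] -> [a]
isectAsc [] _ = []
isectAsc _ [] = []
isectAsc (x:xs) (y:ys) = case compare x y of
  LT -> isectAsc xs (y:ys)
  EQ -> x : isectAsc xs ys
  GT -> isectAsc (x:xs) ys
```

Both run in time proportional to the combined length of the inputs, whereas `List.\\` and `List.intersect` are quadratic in the worst case.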

meooow25 · 2024-04-16T13:33:37Z

@treeowl, anything I can do here?

treeowl · 2024-04-09T22:53:12Z

containers/src/Data/IntSet/Internal.hs

-                         | otherwise = go l r
+    go def (Bin p l r) | nomatch x p = if x < unPrefix p then unsafeFindMax def else unsafeFindMax r
+                       | left x p  = go def l
+                       | otherwise = go l r


This isn't related to your changes, per se, but we use somewhat different algorithms for lookup in IntMap than in member for IntSet, and the reasoning doesn't seem to be documented in the source. As I recall, IntMap lookup is (as of relatively recently) optimized for the common case that the key is present in the map, whereas IntSet membership is optimized for the (presumably?) common case that the element is not in the set. What's the right reasoning for lookupLT and lookupGT here? I don't have much of a clue.

As I recall, IntMap lookup is (as of relatively recently) optimized for the common case that the key is present in the map...

Yes, appears to be so due to #800

What's the right reasoning for lookupLT and lookupGT here? I don't have much of a clue.

I'm not sure either. Note that the lookup logic (using zero, before #995) could not be directly applied to lookup{L,G}{T,E}. It checked specific bits and was not a binary search. However, the current logic with left can be applied without the nomatch check. This seems worth checking out, but we don't have to do it here I think.
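
To illustrate the trade-off between the two membership styles mentioned in this thread, here is a sketch on a simplified tree. It is not the library's code: the `Tree` type keeps one `Word` key per `Tip`, the merged `Prefix` encoding and the local `nomatch` and `left` are assumptions, and `memberEarlyExit` and `memberAtTip` are hypothetical names.

```haskell
import Data.Bits (xor, (.&.))

-- Assumed merged encoding: shared high bits, a 1 at the mask position, zeros.
newtype Prefix = Prefix Word

-- Simplified tree: one key per Tip.
data Tree = Bin Prefix Tree Tree | Tip Word | Nil

nomatch :: Word -> Prefix -> Bool
nomatch k (Prefix p) = (k `xor` p) .&. (p `xor` negate p) /= 0

left :: Word -> Prefix -> Bool
left k (Prefix p) = k < p

-- Early exit: stop at the first Bin whose prefix the key does not match.
-- This favours queries for keys that are absent.
memberEarlyExit :: Word -> Tree -> Bool
memberEarlyExit k = go
  where
    go (Bin p l r)
      | nomatch k p = False
      | left k p    = go l
      | otherwise   = go r
    go (Tip k') = k == k'
    go Nil      = False

-- No prefix check: always descend and compare at the Tip.
-- This does less work per Bin when the key is present.
memberAtTip :: Word -> Tree -> Bool
memberAtTip k = go
  where
    go (Bin p l r)
      | left k p  = go l
      | otherwise = go r
    go (Tip k') = k == k'
    go Nil      = False

-- A small valid tree holding the keys 3, 10 and 12.
example :: Tree
example = Bin (Prefix 8) (Tip 3) (Bin (Prefix 12) (Tip 10) (Tip 12))
```

On a valid tree both functions return the same answers; they differ only in how much work is done per Bin and how early a miss is detected.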

treeowl · 2024-04-16T13:43:00Z

I'm dealing with a family medical emergency this week. I hope to give this a final review shortly.

meooow25 · 2024-05-29T13:32:22Z

@treeowl I hope you are able to find time for this. I would really like to finish this work.

treeowl · 2024-06-04T10:51:07Z

Yes, I'm so sorry about the delay. Problems continue on my end, but that's no reason to hold you up. Merged! Will you try the IntSet now?

meooow25 · 2024-06-04T13:43:09Z

Thank you!

Will you try the IntSet now?

This was the IntSet PR. Are you thinking of something else?

treeowl · 2024-06-04T13:47:15Z

No, I'm just incredibly tired from what's been going on at home.

meooow25 · 2024-06-04T13:53:33Z

I hope things get better 🙂

meooow25 added 2 commits April 6, 2024 12:21

Fix LookupGE_IntMap

b3ae761

treeowl reviewed Apr 6, 2024

IntSet validity: Tip cannot be empty

40f8503

alexfmpe mentioned this pull request Apr 6, 2024

Full IntSets? #999

Open

meooow25 added 3 commits April 7, 2024 13:12

Generate large keys in Arbitrary IntSet

3d49a54

Fix subsetCmp error

4a1c12b

union, intersection, difference tests using Arbitrary IntSet

fce6246

treeowl reviewed Apr 8, 2024

meooow25 added 2 commits April 9, 2024 09:52

Fix prefixOk not checking all Bins

d18a3c6

nudge

bd18faf

treeowl reviewed Apr 16, 2024

treeowl merged commit 8562003 into haskell:master Jun 4, 2024
10 checks passed

meooow25 deleted the intset-prefix-mask branch June 4, 2024 13:43

meooow25 mentioned this pull request Aug 8, 2024

Tidy up IntSet/IntMap bitwise stuff #1018

Open

2 tasks

meooow25 mentioned this pull request Sep 26, 2024

Improve the documentation on how IntSet is represented #404

Closed
