Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change IntMap.lookup and add new IntMap.query function #800

Merged
merged 1 commit into from
Nov 20, 2021

Conversation

Boarders
Copy link
Contributor

@Boarders Boarders commented Sep 9, 2021

  • IntMap.lookup no longer checks for short circuit failure.
  • Add a new function IntMap.query with the old fast-fail behaviour.

Addresses #794

@Boarders
Copy link
Contributor Author

Boarders commented Sep 9, 2021

These are the benchmarks I get on my machine (lookup now doesn't fast fail and query is the old behaviour):

benchmarked query_hits
time                 276.0 μs   (266.1 μs .. 284.9 μs)
                     0.996 R²   (0.994 R² .. 0.999 R²)
mean                 261.0 μs   (259.0 μs .. 263.7 μs)
std dev              8.073 μs   (5.881 μs .. 11.27 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarked query_half
time                 135.6 μs   (132.5 μs .. 139.1 μs)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 135.9 μs   (134.9 μs .. 137.1 μs)
std dev              3.739 μs   (3.134 μs .. 4.913 μs)
variance introduced by outliers: 11% (moderately inflated)

benchmarked query_most
time                 248.7 μs   (243.7 μs .. 254.8 μs)
                     0.993 R²   (0.988 R² .. 0.997 R²)
mean                 245.7 μs   (242.5 μs .. 250.9 μs)
std dev              13.82 μs   (9.804 μs .. 21.46 μs)
variance introduced by outliers: 33% (moderately inflated)

benchmarked query_misses
time                 252.6 μs   (241.0 μs .. 261.1 μs)
                     0.983 R²   (0.971 R² .. 0.992 R²)
mean                 309.3 μs   (298.5 μs .. 319.2 μs)
std dev              37.71 μs   (32.95 μs .. 43.33 μs)
variance introduced by outliers: 71% (severely inflated)

benchmarked query_mixed
time                 35.88 μs   (35.03 μs .. 36.58 μs)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 35.55 μs   (35.31 μs .. 35.78 μs)
std dev              809.9 ns   (685.5 ns .. 1.008 μs)

benchmarked lookup_hits
time                 231.5 μs   (227.3 μs .. 237.0 μs)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 240.3 μs   (238.3 μs .. 243.1 μs)
std dev              8.042 μs   (6.614 μs .. 10.46 μs)
variance introduced by outliers: 15% (moderately inflated)

benchmarked lookup_half
time                 236.0 μs   (227.6 μs .. 251.5 μs)
                     0.985 R²   (0.976 R² .. 0.993 R²)
mean                 232.6 μs   (229.1 μs .. 237.5 μs)
std dev              13.75 μs   (10.51 μs .. 17.64 μs)
variance introduced by outliers: 37% (moderately inflated)

benchmarked lookup_most
time                 220.1 μs   (217.9 μs .. 223.4 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 223.8 μs   (222.0 μs .. 225.7 μs)
std dev              6.178 μs   (5.237 μs .. 7.243 μs)
variance introduced by outliers: 11% (moderately inflated)

benchmarked lookup_misses
time                 231.7 μs   (227.4 μs .. 239.1 μs)
                     0.996 R²   (0.991 R² .. 0.999 R²)
mean                 229.9 μs   (228.0 μs .. 232.9 μs)
std dev              7.857 μs   (6.027 μs .. 10.50 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarked lookup_mixed
time                 20.19 μs   (19.37 μs .. 20.97 μs)
                     0.994 R²   (0.991 R² .. 0.998 R²)
mean                 20.56 μs   (20.31 μs .. 20.81 μs)
std dev              853.2 ns   (692.1 ns .. 1.152 μs)
variance introduced by outliers: 22% (moderately inflated)

@treeowl
Copy link
Contributor

treeowl commented Sep 10, 2021 via email

@Boarders
Copy link
Contributor Author

@treeowl : done.

containers/src/Data/IntMap/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/IntMap/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/IntMap/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/IntMap/Internal.hs Outdated Show resolved Hide resolved
containers-tests/tests/intmap-properties.hs Outdated Show resolved Hide resolved
@sjakobi
Copy link
Member

sjakobi commented Sep 15, 2021

Apologies for the late response!

  • IntMap.lookup no longer checks for short circuit failure.

Frankly, I don't think this is a good idea. We don't know whether any users are relying on lookup to fail fast. containers is used about everywhere, including packages that don't have active maintainers. I don't think we should release a potentially large performance regression to the ecosystem.

  • Add a new function IntMap.query with the old fast-fail behaviour.

I don't think this is good naming. Imagine you're a Haskell newbie, starting to use Data.IntMap. How do you remember which of lookup and query have the desired fail-fast or fail-slow behaviour?

I think it would be better to add a more explicit name, such as lookupFailSlow (or lookupFailFast).

@treeowl
Copy link
Contributor

treeowl commented Sep 15, 2021

I disagree with you on this one, @sjakobi. While it's true we can't change anything without regressing some things, I think in the vast majority of practical cases, lookups are likely to succeed and we should do what we can to make that the fast path. Change logs, announcements, and documentation should prompt anyone relying on short-circuiting to switch functions.

@mitchellwrosen
Copy link

Furthermore, from the discussion on the original issue (see #794 (comment) for example), it sounds like it's not as simple as, "if you expect [some vague threshold percentage] of misses, use the fail-fast variant"!

@Boarders
Copy link
Contributor Author

Boarders commented Sep 15, 2021

@mitchellwrosen That is correct (and the above benchmarks show it), the nomatch function relies on the prefix stored in the tree to differ before the switching bit of what we are looking up. That means it will be good at short circuiting keys that are far away from those in the map but not those that are close (where closeness is in terms of shared bit prefix). I don't think this is actually a common occurrence and so I don't think the short-circuit default is good. In particular the query_half benchmark is performing a lookup in the following way:

values = [1..2^12]
keys = [1..2^12]
map_mid    = IntMap.fromList (zip (map (+ (2^12 `div` 2)) keys) values)
lookup :: [Int] -> M.IntMap Int -> Int
lookup xs m = foldl' (\n k -> fromMaybe n (M.lookup k m)) 0 xs
b = bench "lookup_half" $ whnf (lookup keys) map_mid

In particular, this example has lots of key misses that are 'obvious' from the perspective of bit patterns but do lots of traversal otherwise. If the keys are misses for less obvious reasons then the benchmarks wouldn't show a difference and in fact the benchmarks show that the short-circuit does worse for all misses that are close to hits (as such the benchmark is bad but it is a synthetic benchmark and was meant to show something about this change).

@sjakobi I understand where you are coming from, why don't I try and see if I can test this change on something that makes large use of intmap (I wanted to try ghc) and see how it goes? I think for smaller consumers of intmap it probably won't make a big difference and for large users we can make the necessary patches to those packages that use the current behaviour in critical ways. I am open to changing the name but I don't think it should be optimized for newbies per se who should not be caring about such issues as to the internals of how intmap works.

@nomeata
Copy link
Contributor

nomeata commented Sep 16, 2021

I am on board with @treeowl here: Most changes will benefit some users, and regress some others. My intuition is that this will benefit most users. Definitely those where all lookups succeed, which might actually be a common case (think of the environment in a compiler, all variables will be present in the map). But likely also those users that have failing lookups, but where the looked up keys and the present values are nicely mixed, so the early failure isn’t very early, and the benefit of fewer branches may more.

But in the end, only benchmarks can tell. Looking forward to @Boarders’s investigation here (although good luck getting statistically significant results here…)

Copy link
Contributor

@nomeata nomeata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t the implementatoin findof find be changed as well? The type signature of find indicates that the user is expecting success, so here we probably benefit especially well from this optimization?

@treeowl
Copy link
Contributor

treeowl commented Sep 16, 2021

Yes, find for sure; that was mentioned early on but maybe lost.

@Boarders
Copy link
Contributor Author

I'll add it shortly.

@Boarders
Copy link
Contributor Author

How about findWithDefault?

nomeata added a commit to nomeata/hs-to-coq that referenced this pull request Sep 16, 2021
in issue haskell/containers#794 and PR
haskell/containers#800 an alternative
implementation of `lookup` is proposed which checks if the key is as
expected only when reaching the leaf. This means less branching in the
success case or when the lookup key shares long prefixes with existing
values, but may do more work when the lookup key is far away from
existing keys.

Not saying that it’s particularly doubtful that the changed code is
correct, but since we have the machinery, why not verify it? So here we
go, and yes, it goes through.

I don’t think we need to merge the PR like this, and can probably wait
for the code to reach a containers release, and then include it there.

Also, this PR currently contains a bunch of local hacks that I used to
get going again (now that I use NixOS), which need to be removed, merged
separately, or made obsolete by better changes to the setup.
@nomeata
Copy link
Contributor

nomeata commented Sep 16, 2021

Not that anyone would be surprised by this, but the Coq formalization proves that this implementation is semantically correct: plclub/hs-to-coq#188

@sjakobi
Copy link
Member

sjakobi commented Sep 16, 2021

Apologies! I should have fully read the discussion in #794 before commenting here.

So I read that failing lookups don't necessarily get slower, and some in fact get faster with this patch.

Nevertheless I'm pretty impressed by the massive 6x slowdown of the lookup_fails benchmark reported in #794 (comment).

Is this already the worst contrived case possible? I was imagining that there's some weird corner case where we have to perform 63 successive zero checks before concluding that the key is not present. What's the slowdown for the absolute worst case?

@Boarders
Copy link
Contributor Author

@sjakobi As I said, these worst case scenarios seem very unlikely to representative and involve lots of lookups of keys with bitpatterns without shared prefixes to anything in the map, that doesn't seem like what the library should be optimizing for.

@nomeata
Copy link
Contributor

nomeata commented Sep 16, 2021

What's the slowdown for the absolute worst case?

I think it’d be that one: A map that is maximally deep (already bad – no balanced map will be that deep, so we are already at a “bad” application of IntMap), and then looking up a key that will make lookup walk to that maximum depth, when it otherwise could fail imediatelly:

Prelude Data.IntMap> let m = unions [ singleton (2^n) () | n <- [0..62]]
Prelude Data.IntMap> Data.IntMap.lookup (2^63) m
Nothing

@sjakobi
Copy link
Member

sjakobi commented Sep 16, 2021

@Boarders I fully agree that this library shouldn't optimize for worst-case scenarios. Nevertheless I think we ought to avoid shipping perf regressions to users with unusual use cases for IntMap.lookup. That's why I'm curious what the "long tail" of possible regressions due to this change looks like.

If we accept that some users may encounter serious perf regressions due to this change, we can for example consider a staged rollout of these changes: We could add a nomatch-less lookup variant in the next minor release, and wait with changing the default lookup until the next major release.

@treeowl
Copy link
Contributor

treeowl commented Sep 16, 2021 via email

@Boarders
Copy link
Contributor Author

Boarders commented Sep 16, 2021

Granted, I have no idea what I am doing, but here is what I got from trying to measure this change on GHC building cabal using perf:

original:

   777,509,663,120      instructions              #    1.26  insn per cycle         
   617,691,499,581      cycles                                                      

     143.578308984 seconds time elapsed

     142.807996000 seconds user
       1.146751000 seconds sys

with this change:

   776,453,461,794      instructions              #    1.32  insn per cycle         
   590,402,339,813      cycles                                                      

     134.333660622 seconds time elapsed

     133.517496000 seconds user
       1.228369000 seconds sys

Doesn't look like there is much in it and I don't really know how variable these sort of measurements are, but definitely doesn't look like any sort of serious regression (and in fact looks like it shaves off a bit of time).

@Boarders
Copy link
Contributor Author

Are there some other serious users of IntMap I could run some benchmarks against does anyone know?

@sjakobi
Copy link
Member

sjakobi commented Sep 16, 2021

Are there some other serious users of IntMap I could run some benchmarks against does anyone know?

I had the same question with #340. I think Clash seemed interesting, but it also depended on GHC which made it tricky to build.

I believe @jwaldmann has an IntMap benchmark that you could try.

@Boarders
Copy link
Contributor Author

Not sure what the deal is with the timed-out CI.

@treeowl
Copy link
Contributor

treeowl commented Sep 17, 2021

I just restarted CI. We'll see.

@Boarders
Copy link
Contributor Author

Boarders commented Sep 17, 2021

Vanessa McHale (@vmchale) kindly informed me that her compiler, kempe, makes heavy use of IntMaps and so I ran the benchmarks using the main containers branch and this branch and get the following (output courtesy of Ben Gamari's Criterion comparison tool):

image

@Boarders
Copy link
Contributor Author

I tried measuring this against some other packages but the synthetic benchmarks were so variable that nothing valuable seemed to come out of the exercise.

@treeowl
Copy link
Contributor

treeowl commented Sep 19, 2021 via email

@nomeata
Copy link
Contributor

nomeata commented Sep 19, 2021

What does “go ahead” mean?

If the conclusion is that very likely no serious regressions in real applications are expected, then I would be in favor of changing the implementation of lookup and find, without extending the API at all. If (and only if) we later learn that there really are users who strongly preferred the old implementations, we can offer both. But I would not complicate the API without hard evidence that someone needs it.

@treeowl
Copy link
Contributor

treeowl commented Sep 19, 2021

@nomeata , that sounds like a pretty reasonable approach, yes.

@Boarders
Copy link
Contributor Author

Ok, this evening I will remove query and then it looks like we should be good to go on getting this merged.

@treeowl
Copy link
Contributor

treeowl commented Sep 19, 2021

Yes. I won't delay this any further. But afterwards, we should really look at alter and alterF. Presumably the same applies to them. We really only want to shortcut for intersection, union, difference, and merge.

@Boarders
Copy link
Contributor Author

Boarders commented Sep 19, 2021

Ah, let I didn't realise about alter, I'll make those changes too (and add some relevant benchmarks for them).

@Boarders
Copy link
Contributor Author

I looked into changing alter and did the rather naive thing. The current definition is as follows:

alter :: (Maybe a -> Maybe a) -> Key -> IntMap a -> IntMap a
alter f !k t@(Bin p m l r)
  | nomatch k p m = case f Nothing of
                      Nothing -> t
                      Just x -> link k (Tip k x) p t
  | zero k m      = binCheckLeft p m (alter f k l) r
  | otherwise     = binCheckRight p m l (alter f k r)
alter f k t@(Tip ky y)
  | k==ky         = case f (Just y) of
                      Just x -> Tip ky x
                      Nothing -> Nil
  | otherwise     = case f Nothing of
                      Just x -> link k (Tip k x) ky t
                      Nothing -> Tip ky y
alter f k Nil     = case f Nothing of
                      Just x -> Tip k x
                      Nothing -> Nil

I changed that to:

alter :: (Maybe a -> Maybe a) -> Key -> IntMap a -> IntMap a
alter f !k t@(Bin p m l r)
  | zero k m      = binCheckLeft p m (alter f k l) r
  | otherwise     = binCheckRight p m l (alter f k r)
alter f k t@(Tip ky y)
  | k==ky         = case f (Just y) of
                      Just x -> Tip ky x
                      Nothing -> Nil
  | otherwise     = case f Nothing of
                      Just x -> link k (Tip k x) ky t
                      Nothing -> Tip ky y
alter f k Nil     = case f Nothing of
                      Just x -> Tip k x
                      Nothing -> Nil

This definition (strangely to me!) fails the lazy property tests (but not the strict ones) as it fails the commonPrefix property. In particular, if we use the original definition of alter on an example quickcheck is unhappy with then we get the following:

λ: m = fromList [(-5, ()), (-2, ())]
λ: f = \case {Nothing -> Just (); Just () -> Nothing}
λ: m' = alter f 0 m
λ: m'
fromList [(-5,()),(-2,()),(0,())]

If on the other hand we remove the nomatch test then we get:

λ: m = fromList [(-5, ()), (-2, ())]
λ: f = \case {Nothing -> Just (); Just () -> Nothing}
λ: m' = alter f 0 m
λ: m'
fromList [(0,()),(-5,()),(-2,())]

I didn't investigate further to figure out why this is a specific problem with Data.IntMap.Lazy. If there is a more obviously correct implementation then please do let me know.

@treeowl
Copy link
Contributor

treeowl commented Sep 20, 2021

Ah, I see. I got things rather mixed up. That won't work. What should work is modifying adjustWithKey to remove the nomatch case.

@treeowl
Copy link
Contributor

treeowl commented Sep 20, 2021

Also updateWithKey, I believe.

@treeowl
Copy link
Contributor

treeowl commented Sep 20, 2021

(And updateLookupWithKey.)

@Boarders
Copy link
Contributor Author

Ok, thanks, should have it by tomorrow.

@treeowl
Copy link
Contributor

treeowl commented Sep 20, 2021 via email

@jwaldmann
Copy link
Contributor

I believe @jwaldmann has an IntMap benchmark that you could try.

https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark

@Boarders
Copy link
Contributor Author

Boarders commented Oct 3, 2021

I will get back to finishing this work as soon as I am back from holiday!

  - `IntMap.lookup`, `IntMap.find`, `IntMap.adjustWithKey`,
    `IntMap.updateWithKey` and `IntMap.updateLookupWithKey`
     no longer check for short circuit failure.
@Boarders
Copy link
Contributor Author

@treeowl Apologies for the delay in making these changes but I have finally gotten around to it. Is there anything else to do here?

@jwaldmann When I try to clone your repository I get permission denied for some reason.

@treeowl
Copy link
Contributor

treeowl commented Nov 20, 2021

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants