Change IntMap.lookup and add new IntMap.query function #800
These are the benchmarks I get on my machine (`lookup` no longer fails fast, and `query` has the old behaviour):
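Below is a minimal sketch of the two behaviours on a simplified big-endian Patricia trie. It assumes non-negative keys (containers special-cases the sign bit), and the names `IntMap`, `Tip`, `Bin`, `zero`, `maskW` and `nomatch` are re-declared here for illustration, loosely modelled on containers' internals. This is not the library's actual code; it is a module body meant to be loaded into GHCi.

```haskell
import Prelude hiding (lookup)
import Data.Bits (complement, xor, (.&.))

type Key = Int

-- Simplified big-endian Patricia trie (non-negative keys only).
-- In `Bin p m l r`, every key shares the prefix `p` above the mask bit `m`;
-- keys with bit `m` clear are in `l`, the others in `r`.
data IntMap a = Nil | Tip !Key a | Bin !Key !Key (IntMap a) (IntMap a)

-- True if bit `m` of `k` is clear.
zero :: Key -> Key -> Bool
zero k m = k .&. m == 0

-- Keep only the bits of `k` strictly above the mask bit `m`.
maskW :: Key -> Key -> Key
maskW k m = k .&. (complement (m - 1) `xor` m)

-- True if `k` cannot occur under a node with prefix `p` and mask `m`.
nomatch :: Key -> Key -> Key -> Bool
nomatch k p m = maskW k m /= p

-- Proposed behaviour: branch on the mask bit only, compare at the leaf.
lookup :: Key -> IntMap a -> Maybe a
lookup k (Bin _ m l r) = lookup k (if zero k m then l else r)
lookup k (Tip kx x) = if k == kx then Just x else Nothing
lookup _ Nil = Nothing

-- Old behaviour (what this PR calls `query`): bail out on a prefix mismatch.
query :: Key -> IntMap a -> Maybe a
query k (Bin p m l r)
  | nomatch k p m = Nothing
  | zero k m = query k l
  | otherwise = query k r
query k (Tip kx x) = if k == kx then Just x else Nothing
query _ Nil = Nothing

-- Hand-built map {4 -> "a", 6 -> "b"}: prefix 4, mask bit 2.
example :: IntMap String
example = Bin 4 2 (Tip 4 "a") (Tip 6 "b")
```

With this map both functions return `Just "a"` for key 4 and `Nothing` for key 9, but `query` gives up after one prefix comparison at the root (9 masked above bit 1 is 8, not 4), whereas `lookup` first walks down to the `Tip 4` leaf.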
And add that it's otherwise slower?
On Thu, Sep 9, 2021, 9:57 PM konsumlamm wrote:
@konsumlamm requested changes on this pull request.
In containers/src/Data/IntMap/Internal.hs (#800 (comment)):
> +-- | /O(min(n,W))/. Query has identical behaviour to 'Data.IntMap.Internal.lookup' but
+-- will fail faster in the case that the key does not share is not present.
Suggested change:
--- | /O(min(n,W))/. Query has identical behaviour to 'Data.IntMap.Internal.lookup' but
--- will fail faster in the case that the key does not share is not present.
+-- | /O(min(n,W))/. 'query' has identical behaviour to 'lookup', but
+-- will fail faster in the case that the key is not present.
@treeowl: done.
Apologies for the late response!
Frankly, I don't think this is a good idea. We don't know whether any users are relying on …
I don't think this is good naming. Imagine you're a Haskell newbie, starting to use … I think it would be better to add a more explicit name, such as …
I disagree with you on this one, @sjakobi. While it's true we can't change anything without regressing some things, I think in the vast majority of practical cases, lookups are likely to succeed, and we should do what we can to make that the fast path. Changelogs, announcements, and documentation should prompt anyone relying on short-circuiting to switch functions.
Furthermore, from the discussion on the original issue (see #794 (comment) for example), it sounds like it's not as simple as "if you expect [some vague threshold percentage] of misses, use the fail-fast variant"!
@mitchellwrosen That is correct (and the above benchmarks show it); the …

    values = [1..2^12]
    keys = [1..2^12]
    map_mid = M.fromList (zip (map (+ (2^12 `div` 2)) keys) values)

    lookup :: [Int] -> M.IntMap Int -> Int
    lookup xs m = foldl' (\n k -> fromMaybe n (M.lookup k m)) 0 xs

    b = bench "lookup_half" $ whnf (lookup keys) map_mid

In particular, this example has many key misses that are 'obvious' from the perspective of bit patterns but otherwise require a lot of traversal. If the keys miss for less obvious reasons, the benchmarks wouldn't show a difference; in fact, they show that the short-circuit does worse for all misses that are close to hits. (As such, the benchmark is bad, but it is a synthetic benchmark and was meant to show something about this change.)

@sjakobi I understand where you are coming from. Why don't I try testing this change on something that makes heavy use of IntMap (I wanted to try GHC) and see how it goes? I think for smaller consumers of IntMap it probably won't make a big difference, and for large users we can patch the packages that rely on the current behaviour in critical ways. I am open to changing the name, but I don't think it should be optimized for newbies per se, who should not need to care about the internals of how IntMap works.
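To make the "obvious miss, long traversal" case concrete, the sketch below counts the bit tests each variant performs on a hypothetical, maximally lopsided map whose keys are 2^60 + 2^i for i in [0..59]. It uses the same kind of simplified trie as containers (non-negative keys, 64-bit `Int` assumed); `lookupCounted`, `queryCounted` and `deepMap` are illustrative names, not containers functions.

```haskell
import Data.Bits (bit, complement, countLeadingZeros, finiteBitSize, xor, (.&.))

type Key = Int

-- Simplified big-endian Patricia trie (non-negative keys, 64-bit Int assumed).
data IntMap a = Nil | Tip !Key a | Bin !Key !Key (IntMap a) (IntMap a)

zero :: Key -> Key -> Bool
zero k m = k .&. m == 0

maskW :: Key -> Key -> Key
maskW k m = k .&. (complement (m - 1) `xor` m)

nomatch :: Key -> Key -> Key -> Bool
nomatch k p m = maskW k m /= p

-- Highest bit in which two distinct non-negative prefixes differ.
branchMask :: Key -> Key -> Key
branchMask p1 p2 = bit (finiteBitSize d - 1 - countLeadingZeros d)
  where d = p1 `xor` p2

-- Join two disjoint subtrees whose prefixes are p1 and p2.
link :: Key -> IntMap a -> Key -> IntMap a -> IntMap a
link p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    m = branchMask p1 p2
    p = maskW p1 m

insert :: Key -> a -> IntMap a -> IntMap a
insert k x t@(Bin p m l r)
  | nomatch k p m = link k (Tip k x) p t
  | zero k m = Bin p m (insert k x l) r
  | otherwise = Bin p m l (insert k x r)
insert k x t@(Tip ky _)
  | k == ky = Tip k x
  | otherwise = link k (Tip k x) ky t
insert k x Nil = Tip k x

-- New-style lookup; the Int counts bit tests made at Bin nodes.
lookupCounted :: Key -> IntMap a -> (Maybe a, Int)
lookupCounted = go 0
  where
    go n k (Bin _ m l r) = go (n + 1) k (if zero k m then l else r)
    go n k (Tip kx x) = (if k == kx then Just x else Nothing, n)
    go n _ Nil = (Nothing, n)

-- Old-style (fail-fast) lookup: one nomatch test plus one zero test per Bin.
queryCounted :: Key -> IntMap a -> (Maybe a, Int)
queryCounted = go 0
  where
    go n k (Bin p m l r)
      | nomatch k p m = (Nothing, n + 1)
      | otherwise = go (n + 2) k (if zero k m then l else r)
    go n k (Tip kx x) = (if k == kx then Just x else Nothing, n)
    go n _ Nil = (Nothing, n)

-- Keys 2^60 + 2^i for i in [0..59]: a lopsided spine of 59 Bin nodes.
deepMap :: IntMap ()
deepMap = foldr (\k -> insert k ()) Nil [2 ^ (60 :: Int) + 2 ^ i | i <- [0 .. 59 :: Int]]
```

For the far-away key 1, `queryCounted` stops after a single prefix test at the root, while `lookupCounted` walks all 59 `Bin` nodes. For the present key 2^60 + 1 the picture reverses: 59 tests for the new lookup against 118 for the old one.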
I am on board with @treeowl here: most changes will benefit some users and regress others. My intuition is that this one will benefit most users. Definitely those where all lookups succeed, which might actually be a common case (think of the environment in a compiler: all variables will be present in the map). But likely also those users who have failing lookups, but where the looked-up keys and the keys present in the map are nicely mixed, so that the early failure isn’t very early and the benefit of fewer branches may matter more. In the end, though, only benchmarks can tell. Looking forward to @Boarders’s investigation here (although good luck getting statistically significant results…)
Shouldn’t the implementation of `find` be changed as well? The type signature of `find` indicates that the user expects success, so here we probably benefit especially from this optimization?
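A sketch of what a `nomatch`-free `find` could look like on a simplified trie (non-negative keys only). The types, `example`, and the error message are illustrative and not containers' own code.

```haskell
import Data.Bits ((.&.))

type Key = Int

-- Simplified big-endian Patricia trie (non-negative keys only).
data IntMap a = Nil | Tip !Key a | Bin !Key !Key (IntMap a) (IntMap a)

-- True if bit `m` of `k` is clear.
zero :: Key -> Key -> Bool
zero k m = k .&. m == 0

-- `find` without the nomatch test: the caller expects the key to be present,
-- so we only branch on the mask bit and check equality once, at the leaf.
find :: Key -> IntMap a -> a
find k (Bin _ m l r) = find k (if zero k m then l else r)
find k (Tip kx x) | k == kx = x
find k _ = error ("find: key " ++ show k ++ " is not an element of the map")

-- Hand-built map {4 -> "a", 6 -> "b"}: prefix 4, mask bit 2.
example :: IntMap String
example = Bin 4 2 (Tip 4 "a") (Tip 6 "b")
```

Since `find` already assumes the key is present, skipping `nomatch` only costs extra work on calls that end in the error anyway.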
Yes, …
I'll add it shortly.
How about …
In issue haskell/containers#794 and PR haskell/containers#800, an alternative implementation of `lookup` is proposed which checks whether the key is as expected only when reaching the leaf. This means less branching in the success case, or when the lookup key shares long prefixes with existing keys, but it may do more work when the lookup key is far away from existing keys. Not that it’s particularly doubtful that the changed code is correct, but since we have the machinery, why not verify it? So here we go, and yes, it goes through. I don’t think we need to merge this PR as is; we can probably wait for the code to reach a containers release and then include it there. Also, this PR currently contains a bunch of local hacks that I used to get going again (now that I use NixOS), which need to be removed, merged separately, or made obsolete by better changes to the setup.
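The Coq proof is the real argument. As a much weaker, informal spot check, one can also compare both variants by brute force on small key sets. The sketch below uses a simplified trie with a standard `insert`/`link` (non-negative keys, 64-bit `Int` assumed) and compares `lookup` and `query` against `Prelude.lookup` on an association list, for every subset of a small key universe. `agreesEverywhere` is a hypothetical helper name.

```haskell
import Prelude hiding (lookup)
import qualified Prelude
import Data.Bits (bit, complement, countLeadingZeros, finiteBitSize, xor, (.&.))
import Data.List (subsequences)

type Key = Int

data IntMap a = Nil | Tip !Key a | Bin !Key !Key (IntMap a) (IntMap a)

zero :: Key -> Key -> Bool
zero k m = k .&. m == 0

maskW :: Key -> Key -> Key
maskW k m = k .&. (complement (m - 1) `xor` m)

nomatch :: Key -> Key -> Key -> Bool
nomatch k p m = maskW k m /= p

branchMask :: Key -> Key -> Key
branchMask p1 p2 = bit (finiteBitSize d - 1 - countLeadingZeros d)
  where d = p1 `xor` p2

link :: Key -> IntMap a -> Key -> IntMap a -> IntMap a
link p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    m = branchMask p1 p2
    p = maskW p1 m

insert :: Key -> a -> IntMap a -> IntMap a
insert k x t@(Bin p m l r)
  | nomatch k p m = link k (Tip k x) p t
  | zero k m = Bin p m (insert k x l) r
  | otherwise = Bin p m l (insert k x r)
insert k x t@(Tip ky _)
  | k == ky = Tip k x
  | otherwise = link k (Tip k x) ky t
insert k x Nil = Tip k x

-- New behaviour: no prefix check on the way down.
lookup :: Key -> IntMap a -> Maybe a
lookup k (Bin _ m l r) = lookup k (if zero k m then l else r)
lookup k (Tip kx x) = if k == kx then Just x else Nothing
lookup _ Nil = Nothing

-- Old behaviour: fail as soon as the prefix rules the key out.
query :: Key -> IntMap a -> Maybe a
query k (Bin p m l r)
  | nomatch k p m = Nothing
  | zero k m = query k l
  | otherwise = query k r
query k (Tip kx x) = if k == kx then Just x else Nothing
query _ Nil = Nothing

-- For every subset of `universe`, build a map and compare `lookup` and
-- `query` against a plain association list on every probe key.
agreesEverywhere :: [Key] -> [Key] -> Bool
agreesEverywhere universe probes = and
  [ lookup k t == expected && query k t == expected
  | ks <- subsequences universe
  , let t = foldr (\key -> insert key key) Nil ks
  , k <- probes
  , let expected = Prelude.lookup k (zip ks ks)
  ]
```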
Not that anyone would be surprised by this, but the Coq formalization proves that this implementation is semantically correct: plclub/hs-to-coq#188
Apologies! I should have fully read the discussion in #794 before commenting here. So I read that failing lookups don't necessarily get slower, and some in fact get faster with this patch. Nevertheless, I'm pretty impressed by the massive 6x slowdown of the … Is this already the worst contrived case possible? I was imagining that there's some weird corner case where we have to perform 63 successive …
@sjakobi As I said, these worst-case scenarios seem very unlikely to be representative: they involve many lookups of keys whose bit patterns share no prefix with anything in the map, and that doesn't seem like what the library should be optimizing for.
I think it’d be that one: a map that is maximally deep (already bad: no balanced map will be that deep, so we are already in a “bad” application of IntMap), and then looking up a key that will make …
@Boarders I fully agree that this library shouldn't optimize for worst-case scenarios. Nevertheless, I think we ought to avoid shipping perf regressions to users with unusual use cases for `IntMap.lookup`. That's why I'm curious what the "long tail" of possible regressions due to this change looks like. If we accept that some users may encounter serious perf regressions due to this change, we can, for example, consider a staged rollout: we could add a `nomatch`-less `lookup` variant in the next minor release, and wait with changing the default `lookup` until the next major release.
I just don't think what we currently optimize for is likely to be dominant in any realistic application. "Usually we look up things that are far away from anything in the map, and we need to fail quickly" sounds like some kind of strange joke.
Granted, I have no idea what I am doing, but here is what I got from using perf to measure GHC building Cabal with and without this change. Original:
With this change:
There doesn't seem to be much in it, and I don't really know how variable these sorts of measurements are, but it definitely doesn't look like any sort of serious regression (in fact, it seems to shave off a bit of time).
Are there some other serious users of …
I had the same question with #340. I think Clash seemed interesting, but it also depended on GHC, which made it tricky to build. I believe @jwaldmann has an …
Not sure what the deal is with the timed-out CI.
I just restarted CI. We'll see.
Vanessa McHale (@vmchale) kindly informed me that her compiler, kempe, makes heavy use of IntMaps, so I ran the benchmarks using both the main containers branch and this branch and got the following (output courtesy of Ben Gamari's Criterion comparison tool):
I tried measuring this against some other packages, but the synthetic benchmarks were so variable that nothing valuable came out of the exercise.
I think we should probably just go ahead. Making performance more
consistently good is just good policy.
What does “go ahead” mean? If the conclusion is that very likely no serious regressions in real applications are expected, then I would be in favor of changing the implementation of …
@nomeata, that sounds like a pretty reasonable approach, yes.
OK, this evening I will remove `query`, and then it looks like we should be good to go on getting this merged.
Yes. I won't delay this any further. But afterwards, we should really look at …
Ah, let … I didn't realise about `alter`; I'll make those changes too (and add some relevant benchmarks for them).
I looked into changing `alter`:

    alter :: (Maybe a -> Maybe a) -> Key -> IntMap a -> IntMap a
    alter f !k t@(Bin p m l r)
      | nomatch k p m = case f Nothing of
          Nothing -> t
          Just x -> link k (Tip k x) p t
      | zero k m = binCheckLeft p m (alter f k l) r
      | otherwise = binCheckRight p m l (alter f k r)
    alter f k t@(Tip ky y)
      | k==ky = case f (Just y) of
          Just x -> Tip ky x
          Nothing -> Nil
      | otherwise = case f Nothing of
          Just x -> link k (Tip k x) ky t
          Nothing -> Tip ky y
    alter f k Nil = case f Nothing of
      Just x -> Tip k x
      Nothing -> Nil

I changed that to:

    alter :: (Maybe a -> Maybe a) -> Key -> IntMap a -> IntMap a
    alter f !k t@(Bin p m l r)
      | zero k m = binCheckLeft p m (alter f k l) r
      | otherwise = binCheckRight p m l (alter f k r)
    alter f k t@(Tip ky y)
      | k==ky = case f (Just y) of
          Just x -> Tip ky x
          Nothing -> Nil
      | otherwise = case f Nothing of
          Just x -> link k (Tip k x) ky t
          Nothing -> Tip ky y
    alter f k Nil = case f Nothing of
      Just x -> Tip k x
      Nothing -> Nil

This definition (strangely to me!) fails the lazy property tests (but not the strict ones), as it fails the …

    λ: m = fromList [(-5, ()), (-2, ())]
    λ: f = \case {Nothing -> Just (); Just () -> Nothing}
    λ: m' = alter f 0 m
    λ: m'
    fromList [(-5,()),(-2,()),(0,())]

If, on the other hand, we remove the …

    λ: m = fromList [(-5, ()), (-2, ())]
    λ: f = \case {Nothing -> Just (); Just () -> Nothing}
    λ: m' = alter f 0 m
    λ: m'
    fromList [(0,()),(-5,()),(-2,())]

I didn't investigate further to figure out why this is a specific problem with …
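Why dropping `nomatch` is harmless for `lookup` but not for `alter`: a read-only descent that takes the "wrong" branch simply ends at a leaf holding a different key, whereas a write that takes it plants a new leaf under a `Bin` whose prefix does not cover that key. The sketch below shows this with plain insertion on a simplified trie (non-negative keys only, 64-bit `Int` assumed); `insertNaive`, `valid` and `base` are hypothetical names, not containers code.

```haskell
import Data.Bits (bit, complement, countLeadingZeros, finiteBitSize, xor, (.&.))

type Key = Int

data IntMap a = Nil | Tip !Key a | Bin !Key !Key (IntMap a) (IntMap a)

zero :: Key -> Key -> Bool
zero k m = k .&. m == 0

maskW :: Key -> Key -> Key
maskW k m = k .&. (complement (m - 1) `xor` m)

nomatch :: Key -> Key -> Key -> Bool
nomatch k p m = maskW k m /= p

branchMask :: Key -> Key -> Key
branchMask p1 p2 = bit (finiteBitSize d - 1 - countLeadingZeros d)
  where d = p1 `xor` p2

link :: Key -> IntMap a -> Key -> IntMap a -> IntMap a
link p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    m = branchMask p1 p2
    p = maskW p1 m

-- Correct insert: a key outside this node's prefix is linked beside it.
insert :: Key -> a -> IntMap a -> IntMap a
insert k x t@(Bin p m l r)
  | nomatch k p m = link k (Tip k x) p t
  | zero k m = Bin p m (insert k x l) r
  | otherwise = Bin p m l (insert k x r)
insert k x t@(Tip ky _)
  | k == ky = Tip k x
  | otherwise = link k (Tip k x) ky t
insert k x Nil = Tip k x

-- Same as `insert`, but without the nomatch test at Bin nodes
-- (the analogue of the modified `alter` above).
insertNaive :: Key -> a -> IntMap a -> IntMap a
insertNaive k x (Bin p m l r)
  | zero k m = Bin p m (insertNaive k x l) r
  | otherwise = Bin p m l (insertNaive k x r)
insertNaive k x t@(Tip ky _)
  | k == ky = Tip k x
  | otherwise = link k (Tip k x) ky t
insertNaive k x Nil = Tip k x

-- In-order traversal (correct order for non-negative keys).
toList :: IntMap a -> [(Key, a)]
toList (Bin _ _ l r) = toList l ++ toList r
toList (Tip k x) = [(k, x)]
toList Nil = []

-- Invariant: every key under `Bin p m _ _` matches prefix `p` above bit `m`.
valid :: IntMap a -> Bool
valid (Bin p m l r) =
  all (\(k, _) -> not (nomatch k p m)) (toList l ++ toList r) && valid l && valid r
valid _ = True

-- {4 -> 'a', 6 -> 'b'}, built with the correct insert.
base :: IntMap Char
base = insert 6 'b' (insert 4 'a' Nil)
```

Key 8 agrees with the root's branching bit (bit 1 is clear in both 4 and 8), so the naive descent goes left and plants 8 next to 4, even though 8 is larger than 6. `toList` then yields the keys in the order 4, 8, 6: the same kind of misordering as in the second `alter f 0 m` session.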
Ah, I see. I got things rather mixed up. That won't work. What should work is modifying …
Also …
(And …)
OK, thanks, I should have it by tomorrow.
Oh yeah, and delete…
https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark
I will get back to finishing this work as soon as I am back from holiday!
- `IntMap.lookup`, `IntMap.find`, `IntMap.adjustWithKey`, `IntMap.updateWithKey` and `IntMap.updateLookupWithKey` no longer check for short circuit failure.
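Unlike `alter`, functions such as `adjustWithKey` never create a new leaf, so descending without `nomatch` stays safe: for an absent key the walk ends at a non-matching `Tip`, which is returned unchanged, and the path is rebuilt with the original prefixes and masks. A sketch on a simplified trie (non-negative keys only; illustrative, not containers' implementation):

```haskell
import Data.Bits ((.&.))

type Key = Int

data IntMap a = Nil | Tip !Key a | Bin !Key !Key (IntMap a) (IntMap a)

-- True if bit `m` of `k` is clear.
zero :: Key -> Key -> Bool
zero k m = k .&. m == 0

-- adjustWithKey without nomatch: an absent key only reaches a non-matching
-- leaf, which is kept as is, so the prefix invariant is preserved.
adjustWithKey :: (Key -> a -> a) -> Key -> IntMap a -> IntMap a
adjustWithKey f k (Bin p m l r)
  | zero k m = Bin p m (adjustWithKey f k l) r
  | otherwise = Bin p m l (adjustWithKey f k r)
adjustWithKey f k t@(Tip ky y)
  | k == ky = Tip ky (f k y)
  | otherwise = t
adjustWithKey _ _ Nil = Nil

toList :: IntMap a -> [(Key, a)]
toList (Bin _ _ l r) = toList l ++ toList r
toList (Tip k x) = [(k, x)]
toList Nil = []

-- Hand-built map {4 -> 10, 6 -> 20}: prefix 4, mask bit 2.
example :: IntMap Int
example = Bin 4 2 (Tip 4 10) (Tip 6 20)
```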
@treeowl Apologies for the delay in making these changes, but I have finally gotten around to it. Is there anything else to do here? @jwaldmann When I try to clone your repository, I get "permission denied" for some reason.
Thanks a lot!
- `IntMap.lookup` no longer checks for short circuit failure.
- New `IntMap.query` with the old fast-fail behaviour.

Addresses #794