-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make NonEmpty functions less gratuitously lazy #107
Comments
This has been bothering me for a long time. Thanks for making this proposal, @treeowl. I would suggest the following principle:
Mostly this is so I don't have to keep two different sets of subtly different strictness behaviors in my mind, but also it may eventually make sense to make the In a quick scan, I disagree with your initial assessments of the following functions:
|
@treeowl, can you explain why (It's not to me - I don't see why I should have to worry about whether there are more elements after the first just to get the first element. But then my personal natural model for a non-empty list type is a refinement type (or, in first approximation, an invariant-protecting newtype) of the list type.) |
@nomeata, I probably said that too strongly. One thing, for me, is that consing and unconsing seem like really basic operations for something I'm calling a list, and for |
Oh, absolutely! I share that sentiment, and was one reason I was briefly considering experimenting with a |
* Add basic benchmarks for inits/tails * Add NonEmpty variants of inits and tails The lazy versions use new implementations: - Lazy tails got about 10% faster with ghc-9.2. (A happy accident!) - Lazy inits got much faster: - For the first few chunks it is about 1.5x faster, due to better list fusion. - When there are many chunks it is about 4x faster. * Formatting and comments, as suggested in review * Add link to a relevant CLC issue about NonEmpty - haskell/core-libraries-committee#107
* Add basic benchmarks for inits/tails * Add NonEmpty variants of inits and tails The lazy versions use new implementations: - Lazy tails got about 10% faster with ghc-9.2. (A happy accident!) - Lazy inits got much faster: - For the first few chunks it is about 1.5x faster, due to better list fusion. - When there are many chunks it is about 4x faster. * Formatting and comments, as suggested in review * Add link to a relevant CLC issue about NonEmpty - haskell/core-libraries-committee#107
* Add basic benchmarks for inits/tails * Add NonEmpty variants of inits and tails The lazy versions use new implementations: - Lazy tails got about 10% faster with ghc-9.2. (A happy accident!) - Lazy inits got much faster: - For the first few chunks it is about 1.5x faster, due to better list fusion. - When there are many chunks it is about 4x faster. * Formatting and comments, as suggested in review * Add link to a relevant CLC issue about NonEmpty - haskell/core-libraries-committee#107 (cherry picked from commit d4933c6)
I think I second @nomeata 's request for clarification - there's a lot of "I want this" and "I propose this change" but not a lot of justification or explanation why. Most of these changes seem pretty reasonable, but I'm also a bit hesitant to make breaking changes in the laziness/semantics of a datatype. Following in the pattern of This approach wouldn't break anyone's code, and would allow for folks to explicitly select lazy vs strict variants of functions. The two modules could have documentation suggesting when you would want one or the other. In the event that the community decides that
Can you explain why That feels pretty unnatural to me. You have to duplicate the code for handling the "known" case (definitely have at least one case uncons xs of
Left a -> foo a
Right (a, as) -> foo a <> foos as
case uncons xs of
(a, mas) -> foo a <> foldMap foos mas
let (a, mas) = uncons xs
(a, mas) <- do ...; pure (uncons xs)
Verifying my own understanding here: λ> import Data.List.NonEmpty
λ> toList undefined
[*** Exception: Prelude.undefined
CallStack (from HasCallStack):
undefined, called at <interactive>:2:8 in interactive:Ghci2
λ> toList' (a :| as) = a : as
λ> toList' undefined
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
undefined, called at <interactive>:4:9 in interactive:Ghci3 The new variant will
Oof, this one is gnarly -
Removing the irrefutable pattern would mean that |
@parsonsmatt I like the idea, but do I understand correctly that in order to maintain both variants, we kind of have to fix the type signature of |
I think we should be really careful about introducing For But in other cases it makes a lot less sense. I think In the case of
I could imagine that at some point in the future we might want to add something like the value strict variant to |
The basic problem is that |
Yes, nicely said. I fully agree with this principle stated. I followed it when designing https://github.com/Bodigrim/infinite-list#laziness. Just to reiterate the problem. A function returning a record type with a single constructor can always return this very constructor before even looking at its arguments, not even weak head normal form. Notice the irrefutable pattern matching with data Pair a = Pair a a
myFmap :: (a -> b) -> Pair a -> Pair b
myFmap f ~(Pair x y) = Pair (f x) (f y) Under the hood this definition translates to myFmap f p = Pair (let Pair x _ = p in f x) (let Pair _ y = p in f y) On the first glance it might look like a good idea: there is nothing else other than > myFmap undefined undefined `seq` ()
() That's not what we usually want, and that's why instance Functor Pair where
fmap f (Pair x y) = Pair (f x) (f y) and > fmap undefined (undefined :: Pair ()) `seq` ()
*** Exception: Prelude.undefined This principle holds for every data type in instance Functor NonEmpty where
fmap f ~(a :| as) = f a :| fmap f as or equivalently instance Functor NonEmpty where
fmap f aas = (let a :| _ = aas in f a) :| (let _ :| as = aas in fmap f as) There are multiple issues:
Other functions in |
That's convincing to me. +1 |
This issue spurred me into making llun, which uses pattern synonyms to address @treeowl's point that |
Dear CLC members. Could you please provide (non-binding) opinions on the proposal? Do you agree with the principle suggested in #107 (comment)? Would you like to apply it to @tomjaguarpaw @chshersh @angerman @hasufell @mixphix @parsonsmatt |
The correspondence principle between |
How do we assess impact? I also like the idea, but I'm not sure I see enough motivation if this can break code. |
I agree with the proposal and with further suggestions in #107 (comment). My view is simple: if you label arguments with Side comment: In addition to
I think, |
Not in its current structure, unfortunately. Besides, running all tests of all packages will likely fail too often. Stackage curators maintain a list of test suites which should be excluded. Maybe one can run |
@treeowl how would you like to proceed with this? There seems to be enough support of the idea, but a convincing impact assessment is likely to be very hard. CC @juhp @DanBurton @cdornan @alexeyzab @mihaimaruseac as Stackage Curators (and sorry if I missed anyone else). Is there an easy way to build a Stackage snapshot and run all enabled test suites against a custom GHC? We would like to take the latest GHC 9.4 release, modify laziness of routines in |
@Bodigrim it should be possible - we use a dedicated buildserver (thanks to @fpco) to run curator with quite a bit of diskspace - but I am not sure how reproducible our build environment is, though most of it is in a container. Also cc @bergmark. Also curator just uses stack underneath, so for the custom ghc, it should be possible to setup, but I am not sure if curator makes it harder. I don't think we can use the Stackage server to run your tests (we had a similar request a while back, which we couldn't fulfil), but we can try to help with questions/issues that arise. (This is my personal take, other curators can also chime in if they have something to add.) |
As Jens says, it will require resources -- a server that can build all the packages together.
Interestingly, David has a proposal to run the stackage setup with GHC-nightly -- if you can afford to take some time then that initiative could generate just what you need.
Chris
… On 14 Apr 2023, at 02:47, Jens Petersen ***@***.***> wrote:
@Bodigrim <https://github.com/Bodigrim> it should be possible - we use a dedicated buildserver (thanks to @fpco <https://github.com/fpco>) to run curator with quite a bit of diskspace - but I am not sure how reproducible our build environment is, though most of it is in a container. Also cc @bergmark <https://github.com/bergmark>.
I don't think we can use the Stackage server to run your tests (we had a similar request a while back, which we couldn't fulfil), but we can try to help with questions/issues that arise.
(This is my personal take, other curators can also chime in if they have something to add.)
—
Reply to this email directly, view it on GitHub <#107 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAG7BSU7ULUMZORRFRSA5I3XBCUCFANCNFSM6AAAAAASGJ4UJI>.
You are receiving this because you were mentioned.
|
Thanks @juhp and @cdornan. In such case I imagine this proposal awaits an enthusiatic volunteer to setup a clone of Stackage build server and run Stackage tests with a patched GHC. On constrast to our usual practices, just building |
I'm afraid we are stuck here, unless there is an enthusiast to run tests for all Stackage packages. We might have a better luck with finding such individual, if there was an MR at hand. @treeowl could you possibly prepare one? I strongly believe that this is an important issue, it would be a shame to drop it. |
@treeowl is there is no progress within two weeks, I'll close this as abandoned. We can return back anytime there are resources to execute. |
Closing as abandoned, feel free to reopen when there are resources to make some progress. |
Can the proposal please be reopened? I want to take over this proposal since the original author apparently let it go. I've prepared MR with the changes in base (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12824) and an impact assessment of building Stackage snapshot and running its tests. |
So let me provide the detailed update about removing Merge requestFirst, a merge request to the GHC repository was prepared at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12824, initially with the suggestions from the first message in this issue. Later, during review by @clyring a bit more changes were proposed, notably removing The overall idea was to have the same laziness as Testing methodologyThen was the question of testing, which from the beginning was focusing on building full of Stackage and running its tests. Having failed to reuse original Stackage’s tooling, I’m still confident I managed to honor the spirit of the intended testing and in some respects actually run more tests that the original tooling would have. So the testing methodology ultimately boils down to downloading Stackage snapshot in the form of For doing the runs I developed following tool https://github.com/sergv/stackage-tester which does what I outlined in the previous paragraph. The builds are actually performed in the docker image that Stackage is itself built with, e.g. To get a bit more detailed about the build process, one important note is that this is not an original stackage tooling so it doesn’t pay attention to Stackage’s build-constraints.yaml that specifies which tests should and should not be run. My tool runs all the tests with the expectation that there should be no material differences between test runs, e.g. no tests started failing with the change of GHC. The builds of each package are carried in such a way that only the packages and versions from the specified snapshot will be used. There we subtle points regarding Results of test runI have built There are 4 folders, storing relevant logs for successful builds in The Please note that some build failures are expected - some Stackage packages build the library but disable tests completely so Stackage doesn’t even try building them. When my tool builds everything and tests don’t build with current snapshot then it will be recorded as a build failure since the point of the exercise is to run tests. Failing tests are also expected since Stackage disables tests for some packages based on author’s wishes. Surprisingly, tests for aeson are disabled on Stackage so had I used the original Stackage tooling they wouldn’t be covered in this report. What to look for in the testsPlease feel free to double check build and test results, I could have missed something. Overall run is summarised by
There are around 230 total packages were either did not build or whose tests failed during the run. What I was primarily looking for to find breakages caused by the removal of
Explaining why there’s 234 failures with vanilla GHC but only 233 with the patched oneThere’s less failures so it should be OK. On a more serious note the changes seem to be in intermittent failures causes of which are not known. With patched GHC the tests for Explaining the two failures I’ve chosen to care about and ignoring all the other noise
Notably some packages were failing but became working with the patched GHC
Overall I did not expect to fix any tests with the removal of Testing conclusionLooks like Stackage tests don’t really depend on laziness in I may have missed something in this sea of diffs but it seems to be safe to stsate that out of 3109 packages no testsuite started failing when compiled with GHC that included the MR that implements this proposal’s change. |
Thanks for the analysis and extensive testing, @sergv. Dear CLC members, I'd like to open the vote soon. Any more opinions / comments / suggestions / considerations? See #107 (comment) for a kind of executive summary. |
This is going to be ... hard. Some decisions, I think, will not be very controversial. Others will likely be quite controversial. Let's go through them one by one and try to figure things out. But first, I'd like to mention that along with the definition we have,
there's another, equally valid, expression of the concept of a nonempty list:
Each of these expressions is better at certain things and worse at certain things. Personally, I find the
NonEmpty'
expression more natural or fundamental, and therefore I will tend towards implementations that reflect the "natural" strictness we'd find there. However, I don't want to push for that where it feels unnatural.unfold
This function was deprecated ages and ages ago, and the time has long since come to delete it. There's no point discussing its strictness.
uncons
Currently,
I propose
What I actually want, based on my stated bias, is
Would that be a step too far? If so, would it be worth offering such a function by another name?
init
andlast
These have irrefutable patterns, but they're actually strict. Confusing, but just an implementation issue we don't need to discuss.
<|
andcons
This is defined
I believe this is the correct amount of laziness and we should leave it as is.
toList
Currently,
I propose
lift
Currently,
I don't have much intuition about how this function should behave. If we change
toList
, then it will automatically get stricter when applied toNonEmpty
s; is that okay?map
Currently,
I propose to remove the irrefutable pattern.
inits
,inits1
,tails
, andtails1
I have yet to form any opinion on these. The change in
toList
behavior affects them too.insert
Currently,
I think the proposed change in
toList
behavior is fine for this. It might make a difference for some sort of degenerateOrd
instance, but I don't imagine we'll get any complaints.scanl
,scanr
,scanl1
,scanr1
Currently, these are lazy; I propose to make them strict.
intersperse
Currently,
I'd definitely remove the irrefutable pattern. My bias would suggest forcing
bs
as well, but I doubt that's what people will actually want.reverse
Currently,
reverse
is actually strict, if I read it right. I'm fine with leaving it that way.take
Currently,
The proposed change to
toList
would make this strict, which I think would be better.drop
Currently,
which is lazy when
n <= 0
and strict otherwise. The proposed change totoList
would make it unconditionally strict, which I think would be better.splitAt
Currently,
This is odd the same way
drop
is. The proposed change totoList
will make it strict.takeWhile
,dropWhile
There's a pattern here; I think these are better with stricter
toList
.zip
,zipWith
Currently,
I would remove the irrefutable patterns.
unzip
Currently,
I propose changing this to
Again, my bias would suggest forcing
abs
, but I don't think that's what people will want.transpose
No clear opinion.
append
See
<>
below.appendList
This is strict, and I think should remain so.
prependList
This is strict, and I think should remain so.
Foldable
instanceWe have
I propose to remove all the irrefutable patterns.
Functor
instanceCurrently,
I propose to remove the irrefutable patterns.
Traversable
instanceCurrently,
I propose to remove the irrefutable pattern.
Semigroup
instanceWe currently have
This looks correct to me.
The text was updated successfully, but these errors were encountered: