Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: initsNE and tailsNE. #558

Merged
merged 14 commits into from
Feb 20, 2024
Merged

Conversation

kindaro
Copy link
Contributor

@kindaro kindaro commented Feb 7, 2024

Resolve #557.

@kindaro
Copy link
Contributor Author

kindaro commented Feb 7, 2024

remaining tasks

  • add since metadata
  • talk about performance of mutually recursive pairs
  • talk about inlining
  • rebase

@kindaro
Copy link
Contributor Author

kindaro commented Feb 7, 2024

@Bodigrim   Does this look somewhat like what you had in mind?

@kindaro
Copy link
Contributor Author

kindaro commented Feb 7, 2024

I have some questions regarding performance.

Note that there are no benchmarks for inits and tails in master right now. I tried to add them but they eat a lot of memory and did not terminate in 10 minutes. So we are flying blind here.

  • If we define inits in terms of initsNE, should we add the inline pragma to inits to get rid of this indirection? I am a cargo cultist when it comes to inlining, I need a consultation of an ordained priest.

  • If tails and tailsNE are mutually recursive (as they are now), would it not mean that lists and non-empty lists get converted into one another on each iteration? It seems we should not define tails in terms of tailsNE, as in the bytestring patch Add NonEmpty variants of inits and tails bytestring#557, but rather keep the definition of tails as it was, so that tailsNE is not even recursive.

@Bodigrim
Copy link
Contributor

Bodigrim commented Feb 9, 2024

Thanks!

  • If we define inits in terms of initsNE, should we add the inline pragma to inits to get rid of this indirection? I am a cargo cultist when it comes to inlining, I need a consultation of an ordained priest.

Historically text was keen to put INLINE everywhere. This helps to game microbenchmarks, but bad for code bloat and compilation time, so nowadays I'm reluctant to add it. It's too blunt of a hammer.

I think it's fine as is. Maybe add {-# INLINABLE #-} and see if it changes anything.

  • If tails and tailsNE are mutually recursive (as they are now), would it not mean that lists and non-empty lists get converted into one another on each iteration?

I suspect that tailsNE gets inlined into tails and :| gets replaced by :, so there is no ping-pong, but I have not checked Core. @clyring any recollection about it?

src/Data/Text.hs Outdated Show resolved Hide resolved
@clyring
Copy link
Member

clyring commented Feb 9, 2024

  • If tails and tailsNE are mutually recursive (as they are now), would it not mean that lists and non-empty lists get converted into one another on each iteration?

I suspect that tailsNE gets inlined into tails and :| gets replaced by :, so there is no ping-pong, but I have not checked Core. @clyring any recollection about it?

If memory serves, tailsNE gets a worker/wrapper split and when its wrapper inlines into tails the intermediate (:|) constructor is easily optimized away as long as an appropriately strict nonEmptyListToList conversion is used. (The Data.List.NonEmpty.toList provided by base is too lazy for this purpose... and so is Data.List.NonEmpty.cons and so on...) This optimization should be very reliable, but manually writing the logic in a function with type Text -> (# Text, [Text] #) or similar is another option, if you prefer to be explicit about it but don't want the logic duplicated or partial-ified.

@kindaro
Copy link
Contributor Author

kindaro commented Feb 9, 2024

I wish we had a way to check performance, but changing the design of benchmarks is out of scope of this issue.

  • Some workflows failed because singleton was not exported from base until GHC 9. I removed the use of singleton so that the code builds with GHC 8.10.7. @Bodigrim would it be affordable to run the workflows again?
  • As Matthew says (or, well, as I understand him), we need to make sure the outermost constructor of the output of initsNE in the definition of inits is forced, and likewise for tails. I accomplish this by writing in some $! operators. @clyring does this look right to you?

Another question: should I add since metadata to documentation? What version should I specify? Right now it seems the version is 2.1. It seems to me 2.1.1 would be fitting.

@Bodigrim
Copy link
Contributor

@Bodigrim would it be affordable to run the workflows again?

Sure, sorry for being slow with it.

Another question: should I add since metadata to documentation? What version should I specify? Right now it seems the version is 2.1. It seems to me 2.1.1 would be fitting.

We are going to release 2.1.1 very soon (ideally tomorrow), so let's put 2.1.2.

@kindaro
Copy link
Contributor Author

kindaro commented Feb 15, 2024

Turns out some workflows cannot handle Unicode in the standard output of the test suite, so I replaced the Unicode ellipsis with three ASCII dots. All workflows should pass now. Also, I added since metadata to each of the 4 new functions.

@Bodigrim Let me know if this looks good, so that I can rebase and rewrite the commit history into something nice.

src/Data/Text/Lazy.hs Outdated Show resolved Hide resolved
@Lysxia
Copy link
Contributor

Lysxia commented Feb 15, 2024

Rather than finicky benchmarks, in this case it is easy enough to see that the optimized Core looks reasonable (no accidental thunks), by adding the following options at the top of the relevant files:

{-# OPTIONS_GHC -ddump-simpl -dsuppress-all -ddump-to-file #-}

Core of Data.Text

inits

initsNE
  = \ t_sdIN ->
      :|
        empty
        (case t_sdIN of { Text bx_dalv bx1_dalw bx2_dalx ->
         letrec {
           $wloop_sdIL
             = \ ww_sdII ->
                 case >=# ww_sdII bx2_dalx of {
                   __DEFAULT ->
                     case indexWord8Array# bx_dalv (+# bx1_dalw ww_sdII) of r#_ibpI
                     { __DEFAULT ->
                     let {
                       c#_sbIo
                         = word2Int# (clz8# (and# (not# (word8ToWord# r#_ibpI)) 255##)) } in
                     let { j_sbIq = +# ww_sdII (xorI# c#_sbIo (<=# c#_sbIo 0#)) } in
                     : (Text bx_dalv bx1_dalw j_sbIq) ($wloop_sdIL j_sbIq)
                     };
                   1# -> []
                 }; } in
         $wloop_sdIL 0#
         })

inits
  = \ x_X7 ->
      : empty
        (case x_X7 of { Text bx_dalv bx1_dalw bx2_dalx ->
         letrec {
           $wloop_sdIL
             = \ ww_sdII ->
                 case >=# ww_sdII bx2_dalx of {
                   __DEFAULT ->
                     case indexWord8Array# bx_dalv (+# bx1_dalw ww_sdII) of r#_ibpI
                     { __DEFAULT ->
                     let {
                       c#_sbIo
                         = word2Int# (clz8# (and# (not# (word8ToWord# r#_ibpI)) 255##)) } in
                     let { j_sbIq = +# ww_sdII (xorI# c#_sbIo (<=# c#_sbIo 0#)) } in
                     : (Text bx_dalv bx1_dalw j_sbIq) ($wloop_sdIL j_sbIq)
                     };
                   1# -> []
                 }; } in
         $wloop_sdIL 0#
         })

tails

Rec {
$wtailsNE
  = \ ww_sdp1 ww1_sdp2 ww2_sdp3 ->
      case <=# ww2_sdp3 0# of {
        __DEFAULT ->
          (# Text ww_sdp1 ww1_sdp2 ww2_sdp3,
             case indexWord8Array# ww_sdp1 ww1_sdp2 of r#_ib5C { __DEFAULT ->
             let {
               c#_ib5D
                 = word2Int# (clz8# (and# (not# (word8ToWord# r#_ib5C)) 255##)) } in
             let { y_ib5E = xorI# c#_ib5D (<=# c#_ib5D 0#) } in
             case $wtailsNE ww_sdp1 (+# ww1_sdp2 y_ib5E) (-# ww2_sdp3 y_ib5E) of
             { (# ww3_sdPk, ww4_sdPl #) ->
             : ww3_sdPk ww4_sdPl
             }
             } #);
        1# -> (# empty, [] #)
      }
end Rec }

tailsNE
  = \ t_sdoZ ->
      case t_sdoZ of { Text ww_sdp1 ww1_sdp2 ww2_sdp3 ->
      case $wtailsNE ww_sdp1 ww1_sdp2 ww2_sdp3 of
      { (# ww3_sdPk, ww4_sdPl #) ->
      :| ww3_sdPk ww4_sdPl
      }
      }

tails
  = \ x_iaLw ->
      case x_iaLw of { Text ww_sdp1 ww1_sdp2 ww2_sdp3 ->
      case $wtailsNE ww_sdp1 ww1_sdp2 ww2_sdp3 of
      { (# ww3_sdPk, ww4_sdPl #) ->
      : ww3_sdPk ww4_sdPl
      }
      }

L.drop does not inline because it is recursive.

Co-authored-by: Xia Li-yao <[email protected]>
@kindaro
Copy link
Contributor Author

kindaro commented Feb 15, 2024

Thank you Li-yao.

Rather than finicky benchmarks, in this case it is easy enough to see that the optimized Core looks reasonable (no accidental thunks), by adding the following options at the top of the relevant files:

Adding some flags is easy, it is understanding the output, written (or, rather, generated) in a language for which there is no documentation, that is challenging. For now I can only accept your observations on faith since I have no expertise to tell if your reasoning is fair. I committed the change to initsNE.inits' you suggested — it looks better in Haskell — but I cannot say if it will be more efficient, or less efficient, or without significant difference in efficiency.

By the way, you can also add -dsuppress-uniques so GHC will not needlessly obfuscate names.

I think next time I do something with text I should commit the baseline simplifier output and then see how it changes after the Haskell source is modified. This would be curious.

The simplifier output for Data.Text.Lazy.inits_inits' is the same between master and 0458012. So, I did not break anything — if something is broken, it was broken before I touched it.

@kindaro
Copy link
Contributor Author

kindaro commented Feb 15, 2024

@Bodigrim   Please let me know what the next step is to be. I thought rebasing and rewriting commit history would be good, but maybe this branch can simply be squashed into one commit and merged right from the GitHub graphical human interface. I am marking this pull request as ready for review so you can merge it if you feel like it.

@kindaro kindaro marked this pull request as ready for review February 15, 2024 15:19
Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Bodigrim Bodigrim requested a review from Lysxia February 15, 2024 20:35
src/Data/Text/Lazy.hs Outdated Show resolved Hide resolved
@Bodigrim
Copy link
Contributor

@Lysxia @meooow25 any more comments?

@meooow25
Copy link
Contributor

I haven't checked the core but otherwise seems okay to me 👍

@Bodigrim Bodigrim merged commit cdcbbd5 into haskell:master Feb 20, 2024
27 checks passed
@Bodigrim
Copy link
Contributor

Thanks @kindaro!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add initsNE, tailsNE :: Text -> NonEmpty Text
5 participants