Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support inlining of non-terminals, as Menhir does #148

Open
sgraf812 opened this issue Sep 23, 2019 · 9 comments
Open

Support inlining of non-terminals, as Menhir does #148

sgraf812 opened this issue Sep 23, 2019 · 9 comments

Comments

@sgraf812
Copy link
Collaborator

sgraf812 commented Sep 23, 2019

It has bothered me for a long time that we have so many conflicts in GHC's parser, and https://gitlab.haskell.org/ghc/ghc/issues/17232 is there to track it. Precedence declarations will probably go a long way to ameliorate this.

But as section 5.3 in http://gallium.inria.fr/~fpottier/menhir/manual.pdf points out, there still are conflicts which aren't easily resolved without compromising on software design by inlining all productions of a non-terminal into a call site.

I imagine something like Menhir's %inline keyword would help here. But I'd rather not have it at the declaration site of a non-terminal. Instead, we should have a mechanism that allows us to annotate use-site of the non-terminal, similar to the difference between {-# INLINE #-} and the GHC.Magic.inline. Also it would help to resolve conflicts on a case-by-case basis rather than just blindly inline the non-terminal everywhere, which might come with exponential blow-up.

@sgraf812
Copy link
Collaborator Author

For a live example where this is useful, consider https://gitlab.haskell.org/ghc/ghc/blob/68ddb43c44065d0d3a8a6893f7f8e87f15ee9c1e/compiler/parser/Parser.y#L1278.

We can't use opt_family and opt_instance here, because it would give rise to the following parser situation:

-- type family declarations, with optional 'family' keyword
at_decl_cls -> type . opt_family type opt_at_kind_inj_sig
-- default type instances, with optional 'instance' keyword
at_decl_cls -> type . opt_instance ty_fam_inst_eqn

Reduce an empty opt_instance and you commit to the second production, reduce an empty opt_family and you reduce to the first one. By attaching some kind of inline information for the parser, the conflict would be resolved without having to do the inlining by hand, as it is currently done.

@harpocrates
Copy link
Contributor

I want this a lot too for a project of mine whose grammar is quite large, and so I went ahead and took a stab at implementing it. Here are the changes.

  • %inline is use-site only (see my change to Calc.ly for a complete example):

    some_rule
        : foo bar %inline baz qux    { ... }     -- `baz` gets inlined here
    
  • The current idea is to implement %inline as a phase to be desugared (similar to parametrized productions).

With the caveat that I've only tried toy examples and done nothing to make the code efficient, it seems to be working as expected. Before I spend more time on this:

  • @simonmar is this a feature that you think would be worthy of inclusion in Happy (provided a good implementation)?

  • When inlining, should we duplicate the RHS code of the inlined production at the inline-sites? Should we abstract it out into some other top-level function? I implemented the former for now.

  • More generally, does the approach I took seem reasonable?

  • Can we bikeshed the syntax? I'm perfectly happy with only callsite inlining, but we could have a top-level %inline rule to indicate rule should always be inlined.

@cartazio
Copy link

cartazio commented May 8, 2020

bump, whats the current stances?

@harpocrates
Copy link
Contributor

Wait for feedback on design and merge-ability I think.

@simonmar
Copy link
Member

Happy to have this feature, it does sound useful.

Perhaps %inline(nt) would be a better syntax?

We probably shouldn't duplicate the code, how hard is it to abstract it? Do we run into typing issues?

I haven't reviewed the code, but happy to give it a read once it's settled down. I have no strong attachment to the internals of Happy, it needs a complete rewrite really.

@googleson78
Copy link

This would be useful in the implementation of modifiers, see https://gitlab.haskell.org/ghc/ghc/-/issues/22624#note_471850

@sgraf812
Copy link
Collaborator Author

sgraf812 commented Sep 13, 2024

I think with the introduction of %shift (or was it always there?) and parameterised productions, GHC's parser finally has become conflict-free, so this issue has been far less pressing.

Edit: The following is actually incorrect.
In particular, 0-ary parameterised productions are like the declaration-site %inline pragma. The OP is about use-site inlining in order to avoid code blowup, so I'm leaving this issue open.

@int-index
Copy link
Collaborator

In particular, 0-ary parameterised productions are like the declaration-site %inline pragma.

Are you sure about this? My mental model for parameterised productions was that they were expanded to top-level copies of the template, e.g.

e(t) : t t { ... }

r : e(x) { ... }
  | e(y) { ... }

is equivalent to

e_x : x x { ... }
e_y : y y { ... }

r : e_x { ... }
  | e_y { ... }

I don't see where inlining enters the picture.

@sgraf812
Copy link
Collaborator Author

sgraf812 commented Sep 13, 2024

Are you sure about this?

Indeed not, I was wrong. Marking a (use or decl) of a non-terminal as %inline duplicates its productions into its use sites. That is not the same as a parameterised production at all, although it means the duplicate productions will have a more precise lookahead set, which might absolve of some reduce/reduce conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants