"The parser" #13
Comments
I'd be interested in helping out with the parser section. I haven't made any contributions to the Rust parser specifically, but I've played around with Do you know if there's anyone with experience in the parser that I'd be able to bounce questions off? |
@Michael-F-Bryan you can certainly bounce questions off of me; I would say the main owner of the parser is @petrochenkov |
Copying some stuff I wrote in #21 that seems relevant:
Hmm, good question. I imagine it'd be good to look at a few PRs that modified the parser to see what they had to do. Actually, it might be worth just including a link to those PRs in the guide to help people get oriented. rust-lang/rust#45047 is such a PR, and here is a link to some of the relevant content. Some things immediately jump out as worth explaining:
|
I'm surprised how familiar a lot of this is! Quite a lot of parsers for toy languages I've written in Rust are effectively a state machine which incrementally consumes tokens ( I'm interested to find out how |
Yeah, it's your basic recursive descent parser. Nothing too fancy. =)
This is somewhat ad hoc, but it is actually one of the things I would like documented. @nrc is the one who put in a lot of that, and they may be able to comment a bit, though I think @estebank has also put in some effort here. One particular pattern I recall that @estebank has used a few times is that they "checkpoint" the state of the parser and run "tentative" parses that are then discarded. This is used to give better error messages, basically saying something like "it looks like you were writing a fn here, you need to add this keyword". In short, though, we make a 'best effort' to recover and build up some sort of tree. We wind up reporting an error but then passing this tree back up to the rest of the code, which continues as usual. I'm not sure whether there are any "error nodes" in the AST. I didn't see any in a quick search, but I would expect them to appear, since we certainly use that approach later on: e.g. the HIR (constructed from the AST, not directly from the parser) has a node TyErr that represents an erroneous type. |
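To make the "checkpoint, tentative parse, discard" idea above concrete, here is a minimal toy sketch. It is not rustc's code: `Tok`, `Item`, `Parser`, `parse_fn_rest` and `parse_item` are all hypothetical names, and the checkpoint is simply a clone of the whole toy parser. It only shows the shape of the pattern: try a parse on a copy, commit the copy if it worked (reporting an error but still returning a tree), and otherwise throw the copy away.

```rust
// Toy sketch only: every type and function here is hypothetical, not rustc's.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Fn,
    Ident(String),
    OpenParen,
    CloseParen,
}

#[derive(Debug, PartialEq)]
enum Item {
    Fn { name: String },
}

#[derive(Debug, Clone)]
struct Parser {
    tokens: Vec<Tok>,
    pos: usize,
    errors: Vec<String>,
}

impl Parser {
    fn new(tokens: Vec<Tok>) -> Self {
        Parser { tokens, pos: 0, errors: Vec::new() }
    }

    /// Consume the current token if it equals `expected`.
    fn eat(&mut self, expected: &Tok) -> bool {
        if self.tokens.get(self.pos) == Some(expected) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Consume an identifier and return its name, if one is next.
    fn eat_ident(&mut self) -> Option<String> {
        match self.tokens.get(self.pos) {
            Some(Tok::Ident(name)) => {
                let name = name.clone();
                self.pos += 1;
                Some(name)
            }
            _ => None,
        }
    }

    /// Parse `name ( )`, i.e. what follows the `fn` keyword in this toy grammar.
    fn parse_fn_rest(&mut self) -> Option<Item> {
        let name = self.eat_ident()?;
        if self.eat(&Tok::OpenParen) && self.eat(&Tok::CloseParen) {
            Some(Item::Fn { name })
        } else {
            None
        }
    }

    fn parse_item(&mut self) -> Option<Item> {
        if self.eat(&Tok::Fn) {
            return self.parse_fn_rest();
        }
        // Checkpoint: copy the parser state, then parse tentatively on the copy.
        // (Cloning the token vector is wasteful; it keeps the sketch short.)
        let mut snapshot = self.clone();
        if let Some(item) = snapshot.parse_fn_rest() {
            // The input looks like a function with the keyword missing:
            // report an error, commit the snapshot, and keep the recovered tree.
            snapshot
                .errors
                .push("this looks like a function; add the `fn` keyword".to_string());
            *self = snapshot;
            return Some(item);
        }
        // Tentative parse failed: the snapshot is dropped and `self` is untouched.
        self.errors.push("expected an item".to_string());
        None
    }
}
```

Feeding this the tokens for `main()` (no `fn`) returns a function item and records one error, while feeding it a lone `(` returns `None` and leaves the position where it was.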
I'm moving across a conversation I was having with @mark-i-m about how much we want to explain parser basics.
Sounds like a good idea. I was thinking we could say something like this:
For people who already know how parsers work it's essentially recursive descent 101, but something like that is probably complete enough to give people the general idea and short enough not to be off topic. You could then tie all of this back to |
Copying over Niko's comment from #26 :
|
Triaged: not a whole lot of change. There was some discussion earlier with @chrissimpkins, @matklad and @Centril (who is taking some time off atm). Perhaps Chris can give an update? |
Definitely still planned. I am digging into the parser source and just submitted my first parser-area PR this week. :) Once I understand the source well enough I will get started on the chapter if someone else does not get to it first. The conversations that I held with Aleksey and Mazdak are in our Zulip channel for anyone interested in this information. They informed the rustc-dev-guide Overview section and there should be plenty of information in there to get a start on the main lexer/parser chapter. |
It looks like the conversation with Aleksey was an impromptu discussion on Apr 7, 2020 that occurred in a private thread. I copied the text below.
Hi Aleksey! Our WG is working on a rustc Overview document for the rustc-dev-guide and I am interested in updating the lexer documentation. Do you happen to have some time in the next week that we could use to get together for 10 mins or so to discuss the lexer? |
It'd be worth covering
`eat` function?
`last_span` thing that stores the span of the last token, and many things follow the idiom of saving the "lo" point of the span, parsing some stuff, then extracting the "hi" point and combining them. This would be used to make a span that encompasses, for example, an entire `trait` definition (the lo point would come from the `trait` keyword, but the end point comes after having parsed a bunch of other things).
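As a rough illustration of the `eat` and lo/hi span idiom described above, here is a small self-contained sketch. It is a toy, not the real parser: `Span`, `Tok`, `Parser`, `Span::to` and `parse_trait` are hypothetical stand-ins written for this example, and the grammar only accepts `trait Ident { }`. The point is the shape: save the starting span, `eat` your way through the tokens, then combine the start with the span of the last token consumed.

```rust
// Toy sketch only: these types are hypothetical stand-ins, not rustc's definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

impl Span {
    /// Combine two spans into one running from `self.lo` to `end.hi`.
    fn to(self, end: Span) -> Span {
        Span { lo: self.lo, hi: end.hi }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Trait,
    Ident,
    OpenBrace,
    CloseBrace,
}

struct Parser {
    tokens: Vec<(Tok, Span)>,
    pos: usize,
    /// Span of the most recently consumed token.
    last_span: Span,
}

impl Parser {
    fn new(tokens: Vec<(Tok, Span)>) -> Self {
        Parser { tokens, pos: 0, last_span: Span { lo: 0, hi: 0 } }
    }

    /// Span of the token we are currently looking at, if any.
    fn span(&self) -> Option<Span> {
        self.tokens.get(self.pos).map(|&(_, sp)| sp)
    }

    /// If the current token is `expected`, consume it and return true;
    /// otherwise leave the parser untouched and return false.
    fn eat(&mut self, expected: Tok) -> bool {
        match self.tokens.get(self.pos) {
            Some(&(tok, sp)) if tok == expected => {
                self.pos += 1;
                self.last_span = sp;
                true
            }
            _ => false,
        }
    }

    /// Parse `trait Ident { }` and return the span covering all of it.
    fn parse_trait(&mut self) -> Option<Span> {
        // "lo" point: the span of the `trait` keyword we are about to consume.
        let lo = self.span()?;
        if !(self.eat(Tok::Trait)
            && self.eat(Tok::Ident)
            && self.eat(Tok::OpenBrace)
            && self.eat(Tok::CloseBrace))
        {
            return None;
        }
        // "hi" point: the end of the last token consumed (the closing brace).
        Some(lo.to(self.last_span))
    }
}
```

For the source text `trait Foo {}` with byte offsets as spans, `parse_trait` returns a span from 0 to 12, covering everything from the `trait` keyword through the closing brace.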