Support error recovery in `check` for invalid parse trees #4689

dwblaikie · 2024-12-16T21:44:01Z

Description of the bug:

No response

What did you do, or what's a simple way to reproduce the bug?

base class Foo { }
class Base {
  extend base Foo;
}

$ carbon compile a.carbon

What did you expect to happen?

In this particular case, it could've recovered with an assumed :, but I'd also (or perhaps initially/first) like to fix the more general case, where a recovery might not be feasible/correct/helpful.

What actually happened?

CHECK-fail/crash:

FATAL failure at toolchain/parse/parse.cpp:61: Invalid tree returned by Parse(): NodeId #node14 couldn't be extracted as a ClassDefinition. Trace:
Aggregate N6Carbon5Parse15ClassDefinitionE: begin
Vector: begin
NodeIdInCategory Decl: kind InvalidParseSubtree consumed
NodeIdInCategory Decl: kind InvalidParse consumed
NodeIdInCategory Decl error: kind ExtendModifier doesn't match
Vector: end
NodeIdForKind error: wrong kind ExtendModifier, expected ClassDefinitionStart
Aggregate N6Carbon5Parse15ClassDefinitionE: error
Error: ExtendModifier node left unconsumed.

Any other information, logs, or outputs that you want to share?

I know we (myself, @zygoloid and @jonmeow ) have had some conversations about this, but some was in person/un-recorded, or the records are hard for me to find/synthesize.

Currently, the failure in

carbon-lang/toolchain/parse/handle_base.cpp

Lines 14 to 25 in f922988

    
           if (!context.ConsumeAndAddLeafNodeIf(Lex::TokenKind::Colon, 
        
                                                NodeKind::BaseColon)) { 
        
             // TODO: If the next token isn't a colon or `class`, try to recover 
        
             // based on whether we're in a class, whether we have an `extend` 
        
             // modifier, and the following tokens. 
        
             CARBON_DIAGNOSTIC(ExpectedAfterBase, Error, 
        
                               "`class` or `:` expected after `base`"); 
        
             context.emitter().Emit(*context.position(), ExpectedAfterBase); 
        
             context.RecoverFromDeclError(state, NodeKind::BaseDecl, 
        
                                          /*skip_past_likely_end=*/true); 
        
             return; 
        
           }

constructs an invalid parse tree that looks like this:

  {kind: 'BaseIntroducer', text: 'base'},
  {kind: 'ExtendModifier', text: 'extend'},
{kind: 'BaseDecl', text: ';', has_error: yes, subtree_size: 3},

Specifically, this does not match the "shape" a BaseDecl expects, which is this:

  {kind: 'BaseIntroducer', text: 'base'},
  {kind: 'ExtendModifier', text: 'extend'},
  {kind: 'BaseColon', text: ':'},
  {kind: 'IdentifierNameExpr', text: 'Foo'},
{kind: 'BaseDecl', text: ';', subtree_size: 5},

And so the above crash/check-fail occurs.

I know we had talked about different representations for the failure and recovery.

But I'd like to focus on the case where recovery isn't feasible.

What should we do in that case? (eg: if the name (Foo) is missing entirely)

I thought we could, in check, skip any error nodes in Check's

carbon-lang/toolchain/check/check_unit.cpp

Lines 345 to 347 in f922988

    
           while (auto maybe_node_id = traversal.Next()) { 
        
             node_id = *maybe_node_id; 
        
             auto parse_kind = context_.parse_tree().node_kind(node_id);

  while (auto maybe_node_id = traversal.Next()) {
    node_id = *maybe_node_id;
    auto parse_kind = context.parse_tree().node_kind(node_id);
    if (context.parse_tree().node_has_error(node_id)) {
      return false;
    }

But judging by the code in https://github.com/carbon-language/carbon-lang/blob/f922988c8ccaaf68e703e3126a4a26709efb460f/toolchain/check/handle_noop.cpp that's not the intent (the intent seems to be that we do walk through even the error nodes) - but perhaps the proposed skip ^ is the way to address these TODOs? (or perhaps some more general "we shouldn't check a parse tree with any invalid nodes at all" might be even more general/failing earlier?)

After this discussion, I wouldn't mind discussing how to improve the error recovery - I know we talked about some kind of "synthetic token" we could use to put the colon into the parse tree?

The text was updated successfully, but these errors were encountered:

jonmeow · 2024-12-16T22:15:59Z

Stepping back a moment, we currently expect that (a) parse can create (subtly) invalid parse trees, and (b) check will crash on these. We do not fuzz test invalid parse trees as a consequence. So broadly speaking, I think it's fair to say crashes like this are probably common. Similarly, I think we should be cautious about trying to chase down all the issues like this unless we're ready to really commit to error recovery -- my leaning has been that we benefit from prioritizing language features first.

Regarding just bumping out, my expectation has been that we'd try to check invalid parse trees as best as possible. At a minimum, we should be okay checking a parse tree until the first invalid node. However, you might have something like:

class Foo {
  fn
}

class Bar {
  fn Foo() -> i32 { return 1.0; }
}

My own expectation is that this should print a diagnostic for the invalid fn declaration, and also a diagnostic for 1.0 not being i32. So I think that means trying to recover in check.

In this specific case, I think the right choice is to recover in parse. For now, I believe that's creating BaseColon with has_error = true. Maybe in the future we should designate that better with some kind of synthetic token, but if you want a fix right now, I think that's what it should look like.

Short-term, we could exit with a TODO in ProcessNodeIds when node_has_error.

dwblaikie · 2024-12-17T22:07:44Z

Thanks for the discussion/details - I'll work on the short-term fix for now, after that don't mind leaving this bug open or closing for now since we aren't planning to work on this feature (check recovery from parse failures) at this time.

jonmeow · 2024-12-17T22:10:28Z

Maybe re-title the issue to reflect the broader scope of it, or I could create a new issue that captures the high-level problem/plan?

dwblaikie · 2024-12-17T22:25:50Z

Retitling sounds good - I could edit the main post at the top to discuss this/focus on this broader issue?

Short term solution/block for #4689

dwblaikie added the toolchain label Dec 16, 2024

dwblaikie mentioned this issue Dec 17, 2024

Abort checking when encountering an invalid parse node #4700

Merged

dwblaikie changed the title ~~Crash in check on missing colon in extend base~~ Support error recovery in check for invalid parse trees Dec 17, 2024

github-merge-queue bot pushed a commit that referenced this issue Dec 18, 2024

Abort checking when encountering an invalid parse node (#4700)

4d0a6db

Short term solution/block for #4689

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support error recovery in `check` for invalid parse trees #4689

Support error recovery in `check` for invalid parse trees #4689

dwblaikie commented Dec 16, 2024

jonmeow commented Dec 16, 2024 •

edited

Loading

dwblaikie commented Dec 17, 2024

jonmeow commented Dec 17, 2024

dwblaikie commented Dec 17, 2024

Support error recovery in check for invalid parse trees #4689

Support error recovery in check for invalid parse trees #4689

Comments

dwblaikie commented Dec 16, 2024

Description of the bug:

What did you do, or what's a simple way to reproduce the bug?

What did you expect to happen?

What actually happened?

Any other information, logs, or outputs that you want to share?

jonmeow commented Dec 16, 2024 • edited Loading

dwblaikie commented Dec 17, 2024

jonmeow commented Dec 17, 2024

dwblaikie commented Dec 17, 2024

Support error recovery in `check` for invalid parse trees #4689

Support error recovery in `check` for invalid parse trees #4689

jonmeow commented Dec 16, 2024 •

edited

Loading