
Panic Mode Recovery at End of File #38

Open
alanbarr opened this issue Aug 4, 2019 · 1 comment

Comments


alanbarr commented Aug 4, 2019

Background

Ideally I want to be able to parse out some specially formatted C++ comments and
the function which they are documenting. (Think a bespoke form of Doxygen).

After some reading, it sounded like a lexer/parser combination had already
solved the hard part of this.

The likely problem is that I'm trying to be lazy and ignore all the surrounding C++ code.
So, outside of my golden comment blocks (and, later, the function being documented),
there's a sea of syntax errors.

I was hoping I could easily pull out the interesting parts and ignore everything
else. I'm starting to think this might be outside the intended operating conditions
of such a parser, though...

Sly

I've been testing out Sly, which I've confirmed will easily do what I want when there is
no unexpected text.

However, I can't quite seem to get the error handling to cope with this rather extreme case.
Currently the problem appears to be when the unexpected text sits between a valid
statement and the EOF.

Looking at the state debug file, it looks like I need either a
COMMENT_OPEN or an $end to reduce what should be a complete expression on
the stack. However, I'm entering error() handling before hitting the end of the
file, and I wonder if I need to be signalling this somehow?
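One generic way to make end-of-input visible to a hand-rolled recovery loop (this is not sly API; it's just a sketch of the idea, and the `Tok`/`with_eof` names are my own) is to chain a synthetic EOF token onto the end of the token stream before handing it to the parser:

```python
from collections import namedtuple
from itertools import chain

# Minimal stand-in for a lexer token; a real sly token has more fields.
Tok = namedtuple("Tok", "type value")

def with_eof(tokens):
    # Append a synthetic EOF token so a downstream recovery loop can
    # distinguish "reached end of file" from "no resync token found".
    return chain(tokens, [Tok("EOF", None)])

toks = [Tok("WORD", "hello"), Tok("SEMI", ";")]
print([t.type for t in with_eof(toks)])
```

The grammar would then need an explicit EOF rule to consume the sentinel, which may or may not be nicer than special-casing `None` in error().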

I've got some simplified test code below.

Test Code

#! /usr/bin/env python3

from sly import Parser
from sly import Lexer
from pprint import pprint


class CommentLexer(Lexer):
    tokens = {COMMENT_OPEN, COMMENT_CLOSE, WORD, SEMI}

    COMMENT_OPEN = r"/\* COMMENT:"
    COMMENT_CLOSE = r"\*/"
    WORD = r"[^; \*\t\n\r\f\v]+"
    SEMI = r";"

    ignore_asterisk = r"\*"
    ignore_newline = r"\n"
    ignore_space = r" "

    def ignore_newline(self, t):
        self.lineno += t.value.count("\n")

    def error(self, t):
        print("Line %d: Bad character %r" % (self.lineno, t.value[0]))
        self.index += 1


class CommentParser(Parser):
    tokens = CommentLexer.tokens
    debugfile = "comment_parser.out"

    def __init__(self):
        self.comments = []

    @_("comment_doc comment_doc")
    def comment_doc(self, p):
        pass

    @_("COMMENT_OPEN string COMMENT_CLOSE")
    def comment_doc(self, p):
        print("#########")
        print(f"Got: {p.string}")
        print("#########")
        self.comments.append(p.string)
        return p.string

    @_("string string")
    def string(self, p):
        return p[0] + " " + p[1]

    @_("WORD")
    def string(self, p):
        return p.WORD

    def error(self, p):
        pprint(p)

        if not p:
            print("Hit the end of the file!")
            return

        print(f"Syntax error at type: {p.type} value: {p.value} line: {p.lineno}")
        while True:
            tok = next(self.tokens, None)

            if tok is None:
                print("Error Tok: Hit None")
                return tok

            if tok.type == "COMMENT_OPEN":
                print("Error Tok: Found new comment")
                return tok

            print(f"Ignoring: {tok.type}")


def test_one_comment_recovery_after():
    lexer = CommentLexer()

    test_data = """
    /* COMMENT: This is the
       only comment string I'd
       like to parse out
    */

    /* I don't care about this one. */

    """

    parser = CommentParser()
    parser.parse(lexer.tokenize(test_data))
    assert len(parser.comments) == 1


def test_one_comment_recovery_before():
    lexer = CommentLexer()

    test_data = """
    /* I don't care about this one. */

    /* COMMENT: This is the
       only comment string I'd
       like to parse out
    */

    """

    parser = CommentParser()
    parser.parse(lexer.tokenize(test_data))
    assert len(parser.comments) == 1

alberth commented Feb 11, 2020

Trying to continue parsing after a syntax error is going to be messy; your better bet is to tokenize everything and discard what you don't need.
