Background
Ideally I want to be able to parse out some specially formatted C++ comments and
the function which they are documenting. (Think a bespoke form of Doxygen).
After some reading, it sounded like a lexer/parser would already solve
the hard part of this.
A possible problem is that I'm trying to be lazy and ignore all the surrounding C++ code.
So, outside of my golden comment blocks (and later the function being documented)
there's a sea of syntax errors.
I was hoping I could easily pull out the interesting parts and ignore everything
else. I'm starting to think this might be outside the intended operating conditions
of such a parser, though...
Sly
I've been testing out Sly, which I've confirmed will easily do what I want when there is
no unexpected text.
However, I can't quite get the rather extreme error handling I need to do what I'd like.
Currently the problem appears when the unexpected text sits between a valid
statement and the EOF.
Looking at the state debugfile, it looks like I need to get either a
COMMENT_OPEN or an $end to reduce what should be a complete expression on
the stack. However, I'm entering error() handling before hitting the end of the
file and I wonder if I need to be signaling this somehow?
I've got some simplified test code below.
Test Code
#! /usr/bin/env python3
from sly import Parser
from sly import Lexer
from pprint import pprint
class CommentLexer(Lexer):
    tokens = {COMMENT_OPEN, COMMENT_CLOSE, WORD, SEMI}

    COMMENT_OPEN = r"/\* COMMENT:"
    COMMENT_CLOSE = r"\*/"
    WORD = r"[^; \*\t\n\r\f\v]+"
    SEMI = r";"

    ignore_astrix = r"\*"
    ignore_newline = r"\n"
    ignore_space = r" "

    def ignore_newline(self, t):
        self.lineno += t.value.count("\n")

    def error(self, t):
        print("Line %d: Bad character %r" % (self.lineno, t.value[0]))
        self.index += 1
class CommentParser(Parser):
    tokens = CommentLexer.tokens
    debugfile = "comment_parser.out"

    def __init__(self):
        # Collected text of every COMMENT block that parsed successfully.
        self.comments = []

    @_("comment_doc comment_doc")
    def comment_doc(self, p):
        pass

    @_("COMMENT_OPEN string COMMENT_CLOSE")
    def comment_doc(self, p):
        print("#########")
        print(f"Got: {p.string}")
        print("#########")
        self.comments.append(p.string)
        return p.string

    @_("string string")
    def string(self, p):
        return p[0] + " " + p[1]

    @_("WORD")
    def string(self, p):
        return p.WORD

    def error(self, p):
        pprint(p)
        if not p:
            print("Hit the end of the file!")
            return
        print(f"Syntax error at type: {p.type} value: {p.value} line: {p.lineno}")
        # Discard tokens until the next COMMENT_OPEN or the end of the input.
        while True:
            tok = next(self.tokens, None)
            if tok is None:
                print("Error Tok: Hit None")
                return tok
            if tok.type == "COMMENT_OPEN":
                print("Error Tok: Found new comment")
                return tok
            print(f"Ignoring: {tok.type}")
def test_one_comment_recovery_after():
    lexer = CommentLexer()
    test_data = """
/* COMMENT: This is the
only comment string I'd
like to parse out
*/
/* I don't care about this one. */
"""
    parser = CommentParser()
    parser.parse(lexer.tokenize(test_data))
    assert len(parser.comments) == 1


def test_one_comment_recovery_before():
    lexer = CommentLexer()
    test_data = """
/* I don't care about this one. */
/* COMMENT: This is the
only comment string I'd
like to parse out
*/
"""
    parser = CommentParser()
    parser.parse(lexer.tokenize(test_data))
    assert len(parser.comments) == 1
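
To make the "between a valid statement and the EOF" case more concrete, here is a rough, standard-library-only sketch that mimics the lexer rules above with `re` and lists the token types for a shortened version of the failing input. It is not Sly itself: `TOKEN_SPEC`, `token_types` and the sample string are names made up for this illustration, and the assumption is that Sly tries the patterns in the order they are defined.

```python
import re

# Same patterns as CommentLexer, tried in the same order (assumed to mirror
# how Sly builds its master regex). IGNORE stands in for the ignore_* rules.
TOKEN_SPEC = [
    ("COMMENT_OPEN", r"/\* COMMENT:"),
    ("COMMENT_CLOSE", r"\*/"),
    ("WORD", r"[^; \*\t\n\r\f\v]+"),
    ("SEMI", r";"),
    ("IGNORE", r"[\* \n]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def token_types(text):
    """Return the token types the rules above would produce for text."""
    types = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            pos += 1  # skip one bad character, like CommentLexer.error()
            continue
        if match.lastgroup != "IGNORE":
            types.append(match.lastgroup)
        pos = match.end()
    return types


# Hypothetical, shortened version of the "recovery after" test data.
sample = "/* COMMENT: hello */\n/* skip me */\n"
print(token_types(sample))
```

With these rules the `/` that opens the uninteresting comment comes out as a WORD, so the token right after the first COMMENT_CLOSE is a WORD rather than the COMMENT_OPEN or $end that the debugfile says is needed for the reduction.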