
Parser fails on every 2nd rule - understanding problem #113

@DannyBoyKN

Description

This repo is quite old, and I'm not sure whether it is still actively maintained and supported.
I hope to get some feedback on my simple test setup, which tries to parse basic C-like string data.

This is my test setup; I called it myparser.py:

from sly import Parser
from sly import Lexer

class myLexer(Lexer):

    tokens = { FILE, ID,
                INCLUDE,
                TYPE, IS, INTEGER }

    # String containing ignored characters
    ignore = ' \t'
    literals = ['<','>','[',']',';','.','/',',']

    FILE= r'[\w./]+[\w/]+[.]txt'

    ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
    # Special cases
    ID['include'] = INCLUDE
    ID['type'] = TYPE
    ID['is'] = IS

    @_(r'0[xX][0-9a-fA-F]+',    # LiteralIntHex
       r'0[0-7]+',              # LiteralIntOct
       r'[1-9][0-9]*')          # LiteralIntDec (note: '+' would reject single-digit numbers)
    def INTEGER(self, t):
        if t.value.startswith('0x'):
            t.value = int(t.value[2:], 16)
        elif t.value.startswith('0'):
            t.value = int(t.value[1:], 8)
        else:
            t.value = int(t.value)
        return t
    
    # Ignored pattern
    @_(r'/\*(.|\n)*?\*/',   # Comment (C-Style)
       r'//.*\n')           # Comment (C++-Style)
    def ignore_comments(self, t):
        self.lineno += t.value.count('\n')

    # Line number tracking
    @_(r'\n+')
    def newline(self, t):
        self.lineno = t.lineno + t.value.count('\n')
    
    def error(self, t):
        self.index += 1
        print(f"{self.lineno}: Illegal character at index {self.index} '{t.value[0]}'! STOP")
    
class myParser(Parser):
    tokens = myLexer.tokens

    @_("includedef")
    def entry(self, p):
        return p.includedef
    @_("typedef")
    def entry(self, p):
        return p.typedef


    @_("INCLUDE '<' FILE '>'")
    def includedef(self, p):
        print(p.INCLUDE, p.FILE)

    @_('TYPE ID IS ID "[" INTEGER "]" ";"')
    def typedef(self, p):
        print(p.TYPE, p.ID0, p.ID1, f"[{p.INTEGER}]")

if __name__ == '__main__':
    text = \
'''
/*============================================================================*/
include <../myfile.txt>

type mytype_t is uint32[40];
'''
    lexer = myLexer()
    parser = myParser()

    tok = lexer.tokenize(text)
    parser.parse(tok)

When I run it on the input text above, I get a failure for the type entry:

$ python myparser.py 
include ../myfile.txt
sly: Syntax error at line 5, token=TYPE

If I swap the order of the include and type lines:

type mytype_t is uint32[40];
include <../myfile.txt>

I again get an error on the second line:

$ python myparser.py 
type mytype_t uint32 [40]
sly: Syntax error at line 4, token=INCLUDE

I couldn't figure out where my mistake is 😞

The debug output looks OK to me:

Grammar:

Rule 0     S' -> entry
Rule 1     entry -> typedef
Rule 2     entry -> includedef
Rule 3     includedef -> INCLUDE < FILE >
Rule 4     typedef -> TYPE ID IS ID [ INTEGER ] ;

Terminals, with rules where they appear:

;                    : 4
<                    : 3
>                    : 3
FILE                 : 3
ID                   : 4 4
INCLUDE              : 3
INTEGER              : 4
IS                   : 4
TYPE                 : 4
[                    : 4
]                    : 4
error                : 

Nonterminals, with rules where they appear:

entry                : 0
includedef           : 2
typedef              : 1


state 0

    (0) S' -> . entry
    (1) entry -> . typedef
    (2) entry -> . includedef
    (4) typedef -> . TYPE ID IS ID [ INTEGER ] ;
    (3) includedef -> . INCLUDE < FILE >
    TYPE            shift and go to state 4
    INCLUDE         shift and go to state 5

    entry                          shift and go to state 1
    typedef                        shift and go to state 2
    includedef                     shift and go to state 3

state 1

    (0) S' -> entry .


state 2

    (1) entry -> typedef .
    $end            reduce using rule 1 (entry -> typedef .)


state 3

    (2) entry -> includedef .
    $end            reduce using rule 2 (entry -> includedef .)


state 4

    (4) typedef -> TYPE . ID IS ID [ INTEGER ] ;
    ID              shift and go to state 6


state 5

    (3) includedef -> INCLUDE . < FILE >
    <               shift and go to state 7


state 6

    (4) typedef -> TYPE ID . IS ID [ INTEGER ] ;
    IS              shift and go to state 8


state 7

    (3) includedef -> INCLUDE < . FILE >
    FILE            shift and go to state 9


state 8

    (4) typedef -> TYPE ID IS . ID [ INTEGER ] ;
    ID              shift and go to state 10


state 9

    (3) includedef -> INCLUDE < FILE . >
    >               shift and go to state 11


state 10

    (4) typedef -> TYPE ID IS ID . [ INTEGER ] ;
    [               shift and go to state 12


state 11

    (3) includedef -> INCLUDE < FILE > .
    $end            reduce using rule 3 (includedef -> INCLUDE < FILE > .)


state 12

    (4) typedef -> TYPE ID IS ID [ . INTEGER ] ;
    INTEGER         shift and go to state 13


state 13

    (4) typedef -> TYPE ID IS ID [ INTEGER . ] ;
    ]               shift and go to state 14


state 14

    (4) typedef -> TYPE ID IS ID [ INTEGER ] . ;
    ;               shift and go to state 15


state 15

    (4) typedef -> TYPE ID IS ID [ INTEGER ] ; .
    $end            reduce using rule 4 (typedef -> TYPE ID IS ID [ INTEGER ] ; .)

I would appreciate it if anyone could tell me where my error is.
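
Update: looking at the state table once more, states 2 and 3 only ever reduce on $end, which suggests the grammar as written derives exactly one entry and then expects end of input. That would be consistent with the second statement failing regardless of order. If that is the cause, a repeating rule might look like this (just a sketch, untested; `statement` is a new nonterminal I'm introducing here, and the includedef/typedef rules stay unchanged):

```python
class myParser(Parser):
    tokens = myLexer.tokens

    # Sketch: a left-recursive list rule so the start symbol
    # accepts any number of statements instead of exactly one.
    @_("entry statement")
    def entry(self, p):
        pass

    @_("statement")
    def entry(self, p):
        pass

    @_("includedef")
    def statement(self, p):
        return p.includedef

    @_("typedef")
    def statement(self, p):
        return p.typedef

    # ... includedef and typedef rules as in the original script ...
```

With a list rule like this, both orderings of the include and type lines should parse.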
