Multiple objects in one text file #72

Open
AngledLuffa opened this issue Jun 20, 2021 · 1 comment

Comments

@AngledLuffa

I've been building a parser that needs to read multiple objects from the same file (trees from a Penn Treebank style parser dataset, in case that helps). Currently my top-level rule handles one tree, but each file is expected to contain multiple trees. Is there a simple solution for this? I've tried treating the parser as a generator, which doesn't seem to work. If I just pass in a file with multiple trees, the parser reports a syntax error when it starts reading the second tree.

So far the best idea I have is to make a list of objects, but it seems like there must be a better way.

So, instead of this:

class TreeParser(Parser):
    tokens = TreeLexer.tokens

    # the extra layer of productions at the top is so that we can handle trees such as
    #  ((tree stuff))
    # by adding a ROOT node at the very top
    @_('LPAREN factor RPAREN')
    def root(self, p):
        return Tree(label="ROOT", children=[p.factor])

    @_('factor')
    def root(self, p):
        return p.factor

I now have the following productions at the top instead (the resulting list comes out in reverse order, but I can work around that):

    # TODO: hopefully there's some other way to parse multiple trees from one file
    @_('root treelist')
    def treelist(self, p):
        trees = p.treelist
        trees.append(p.root)
        return trees

    @_('root')
    def treelist(self, p):
        return [p.root]
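
One grammar-free alternative, sketched here under stated assumptions: split the file text into top-level parenthesized chunks and hand each chunk to the single-tree parser. The helper name iter_top_level_chunks is hypothetical and not part of SLY, and the sketch assumes parentheses never appear inside a token (Penn Treebank files normally escape them as -LRB- and -RRB-).

```python
def iter_top_level_chunks(text):
    """Yield each balanced top-level '( ... )' span found in text.

    Hypothetical helper, not part of SLY. Assumes '(' and ')' only ever
    act as tree brackets, never as literal token characters.
    """
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # a new top-level tree begins here
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at offset {i}")
            depth -= 1
            if depth == 0:
                # the bracket that opened at `start` has just closed
                yield text[start:i + 1]
    if depth != 0:
        raise ValueError("unbalanced parentheses at end of input")
```

Each chunk could then go through the existing single-tree grammar, for example parser.parse(lexer.tokenize(chunk)), so the start symbol stays a single tree and trees are produced lazily rather than collected into one list.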

I can actually link the full parser if that helps. It's pretty simple.

Thanks!

@AngledLuffa
Author

Addendum: without having looked at the source, is there an efficiency difference between the productions 'root treelist' and 'treelist root'? Hopefully both run in linear time?
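
For what it's worth, in a generic LR (shift-reduce) parser both list shapes perform a number of reductions linear in the number of trees; what differs is how deep the parse stack grows. The toy simulation below is not SLY code. It tracks the stack by hand for the two shapes, treating each completed root as a single symbol, and the function names are hypothetical.

```python
def max_stack_left_recursive(n_roots):
    """Simulate  treelist : treelist root | root  (reduce after every root)."""
    stack, deepest = [], 0
    for _ in range(n_roots):
        stack.append("root")
        deepest = max(deepest, len(stack))
        if stack == ["root"]:
            stack = ["treelist"]        # treelist : root
        else:
            stack[-2:] = ["treelist"]   # treelist : treelist root
    return deepest, stack


def max_stack_right_recursive(n_roots):
    """Simulate  treelist : root treelist | root  (reduce only at end of input)."""
    stack, deepest = [], 0
    for _ in range(n_roots):
        stack.append("root")            # keep shifting: the list tail is not known yet
        deepest = max(deepest, len(stack))
    if stack:
        stack[-1:] = ["treelist"]       # treelist : root, possible only at end of input
    while len(stack) > 1:
        stack[-2:] = ["treelist"]       # treelist : root treelist
    return deepest, stack
```

In this model the left-recursive shape never holds more than two list symbols on the stack, while the right-recursive shape holds one per tree until the input ends. With the left-recursive shape, appending p.root to p.treelist would also leave the list in file order.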
