Skip to content

RecursionError in generated Python parser #245

@gvanrossum

Description

@gvanrossum

The test corpus contained a case where function is called with many (over 255) arguments, each just numbers. This used to be a syntax error in 3.6 and before -- but with more recent versions of the bytecode compiler it works for 10s of 1000s of arguments.

However, the generated parser in Python cannot handle it! It raises a RecursionError for A(0, 1, ..., k) if k is 221 or higher. The traceback seems to be an endless repetition of this fragment:

  File "/Users/guido/pegen/data/python_parser.py", line 5512, in _tmp_108
    if (literal := self.expect(",")) and (c := self.args()):
  File "/Users/guido/pegen/pegen/parser.py", line 65, in memoize_wrapper
    tree = method(self, *args)
  File "/Users/guido/pegen/data/python_parser.py", line 3041, in args
    if (a := self.named_expression()) and (b := self._tmp_108(),):
  File "/Users/guido/pegen/pegen/parser.py", line 65, in memoize_wrapper
    tree = method(self, *args)

I think this is actually about this rule:

args[expr_ty]:
    [...]
    | a=named_expression b=[',' c=args { c }] {
        [...] }

The part [',' c=args {c}] corresponds to _tmp_108. This is a straightforward recursion (not left-).

I can't actually fathom that the C parser doesn't segfault on this for me; it just slows down. (In the C code, it's _tmp_109, but the structure is the same.) The stack must automatically grow (on Mac, at least). But even there it would be nice if we could replace this by a simple loop.

E.g. maybe we could make this work?

args[expr_ty]:
    | a=starred_expression b=[',' c=args { c }] {
        _Py_Call(_PyPegen_dummy_name(p),
                 (b) ? CHECK(_PyPegen_seq_insert_in_front(p, a, ((expr_ty) b)->v.Call.args))
                     : CHECK(_PyPegen_singleton_seq(p, a)),
                 (b) ? ((expr_ty) b)->v.Call.keywords : NULL,
                 EXTRA) }
    | a=kwargs { _Py_Call(_PyPegen_dummy_name(p),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_extract_starred_exprs(p, a)),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_delete_starred_exprs(p, a)),
                          EXTRA) }
    | ','.(named_expression !'=')+ { XXX }

I haven't figured out what should go into XXX yet, but it's probably going to be more efficient, since we don't build up the array of arguments by prepending one at a time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions