Description
The test corpus contained a case where a function is called with many (over 255) arguments, each just a number. This used to be a syntax error in 3.6 and before, but with more recent versions of the bytecode compiler it works for tens of thousands of arguments.
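A quick way to check the compiler side of this claim is sketched below. It assumes CPython 3.7 or later; the helper name `call_with_many_args` and the stand-in function `f` are made up for illustration and are not part of the test corpus.

```python
def call_with_many_args(n):
    """Compile and evaluate f(0, 1, ..., n-1); return how many args f received."""
    source = "f(" + ", ".join(str(i) for i in range(n)) + ")"
    code = compile(source, "<many-args>", "eval")
    # f is a hypothetical stand-in that just counts its positional arguments.
    return eval(code, {"f": lambda *args: len(args)})


if __name__ == "__main__":
    # On 3.6 and earlier, compiling a call with more than 255 arguments
    # was rejected with a SyntaxError.
    print(call_with_many_args(1000))
```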
However, the generated parser in Python cannot handle it! It raises a RecursionError for A(0, 1, ..., k) if k is 221 or higher. The traceback seems to be an endless repetition of this fragment:
  File "/Users/guido/pegen/data/python_parser.py", line 5512, in _tmp_108
    if (literal := self.expect(",")) and (c := self.args()):
  File "/Users/guido/pegen/pegen/parser.py", line 65, in memoize_wrapper
    tree = method(self, *args)
  File "/Users/guido/pegen/data/python_parser.py", line 3041, in args
    if (a := self.named_expression()) and (b := self._tmp_108(),):
  File "/Users/guido/pegen/pegen/parser.py", line 65, in memoize_wrapper
    tree = method(self, *args)
I think this is actually about this rule:
args[expr_ty]:
    [...]
    | a=named_expression b=[',' c=args { c }] {
        [...] }
The part [',' c=args { c }] corresponds to _tmp_108. This is straightforward recursion (not left-recursion).
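The shape of the problem can be reproduced with a toy recursive-descent parser. This is only a sketch: `ToyParser`, `make_tokens`, and `overflows` are hypothetical names, the tokens are plain strings, and nothing here is pegen's actual API. The toy uses two Python frames per argument (no memoize wrapper), so its failure threshold differs from the 221 seen above, and it assumes the default recursion limit of 1000.

```python
class ToyParser:
    """Hypothetical stand-in mirroring: args: a=number b=[',' c=args { c }]"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def expect(self, tok):
        # Consume tok if it is the next token; otherwise return None.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == tok:
            self.pos += 1
            return tok
        return None

    def number(self):
        # Consume a numeric token and return it as an int, else None.
        if self.pos < len(self.tokens) and self.tokens[self.pos].isdigit():
            value = int(self.tokens[self.pos])
            self.pos += 1
            return value
        return None

    def args(self):
        a = self.number()
        if a is None:  # 0 is a valid argument, so compare against None
            return None
        b = self._tmp()
        # Prepending one element per level, as in the recursive rule.
        return [a] + (b or [])

    def _tmp(self):
        # The optional part [',' c=args { c }]: recurses back into args().
        mark = self.pos
        if self.expect(",") and (c := self.args()) is not None:
            return c
        self.pos = mark  # backtrack on failure
        return None


def make_tokens(n):
    """Tokens for the argument list 0, 1, ..., n-1."""
    tokens = []
    for i in range(n):
        if i:
            tokens.append(",")
        tokens.append(str(i))
    return tokens


def overflows(n):
    """Return True if parsing n arguments raises RecursionError."""
    try:
        ToyParser(make_tokens(n)).args()
    except RecursionError:
        return True
    return False
```

Each additional argument adds one `args` frame and one `_tmp` frame, so the stack depth grows linearly with the number of arguments.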
I can't actually fathom that the C parser doesn't segfault on this for me; it just slows down. (In the C code, it's _tmp_109, but the structure is the same.) The stack must automatically grow (on Mac, at least). But even there it would be nice if we could replace this by a simple loop.
E.g. maybe we could make this work?
args[expr_ty]:
    | a=starred_expression b=[',' c=args { c }] {
        _Py_Call(_PyPegen_dummy_name(p),
                 (b) ? CHECK(_PyPegen_seq_insert_in_front(p, a, ((expr_ty) b)->v.Call.args))
                     : CHECK(_PyPegen_singleton_seq(p, a)),
                 (b) ? ((expr_ty) b)->v.Call.keywords : NULL,
                 EXTRA) }
    | a=kwargs { _Py_Call(_PyPegen_dummy_name(p),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_extract_starred_exprs(p, a)),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_delete_starred_exprs(p, a)),
                          EXTRA) }
    | ','.(named_expression !'=')+ { XXX }
I haven't figured out what should go into XXX yet, but it's probably going to be more efficient, since we don't build up the array of arguments by prepending one at a time.
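For comparison, here is a rough sketch of what a loop-based version of a comma-separated gather does, again on plain string tokens. The names `parse_args_loop` and `make_tokens` are hypothetical, and this says nothing about what the real action in XXX should be; it only illustrates appending in a loop instead of recursing and prepending.

```python
def make_tokens(n):
    """Tokens for the argument list 0, 1, ..., n-1."""
    tokens = []
    for i in range(n):
        if i:
            tokens.append(",")
        tokens.append(str(i))
    return tokens


def parse_args_loop(tokens):
    """Parse number (',' number)* iteratively.

    Returns (list_of_ints, next_position), or (None, 0) if no argument matches.
    """

    def is_number(i):
        return i < len(tokens) and tokens[i].isdigit()

    if not is_number(0):
        return None, 0
    result = [int(tokens[0])]
    pos = 1
    # Only consume a comma when another element follows it, so a trailing
    # comma is left for the caller, as a gather would leave it.
    while pos < len(tokens) and tokens[pos] == "," and is_number(pos + 1):
        result.append(int(tokens[pos + 1]))  # append, no prepending
        pos += 2
    return result, pos


if __name__ == "__main__":
    args, end = parse_args_loop(make_tokens(50000))
    print(len(args), end)
```

The stack depth stays constant regardless of the number of arguments, and the list is built with one append per element.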