Description
Going further, I found a way to mitigate this.
Based on the issues above, we create a simpler test case, test.json:
{
"<entry>": [["I ", "<stmt1>", "like C++\n"]],
"<stmt1>": [["<NODE>", "<stmt1>"], []],
"<NODE>": [["very "]]
}
Translated to test.g4:
grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
;
stmt1:
| NODE stmt1
;
NODE : 'very '
;
and the input 40960_very.txt:
I very very ...(*40956)... very very like C++
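For reproduction, an input of this shape can be generated with a short script. This is a minimal sketch, assuming the file is "I " followed by 40960 copies of "very " (inferred from the filename and the 4 visible plus 40956 elided repeats) and ends with the 'like C++\n' literal from the grammar. The names `make_input` and `write_input` are made up here for illustration.

```python
def make_input(repeats: int = 40960) -> str:
    """Build the stress input: 'I ' + repeats * 'very ' + 'like C++\\n'.

    The default of 40960 is an assumption taken from the filename.
    """
    return "I " + "very " * repeats + "like C++\n"


def write_input(path: str, repeats: int = 40960) -> None:
    """Write the generated input to `path` without newline translation."""
    with open(path, "w", newline="") as f:
        f.write(make_input(repeats))
```

Calling `write_input("40960_very.txt")` would then produce the file used above, under those assumptions.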
From the perspective of antlr4, we can use the + syntax to describe test.g4 and sidestep this prefix matching, as in the following test.g4:
:
grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
;
stmt1:
| (NODE)+
;
NODE : 'very '
;
Running again with antlr4-parse:
So I made a patch to implement the above ideas; please refer to 0x7Fancy@6eae7d1.
I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. For intermediate recursion, I think it can be rewritten as head/tail recursion in the JSON grammar.
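To make the head/tail folding concrete, here is a rough sketch of the idea. It is not the code from the patch. It assumes the JSON rule format shown in test.json, only folds a rule that has exactly one directly recursive alternative plus one empty alternative (the shape of `<stmt1>`), and uses hypothetical helper names `render` and `rule_to_g4`.

```python
def render(symbol: str) -> str:
    """Render one JSON grammar symbol as g4 text (illustrative mapping only)."""
    if symbol.startswith("<") and symbol.endswith(">"):
        return symbol[1:-1]  # "<NODE>" -> "NODE"
    # Terminal: emit an ANTLR string literal with basic escaping.
    escaped = (symbol.replace("\\", "\\\\")
                     .replace("'", "\\'")
                     .replace("\n", "\\n"))
    return "'" + escaped + "'"


def rule_to_g4(name: str, alternatives: list) -> str:
    """Emit a g4 rule, folding `A -> x A | <empty>` or `A -> A x | <empty>`
    into `A: | (x)+ ;`. Any other shape is emitted unchanged."""
    plain = name[1:-1]
    if len(alternatives) == 2 and [] in alternatives:
        rec = alternatives[0] if alternatives[1] == [] else alternatives[1]
        body = None
        if len(rec) >= 2 and rec[-1] == name and name not in rec[:-1]:
            body = rec[:-1]   # tail recursion: x A
        elif len(rec) >= 2 and rec[0] == name and name not in rec[1:]:
            body = rec[1:]    # head recursion: A x
        if body is not None:
            inner = " ".join(render(s) for s in body)
            return f"{plain}:\n | ({inner})+\n ;"
    # Fallback: keep the alternatives as written.
    rendered = [" ".join(render(s) for s in alt) for alt in alternatives]
    return f"{plain}:\n " + "\n | ".join(rendered) + "\n ;"


print(rule_to_g4("<stmt1>", [["<NODE>", "<stmt1>"], []]))
```

The fold is restricted to the two-alternative shape on purpose: with a non-empty base case such as `A -> x A | y`, the language is `x* y`, so replacing only the recursive alternative with `(x)+` would change what the rule accepts.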
Of course, this is just a mitigation measure. When the mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in syntax parsing.
Originally posted by @0x7Fancy in #17 (comment)