Skip to content

optimized syntax '+' cause 'random_recursive_mutation' error #42

Open
@0x7Fancy

Description

@0x7Fancy
          going further, I found a way to mitigate;

based on the above issues, we create simpler test cases, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

tanslate to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and input 40960_very.txt:

I very very ...(*40956)... very very like C++

running with antlr4-parse:
Screen Shot 2024-01-08 at 17 56 39

from the perspective of antlr4, we can use the + syntax to describe test.g4, and ignore this prefix matching, as follows test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

running again with antlr4-parse:
Screen Shot 2024-01-08 at 17 59 40

so I made a patch to implement the above ideas, please refer to 0x7Fancy@6eae7d1;

I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. for intermediate recursion, I think it can be rewritten as head/tail recursion in json

of course, this is just a mitigation measure. When the mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in syntax parsing.

Originally posted by @0x7Fancy in #17 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions