Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimized syntax '+' cause 'random_recursive_mutation' error #42

Open
0x7Fancy opened this issue Jan 18, 2024 · 5 comments
Open

optimized syntax '+' cause 'random_recursive_mutation' error #42

0x7Fancy opened this issue Jan 18, 2024 · 5 comments

Comments

@0x7Fancy
Copy link
Contributor

          going further, I found a way to mitigate;

based on the above issues, we create simpler test cases, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

tanslate to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and input 40960_very.txt:

I very very ...(*40956)... very very like C++

running with antlr4-parse:
Screen Shot 2024-01-08 at 17 56 39

from the perspective of antlr4, we can use the + syntax to describe test.g4, and ignore this prefix matching, as follows test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

running again with antlr4-parse:
Screen Shot 2024-01-08 at 17 59 40

so I made a patch to implement the above ideas, please refer to 0x7Fancy@6eae7d1;

I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. for intermediate recursion, I think it can be rewritten as head/tail recursion in json

of course, this is just a mitigation measure. When the mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in syntax parsing.

Originally posted by @0x7Fancy in #17 (comment)

@0x7Fancy
Copy link
Contributor Author

sorry, the commit 6eae7d1, this is a wrong patch, '+' syntax will cause 'random_recursive_mutation' to be invalid, please roll back to ff4e5a2;

for antlr4, the '+' syntax uses breadth expansion, but 'random_recursive_mutation' relies on depth expansion (tree structure) operation; in actual operation, 'random_recursive_mutation' always returns that the recursive node cannot be found, so each round random_recursive_mutation will produce the same content as the original data.

in #17 issue, I encountered a problem. An error occurred when minimizing the poc, which caused the problem. finally, I submitted the patch by mistake.

Back to the original question #17, I gave the reason in this reply #17 (comment), there are several places in the grammar that break the rules of LL(1)/LL(*), (non-terminal symbols can be deduced from ε), which makes antlr4 very easy to enter a backtracking loop.

for my original question, my test case is:

grammar test_multi;
entry: stmt1 '\n' EOF
     ;
stmt1: 'I ' stmt2 'like C++'
     ;
stmt2: NODE
     | NODE NODE
     | NODE NODE NODE
     | NODE NODE NODE NODE
     | NODE NODE NODE NODE NODE
     | NODE stmt2
     ;
NODE : 'very '
     ;

similarly, I broke the LL(1)/LL(*) rules (the candidate values of non-terminal symbols (first(α)) intersect), which also caused antlr4 to fall into grammar parsing backtracking.

I fixed the syntax file as follows:

grammar test_multi_patch;
entry: stmt1 '\n' EOF
     ;
stmt1: 'I ' stmt2 'like C++'
     ;
stmt2: 
     | NODE stmt2
     ;
NODE : 'very '
     ;

testing it:
Screen Shot 2024-01-18 at 12 38 02

@vanhauser-thc
Copy link
Member

so is a fix to the grammar mutator needed?

@0x7Fancy
Copy link
Contributor Author

nope, the commit ff4e5a2 is perfect, Grammar-Mutator will works fine.

as users, what we need to do is to write the correct json syntax as correctly as possible (perhaps according to LL(1) rules), checking and testing more, and then fuzzing in the production environment

@h1994st
Copy link
Collaborator

h1994st commented Jan 18, 2024

nope, the commit ff4e5a2 is perfect, Grammar-Mutator will works fine.

as users, what we need to do is to write the correct json syntax as correctly as possible (perhaps according to LL(1) rules), checking and testing more, and then fuzzing in the production environment

@0x7Fancy do you mean we need to revert you PR that was merged recently?

@0x7Fancy
Copy link
Contributor Author

yes, revert that PR. (or let me overwrite it later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants