Description
I've been analyzing antlr/grammars-v4#4465. Apparently, the Antlr tool is inserting an epsilon transition with non-existent action arbitrarily for the first alt of a rule that has direct left recursion. This action causes AdaptivePredict()
to make an incorrect decision when parsing certain input because of "actions shadowing predicates". The workaround swaps out rules that have derivations containing predicates out of the arbitrary first alt. However, this seems quite wrong. It feels like a bug in the tool.
Here is an example: the classic "expression" grammar.
// Template generated code from trgen 0.23.18
// $antlr-format alignTrailingComments true, columnLimit 150, minEmptyLines 1, maxEmptyLinesToKeep 1, reflowComments false, useTab false
// $antlr-format allowShortRulesOnASingleLine false, allowShortBlocksOnASingleLine true, alignSemicolons hanging, alignColons hanging
grammar Arithmetic;
file_
: expression (SEMI expression)* EOF
;
expression
: expression POW expression
| expression (TIMES | DIV) expression
| expression (PLUS | MINUS) expression
| LPAREN expression RPAREN
| (PLUS | MINUS)* atom
;
atom
: scientific
| variable
;
scientific
: SCIENTIFIC_NUMBER
;
variable
: VARIABLE
;
VARIABLE
: VALID_ID_START VALID_ID_CHAR*
;
SCIENTIFIC_NUMBER
: NUMBER (E SIGN? UNSIGNED_INTEGER)?
;
LPAREN
: '('
;
RPAREN
: ')'
;
PLUS
: '+'
;
MINUS
: '-'
;
TIMES
: '*'
;
DIV
: '/'
;
GT
: '>'
;
LT
: '<'
;
EQ
: '='
;
POINT
: '.'
;
POW
: '^'
;
SEMI
: ';'
;
WS
: [ \r\n\t]+ -> channel(HIDDEN)
;
fragment VALID_ID_START
: ('a' .. 'z')
| ('A' .. 'Z')
| '_'
;
fragment VALID_ID_CHAR
: VALID_ID_START
| ('0' .. '9')
;
fragment NUMBER
: ('0' .. '9')+ ('.' ('0' .. '9')+)?
;
fragment UNSIGNED_INTEGER
: ('0' .. '9')+
;
fragment E
: 'E'
| 'e'
;
fragment SIGN
: ('+' | '-')
;
This is what the tool produces for rule expression
:
Note the empty transition s20 -> s21
. There are no explicit actions in the grammar, and it always adds the fake action epsilon transition to alt one of a decision.
I think the idea of "actions shadow predicates" is hard to understand. According to the doc: More importantly, the parser can't execute actions until it has decided which alternative to match. That's because actions have side effects and we can't undo things like print statements. For example, in the following rule, the parser can't execute the action in front of the {java5}? predicate before committing to that alternative.
I guess the reason for it is because AdaptivePredict()
is simultaneously testing out various derivations. One cannot rely on a state variable.