[tool] Antlr tool enters an epsilon action transition with direct left recursion removal.

I've been analyzing https://github.com/antlr/grammars-v4/issues/4465. Apparently, the Antlr tool is inserting an epsilon transition with non-existent action *arbitrarily for the first alt* of a rule that has direct left recursion. This action causes `AdaptivePredict()` to make an incorrect decision when parsing certain input because of ["actions shadowing predicates"](https://github.com/antlr/antlr4/blob/master/doc/predicates.md#finding-visible-predicates). The workaround swaps out rules that have derivations containing predicates out of the arbitrary first alt. However, this seems quite wrong. It *feels* like a bug in the tool.

Here is an example: the classic "expression" grammar.
```
// Template generated code from trgen 0.23.18

// $antlr-format alignTrailingComments true, columnLimit 150, minEmptyLines 1, maxEmptyLinesToKeep 1, reflowComments false, useTab false
// $antlr-format allowShortRulesOnASingleLine false, allowShortBlocksOnASingleLine true, alignSemicolons hanging, alignColons hanging

grammar Arithmetic;

file_
    : expression (SEMI expression)* EOF
    ;

expression
    : expression POW expression
    | expression (TIMES | DIV) expression
    | expression (PLUS | MINUS) expression
    | LPAREN expression RPAREN
    | (PLUS | MINUS)* atom
    ;

atom
    : scientific
    | variable
    ;

scientific
    : SCIENTIFIC_NUMBER
    ;

variable
    : VARIABLE
    ;

VARIABLE
    : VALID_ID_START VALID_ID_CHAR*
    ;

SCIENTIFIC_NUMBER
    : NUMBER (E SIGN? UNSIGNED_INTEGER)?
    ;

LPAREN
    : '('
    ;

RPAREN
    : ')'
    ;

PLUS
    : '+'
    ;

MINUS
    : '-'
    ;

TIMES
    : '*'
    ;

DIV
    : '/'
    ;

GT
    : '>'
    ;

LT
    : '<'
    ;

EQ
    : '='
    ;

POINT
    : '.'
    ;

POW
    : '^'
    ;

SEMI
    : ';'
    ;

WS
    : [ \r\n\t]+ -> channel(HIDDEN)
    ;

fragment VALID_ID_START
    : ('a' .. 'z')
    | ('A' .. 'Z')
    | '_'
    ;

fragment VALID_ID_CHAR
    : VALID_ID_START
    | ('0' .. '9')
    ;

fragment NUMBER
    : ('0' .. '9')+ ('.' ('0' .. '9')+)?
    ;

fragment UNSIGNED_INTEGER
    : ('0' .. '9')+
    ;

fragment E
    : 'E'
    | 'e'
    ;

fragment SIGN
    : ('+' | '-')
    ;
```

This is what the tool produces for rule `expression`:

![Image](https://github.com/user-attachments/assets/7cf11059-e2bb-492f-b6c0-3bbba910a89c)

Note the empty transition `s20 -> s21`. There are no explicit actions in the grammar, and it always adds the fake action epsilon transition to alt one of a decision.

I think the idea of "actions shadow predicates" is hard to understand. According to the doc: `More importantly, the parser can't execute actions until it has decided which alternative to match. That's because actions have side effects and we can't undo things like print statements. For example, in the following rule, the parser can't execute the action in front of the {java5}? predicate before committing to that alternative.` I guess the reason for it is because `AdaptivePredict()` is simultaneously testing out various derivations. One cannot rely on a state variable.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[tool] Antlr tool enters an epsilon action transition with direct left recursion removal. #4819

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[tool] Antlr tool enters an epsilon action transition with direct left recursion removal. #4819

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions