Bug: Parsing extract operator with newlines #166

@felix-andreas

Hi, I'm not an expert with tree-sitter, but I found a behavior that doesn't seem correct to me:

The extract operators $ and @ don't include the RHS when there is more than one newline between the operator and the RHS.

This case works as expected:

foo$
bar
program [0, 0] - [2, 0]
  extract_operator [0, 0] - [1, 3]
    lhs: identifier [0, 0] - [0, 3]
    rhs: identifier [1, 0] - [1, 3]

Once you add an additional newline, the RHS is parsed as a separate identifier:

foo$

bar

Not only does the parser fail to recognise bar as the RHS, it also reports the wrong end_position for the extract_operator: [1, 0] instead of [0, 4], because the node includes a single newline:

program [0, 0] - [3, 0]
  extract_operator [0, 0] - [1, 0]
    lhs: identifier [0, 0] - [0, 3]
  identifier [2, 0] - [2, 3]
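
To make the positions above easier to check by hand, here is a minimal sketch that maps a byte offset in the source text to a zero-based (row, column) pair, the convention the tree output uses. The helper name point_at is hypothetical and not part of tree-sitter; it assumes "\n" line endings and an offset that lies within the string.

```rust
/// Map a byte offset to a zero-based (row, column) pair.
/// Hypothetical helper for illustration; columns are counted in bytes
/// from the start of the line. Panics if `byte_offset` exceeds the input length.
fn point_at(source: &str, byte_offset: usize) -> (usize, usize) {
    let prefix = &source.as_bytes()[..byte_offset];
    // Row = number of newlines before the offset.
    let row = prefix.iter().filter(|&&b| b == b'\n').count();
    // Column = distance from the byte after the last newline (or from 0).
    let line_start = prefix
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    (row, byte_offset - line_start)
}
```

For the input "foo$\n\nbar\n", offset 4 (just after the $) maps to [0, 4], while offset 5 (after the first newline) maps to [1, 0], which is the end position the parser currently reports.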

Attempted fix

I tried to adjust the grammar by moving repeat($._newline) inside the optional RHS:

// NOTE: Expression on LHS, string/identifier/dots/dot_dot_i on RHS
extract_operator: $ => {
  const table = [
    ["$", PREC.EXTRACT],
    ["@", PREC.EXTRACT]
  ];

  return choice(...table.map(([operator, prec]) => prec.ASSOC(prec.RANK, seq(
    field("lhs", $._expression),
    field("operator", operator),
    // repeat($._newline), // <- move from here
    optional(seq(
      repeat($._newline), // <- to here
      field("rhs", $._string_or_identifier_or_dots_or_dot_dot_i)
    ))
  ))))
}

This fixes the empty line case:

foo$

bar
program [0, 0] - [3, 0]
  extract_operator [0, 0] - [2, 3]
    lhs: identifier [0, 0] - [0, 3]
    rhs: identifier [2, 0] - [2, 3]

But breaks the case without an RHS:

foo$
program [0, 0] - [1, 0]
  extract_operator [0, 0] - [0, 4]
    lhs: identifier [0, 0] - [0, 3]
  ERROR [0, 4] - [1, 0]

Workaround

I currently use this workaround:

// HACK: tree-sitter-r reports the wrong end position for extract_operator when
// newlines precede the rhs: the node includes only a newline, not the rhs.
// This hack returns the end of the rhs if present, otherwise the end of the operator.
let end_position = |node: Node| {
    if node.kind() != "extract_operator" {
        return node.end_position();
    }

    node.child_by_field_name("rhs")
        .map(|rhs| rhs.end_position())
        .or_else(|| {
            node.child_by_field_name("operator")
                .map(|operator| operator.end_position())
        })
        // note: this case is unexpected
        .unwrap_or_else(|| node.end_position())
};
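
The fallback order of the workaround can also be checked in isolation, without a tree-sitter dependency. The following is only a sketch: ExtractNode and corrected_end are hypothetical stand-ins that hold just the end positions the closure above reads, not types from the tree-sitter crate.

```rust
/// Zero-based (row, column), standing in for a tree-sitter point.
type Point = (usize, usize);

/// Hypothetical stand-in for an `extract_operator` node: only the end
/// positions that the workaround inspects.
struct ExtractNode {
    node_end: Point,
    operator_end: Option<Point>,
    rhs_end: Option<Point>,
}

/// Same fallback order as the workaround: end of the rhs if present,
/// otherwise end of the operator, otherwise the node's own end.
fn corrected_end(node: &ExtractNode) -> Point {
    node.rhs_end.or(node.operator_end).unwrap_or(node.node_end)
}
```

With the values from the blank-line case (node end [1, 0], operator end [0, 4], no rhs), corrected_end yields [0, 4]; with the single-newline case (rhs end [1, 3]) it yields [1, 3].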
