Semicolon handling at top level is not quite right #155

@DavisVaughan

I think that at top level, a semicolon has to be preceded by an expression?

Interestingly, inside a `{`, any number of semicolons are blindly consumed.

It's possible that semicolons are only valid:

  • At top level, when preceded by an expression
  • Directly inside a `{` scope (but not nested deeper, e.g. inside a `[` within that scope)
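
These two rules can be checked against base R's own parser. Below is a minimal sketch that only uses `parse()`; the helper name `parses_ok()` is made up for illustration and is not part of any package. The `";;;"` and `"1; 2"` cases are not in the reprex below, so treat them as extra probes rather than reported results.

```r
# Hypothetical helper: TRUE if base R's parser accepts `text`,
# FALSE if it raises a parse error.
parses_ok <- function(text) {
  tryCatch(
    {
      parse(text = text, keep.source = FALSE)
      TRUE
    },
    error = function(e) FALSE
  )
}

cases <- c(
  ";",        # lone semicolon at top level
  ";;;",      # several lone semicolons at top level
  "1;",       # semicolon preceded by an expression
  "1; 2",     # semicolon separating two expressions
  "{ ; }",    # lone semicolon directly inside `{`
  "{ ; ; }",  # several semicolons directly inside `{`
  "x[;]",     # semicolon inside `[`
  "{ x[;] }"  # semicolon inside `[` nested in `{`
)

results <- vapply(cases, parses_ok, logical(1))
print(results)
```

If the rules hold, only the `"1;"`, `"1; 2"`, `"{ ; }"` and `"{ ; ; }"` cases should come back `TRUE`.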

I also wonder if we should emit semicolons in the parse tree as real tokens. Otherwise it just looks like whitespace to downstream consumers.
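
For comparison, base R's own parse data appears to keep semicolons visible as terminal tokens. A minimal sketch, assuming `utils::getParseData()` on an expression parsed with `keep.source = TRUE`; it says nothing about how the tree-sitter grammar should expose them, only that R's own tooling does not treat them as whitespace.

```r
# Parse with source references so that parse data is retained
exprs <- parse(text = "1; 2", keep.source = TRUE)
parse_data <- utils::getParseData(exprs)

# Keep only terminal tokens (drops the wrapping `expr` nodes)
terminals <- parse_data[parse_data$terminal, c("token", "text")]
print(terminals)

# The semicolon shows up as its own terminal row
semicolons <- terminals[terminals$text == ";", ]
```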

# Top level `;` is a parse error when it's all alone, but we happily consume it
parse(text = ";")
#> Error in parse(text = ";"): <text>:1:1: unexpected ';'
#> 1: ;
#>     ^
treesitter::text_parse(";", treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> ;
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 1)])

# We eat them all, when we should error
treesitter::text_parse(";;;", treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> ;;;
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 3)])

# This is fine at top level
parse(text = "1;")
#> expression(1)
treesitter::text_parse("1;", treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> 1;
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 2)]
#>   (float [(0, 0), (0, 1)])
#> )

# Interestingly this works (and we parse that fine)
parse(text = "{ ; }")
#> expression({ ; })
parse(text = "{ ; ; }")
#> expression({ ; ; })
treesitter::text_parse("{ ; }", treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> { ; }
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 5)]
#>   (braced_expression [(0, 0), (0, 5)]
#>     open: "{" [(0, 0), (0, 1)]
#>     close: "}" [(0, 4), (0, 5)]
#>   )
#> )
treesitter::text_parse("{ ; ; }", treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> { ; ; }
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 7)]
#>   (braced_expression [(0, 0), (0, 7)]
#>     open: "{" [(0, 0), (0, 1)]
#>     close: "}" [(0, 6), (0, 7)]
#>   )
#> )

# Note that this doesn't work
parse(text = "x[;]")
#> Error in parse(text = "x[;]"): <text>:1:3: unexpected ';'
#> 1: x[;
#>       ^

# Nor does this, so it's not like newlines, which are consumed
# recursively within a `(` / `[` / `[[` scope
parse(text = "{ x[;] }")
#> Error in parse(text = "{ x[;] }"): <text>:1:5: unexpected ';'
#> 1: { x[;
#>         ^
