Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CQL2/Grammar: Quoted identifiers not defined as expected #916

Open
jerstlouis opened this issue Apr 1, 2024 · 6 comments
Open

CQL2/Grammar: Quoted identifiers not defined as expected #916

jerstlouis opened this issue Apr 1, 2024 · 6 comments
Labels

Comments

@jerstlouis
Copy link
Member

jerstlouis commented Apr 1, 2024

My understanding is we should be able to put just about anything in quoted identifiers (" ").

Currently, the propertyName rule which is the closest thing that corresponds to an identifier token uses the exact same identifier inside and outside the quotes:

propertyName = identifier | "\"" identifier "\"";

I seem to recall us discussing this before, but don't recall the decision.
In the currently state, this completely defeats the purpose of the quoted identifiers however.

A use case in our CartoSym-CSS styles is a selector on the "sentinel2-l2a" layer which contains a hyphen.

See how we defined this in the CartoSym-CSS BNF for ANTLR:

https://github.com/opengeospatial/styles-and-symbology/blob/main/1-core/schemas/CartoSym-CSS-Lexer.g4#L110

allowing inside the double quotes all the same characters as inside a string literal with the exception of the double quote.

The only thing that this might currently be missing the the capability to backslash escape the double quote \" or to otherwise allow to use single quotes directly).

@cportele cportele added the CQL2 label Apr 2, 2024
@cportele
Copy link
Member

cportele commented Apr 8, 2024

Meeting 2024-04-08: The original issue was #332 which included the following statement:

GeoServer CQL implementation supports double quoted identifiers for this very reason, avoid clashes and deal with special characters that might have been used a part of an identifier (relationa databases seems to allow pretty much anything in column names, as long as the name is double quoted).

The PR that closed this issue (in 2020) resolved the "avoid clashes" aspect, but not the "deal with special characters in an identifier" part. The text in #332 is not entirely clear if this was intended or not. Since this change is also in the BNF for years it is unclear, if we could change this now as the document is now under TC control.

@pvretano, any thoughts?

@jerstlouis
Copy link
Member Author

jerstlouis commented Apr 8, 2024

@cportele Thinking about this more, the current rule does not even serve the purpose of avoiding clashes with reserved CQL2 keywords. So there definitely is a problem to fix anyways.

Either:

  • the Lexer would have lexed them as identifier tokens, in which case the double-quotes are useless, or
  • the lexer has not lexed them as identifier tokens, in which case the double-quotes do not work as intended.

Although perhaps in ANTLR4 the ambiguities somehow get resolved with the order of the rules as you have them in:

https://github.com/interactive-instruments/xtraplatform-spatial/blob/master/xtraplatform-cql/src/main/antlr/de/ii/xtraplatform/cql/infra/CqlLexer.g4#L388

At minimum I think that IdentifierBare there would need to be qualified as a fragment (something that is not actually returned by the lexer as a token).

@cportele
Copy link
Member

cportele commented Apr 8, 2024

@jerstlouis - I don't think there is a problem in our grammar (or the BNF) wrt to avoiding clashes. We test all the different cases in unit tests and the results are as expected.

If the parser reads S_INTERSECTS(bbox,BBOX(0,0,180,90)) it fails since the parser expects a ( after bbox. If the parser reads S_INTERSECTS("bbox",BBOX(0,0,180,90)) it will parse "bbox" as a propertyName and the property identifier is bbox.

@jerstlouis
Copy link
Member Author

jerstlouis commented Apr 8, 2024

@cportele right, sorry, I just tried that with your lexer and parser on http://lab.antlr.org/ . Thanks for double-checking.

I'm still confused as to why there's no error or ambiguity given that IdentifierBare is not marked as a fragment (things still work the same marking it as such).

In the BNF grammar as a single piece though (not separated as lexer + parser), I think it might still be ambiguous.
I think it is important that the " Identifier " rule (propertyName) be on the lexer side, otherwise the issue I was mentioning above would apply, and that is not expressly communicated by the single-piece BNF grammar. Although I guess it depends on how the BNF is interpreted.

(I just tested that and indeed the double-quoting of bbox no longer works if I move the Identifier rule from the Lexer to an identifier rule in the parser).

@cportele
Copy link
Member

Meeting 2024-04-22: The current rule is what it is since December 2020 and it did not raise any issues in implementations. Given that the document is under TC control now (in the approval vote) and this is not a bug, we cannot make the change now. It would either require a TC vote with a comment or it is future work.

@cportele
Copy link
Member

Since no issue was raised in the approval process, I move this issue to "future work" (see the previous comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants