Macros juxtaposed with strings incorrectly(?) get raw string literal #509

Open
NHDaly opened this issue Sep 27, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@NHDaly
Member

NHDaly commented Sep 27, 2024

If you directly juxtapose a macro with a string, the macro is supplied a raw literal version of the string as if the macro were a string macro, even though it's not.

Here is an example:

julia> macro m(x)
           return esc(x)
       end
@m (macro with 1 method)

julia> @m "hey $(2+2)"
"hey 4"

julia> @m"hey $(2+2)"
"hey \$(2+2)"

julia> m"hey $(2+2)"
ERROR: LoadError: UndefVarError: `@m_str` not defined
in expression starting at REPL[4]:1

I think this is a bug. On Julia 1.6, which I happen to have installed and which doesn't yet have JuliaSyntax, we get this error instead:

julia> VERSION
v"1.6.7"

julia> @m"hey $(2+2)"
ERROR: syntax: invalid macro usage "@(@m_str # REPL[4], line 1 "hey $(2+2)")"
Stacktrace:
 [1] top-level scope
   @ none:1

I believe the mistake is that the macro is being parsed as if it's a string macro, since the input matches what a string macro would see:

julia> macro m_str(x)
           return esc(x)
       end
@m_str (macro with 1 method)

julia> m"hey $(2+2)"
"hey \$(2+2)"
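
To make the difference between the two call forms easier to see, here is a minimal sketch that reports what each kind of macro receives. The names `@show_arg` and `@show_arg_str` are hypothetical and exist only for this illustration; the sketch assumes nothing beyond standard macro and string-macro behavior.

```julia
# Hypothetical helper macros: each returns a string describing the
# argument it was handed, without evaluating that argument.
macro show_arg(x)
    return string(typeof(x), ": ", repr(x))
end

macro show_arg_str(x)
    return string(typeof(x), ": ", repr(x))
end

# Ordinary macro call with whitespace: the parser has already turned the
# interpolation into syntax, so the argument arrives as an Expr.
a = @show_arg "hey $(2+2)"

# String macro call: the argument arrives as a plain String whose
# contents still include the characters `$(2+2)` verbatim.
b = show_arg"hey $(2+2)"

println(a)
println(b)
```

The juxtaposed form `@m"..."` reported above behaves like the second case even though only the first macro is defined.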

I see the current behavior on both Julia 1.10 and 1.12:

julia> versioninfo()
Julia Version 1.12.0-DEV.1173
Commit 169e9e8de1* (2024-09-09 15:10 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_SSL_CA_ROOTS_PATH = 

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_SSL_CA_ROOTS_PATH = 
c42f added the bug label Oct 7, 2024
@c42f
Member

c42f commented Oct 7, 2024

This is due to a lexing difficulty: we need to know which strings are raw while lexing, and we use lexer state to guess (i.e., a heuristic based on the previous token). In this case the heuristic fails, so we need to fix that.

julia> collect(JuliaSyntax.Tokenize.tokenize("@m\"str\$x\""))
6-element Vector{JuliaSyntax.Tokenize.RawToken}:
 0-0        @              
 1-1        Identifier     
 2-2        "              
 3-7        String         
 8-8        "              
 9-8        EndMarker      

julia> collect(JuliaSyntax.Tokenize.tokenize("@m \"str\$x\""))
9-element Vector{JuliaSyntax.Tokenize.RawToken}:
 0-0        @              
 1-1        Identifier     
 2-2        Whitespace     
 3-3        "              
 4-6        String         
 7-7        $              
 8-8        Identifier     
 9-9        "              
 10-9       EndMarker

I think it'd be enough to track the two previous tokens and check whether one was an @; this is probably good enough in practice.

(Unfortunately, Julia also allows syntax like @A.B.C.x"str" to mean A.B.C.@x"str" and making that also work would need feedback from the parser state to the lexer (uuugh!) but I really feel the @ in front of the module name is horrible syntax and should be deprecated!)
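
As a rough illustration of the two-token idea, here is a toy model of the heuristic. It is not JuliaSyntax code: the token kinds are plain `Symbol`s (`:At`, `:Identifier`, `:Dot`, `:Whitespace`, `:Start`) invented for this sketch, and the function names are hypothetical.

```julia
# Toy model only; the Symbols below stand in for real lexer token kinds.

# One-token guess: a string directly after an identifier is assumed to
# belong to a string macro, so it is lexed as raw.
is_raw_one_back(prev::Symbol) = prev === :Identifier

# Two-token guess: same as above, unless the identifier was itself
# directly preceded by `@`, in which case the string is not raw.
is_raw_two_back(prev2::Symbol, prev::Symbol) =
    prev === :Identifier && prev2 !== :At

# m"str"     -> raw (string macro)
println(is_raw_two_back(:Start, :Identifier))
# @m"str"    -> not raw with the two-token check
println(is_raw_two_back(:At, :Identifier))
# @m "str"   -> not raw (previous token is whitespace)
println(is_raw_two_back(:Identifier, :Whitespace))
# A.B.x"str" and @A.B.x"str" both look like (:Dot, :Identifier) here,
# so two tokens of history cannot tell them apart.
println(is_raw_two_back(:Dot, :Identifier))
```

The last case is the qualified-name limitation mentioned above: telling the two forms apart would need more context than a fixed window of previous tokens.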

@KristofferC
Member

and making that also work would need feedback from the parser state to the lexer (uuugh!)

😢

@NHDaly
Member Author

NHDaly commented Oct 9, 2024

🤔 I actually think you could leave the lexer as-is. Given the above tokens, I think we could still raise an error later on in parsing/lowering for the juxtaposed macro call and the string?
In other words, it's okay that we "incorrectly" parsed a raw-string, since we're going to throw an error later for the lack of whitespace?

That would be consistent with the 1.6 behavior:

julia> @m"hey $(2+2)"
ERROR: syntax: invalid macro usage "@(@m_str # REPL[4], line 1 "hey $(2+2)")"

I think you could parse

 0-0        @              
 1-1        Identifier     
 2-2        "              
 3-7        String         
 8-8        "              
 9-8        EndMarker      

into either of these expressions, which could both error?:

@(Identifier"String")   # This seems to be what they parsed in 1.6, where Identifier"String" lowers into `@Identifier_str"String"`
(@Identifier"String")   # We could just disallow juxtaposing a macrocall with a string?

It seems like both of those approaches would be robust to qualified names?
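
A minimal sketch of what such a later check could look like, written over a toy token stream rather than the real parser. The kinds (`:At`, `:Identifier`, `:Dot`, `:Quote`, `:String`, `:Whitespace`) and the function name `juxtaposed_macro_string` are assumptions made for this illustration, not JuliaSyntax API.

```julia
# Toy check: does the stream contain `@`, then a possibly qualified name,
# then an opening quote with no whitespace in between?
function juxtaposed_macro_string(kinds::Vector{Symbol})
    i = findfirst(==(:At), kinds)
    i === nothing && return false
    j = i + 1
    # The macro name must start with an identifier.
    (j <= length(kinds) && kinds[j] === :Identifier) || return false
    j += 1
    # Consume any `.Identifier` segments of a qualified name.
    while j + 1 <= length(kinds) && kinds[j] === :Dot && kinds[j + 1] === :Identifier
        j += 2
    end
    # Flag the case where a string starts immediately after the name.
    return j <= length(kinds) && kinds[j] === :Quote
end

# @m"str"
println(juxtaposed_macro_string([:At, :Identifier, :Quote, :String, :Quote]))
# @m "str"
println(juxtaposed_macro_string([:At, :Identifier, :Whitespace, :Quote, :String, :Quote]))
# @A.x"str"
println(juxtaposed_macro_string([:At, :Identifier, :Dot, :Identifier, :Quote, :String, :Quote]))
```

Because the check walks the whole dotted name before looking for the quote, the qualified form is flagged the same way as the simple one in this toy model.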

@NHDaly
Member Author

NHDaly commented Oct 9, 2024

But that said:

but I really feel the @ in front of the module name is horrible syntax and should be deprecated!)

+1
