Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IGNORECASE ability to Token Remapping. #67

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

manux81
Copy link

@manux81 manux81 commented Apr 10, 2021

With reference to:
#62
I've found a problem with tokens remapping in the eventuality of:
reflags = re.IGNORECASE
This change allows creating special cases ignoring capitalization in the eventuality of
reflags = re.IGNORECASE is set in Lexer class.

Example:
class MyLexer(sly.Lexer):
reflags = re.IGNORECASE
# Base ID rule
ID = r'[a-zA-Z_][a-zA-Z0-9_]*'

# Special cases
ID['if'] = IF
ID['else'] = ELSE
ID['while'] = WHILE
# Now if,If,IF,iF are handle as the same special case

With reference to:
dabeaz#62
I've found a problem with tokens remapping in the eventuality of:
reflags = re.IGNORECASE
This change allows creating special cases ignoring capitalization in the eventuality of
reflags = re.IGNORECASE assignment in Lexer class.

Example:
class MyLexer(sly.Lexer):
    reflags = re.IGNORECASE
    # Base ID rule
    ID = r'[a-zA-Z_][a-zA-Z0-9_]*'

    # Special cases
    ID['if'] = IF
    ID['else'] = ELSE
    ID['while'] = WHILE
    # Now if,If,IF,iF are handle as the same special cases
@jpsnyder
Copy link

I'm running into this issue too. Any update on getting this merged in?

@manux81
Copy link
Author

manux81 commented Sep 26, 2021

Hi @jpsnyder, no @dabeaz still has not been reply to me.

@dabeaz
Copy link
Owner

dabeaz commented Sep 26, 2021

Apologies on the delay. I do NOT see this change making it into SLY because of its special purpose nature and potential impact on performance. However, it is possible to achieve the desired effect using a function:

keywords = { 'if', 'else', 'while', }

@(_r'[a-zA-Z_][a-zA-Z0-9_]*')
def ID(self, t):
    if t.value.lower() in keywords:
        t.type = t.value.upper()
    return t

@dabeaz
Copy link
Owner

dabeaz commented Sep 26, 2021

I would just note, that I'm going to keep this open for now. Even though I'm leaning towards not doing this, I might reconsider. I just don't have any timeline for it. In the meantime, use the function as a workaround.

@jpsnyder
Copy link

jpsnyder commented Sep 27, 2021

Fair enough. That workaround works for me. (And actually is better because I can customize the generated token to my liking.)

My only argument for accepting the MR, would be that it would seem to be expected behavior for token remapping to be case-insensitive if the user set that re flag to IGNORECASE. Principle of least surprises.

@manux81
Copy link
Author

manux81 commented Sep 29, 2021

I agree with @jpsnyder and also using reflags way is sound like flex approch.

@jpsnyder
Copy link

jpsnyder commented Sep 30, 2021

While I'm not sure on the performance hit for something like this, but would it make sense for the string value within the brackets for token remapping actually be a regex pattern like everything else? In which case, the regex pattern would be applied to the matches of the main token being remapped. (Make use of the .fullmatch() function)

This would solve the case sensitivity issue and provide more flexibility while still technically supporting the old way.
It could also allow us to apply different re flags to the token mapping than what is set globally.

ID["if"] = IF
ID["else"] = ELSE
ID["(?i)while"] = WHILE   # case-insenstive mapping
ID["def[0-9]"] = DEF   # def followed by a number

On the other hand, this is probably feature creep...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants