-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IGNORECASE ability to Token Remapping. #67
base: master
Are you sure you want to change the base?
Conversation
With reference to: dabeaz#62 I've found a problem with tokens remapping in the eventuality of: reflags = re.IGNORECASE This change allows creating special cases ignoring capitalization in the eventuality of reflags = re.IGNORECASE assignment in Lexer class. Example: class MyLexer(sly.Lexer): reflags = re.IGNORECASE # Base ID rule ID = r'[a-zA-Z_][a-zA-Z0-9_]*' # Special cases ID['if'] = IF ID['else'] = ELSE ID['while'] = WHILE # Now if,If,IF,iF are handle as the same special cases
I'm running into this issue too. Any update on getting this merged in? |
Apologies on the delay. I do NOT see this change making it into SLY because of its special purpose nature and potential impact on performance. However, it is possible to achieve the desired effect using a function:
|
I would just note, that I'm going to keep this open for now. Even though I'm leaning towards not doing this, I might reconsider. I just don't have any timeline for it. In the meantime, use the function as a workaround. |
Fair enough. That workaround works for me. (And actually is better because I can customize the generated token to my liking.) My only argument for accepting the MR, would be that it would seem to be expected behavior for token remapping to be case-insensitive if the user set that re flag to IGNORECASE. Principle of least surprises. |
I agree with @jpsnyder and also using reflags way is sound like flex approch. |
While I'm not sure on the performance hit for something like this, but would it make sense for the string value within the brackets for token remapping actually be a regex pattern like everything else? In which case, the regex pattern would be applied to the matches of the main token being remapped. (Make use of the .fullmatch() function) This would solve the case sensitivity issue and provide more flexibility while still technically supporting the old way. ID["if"] = IF
ID["else"] = ELSE
ID["(?i)while"] = WHILE # case-insenstive mapping
ID["def[0-9]"] = DEF # def followed by a number On the other hand, this is probably feature creep... |
With reference to:
#62
I've found a problem with tokens remapping in the eventuality of:
reflags = re.IGNORECASE
This change allows creating special cases ignoring capitalization in the eventuality of
reflags = re.IGNORECASE is set in Lexer class.
Example:
class MyLexer(sly.Lexer):
reflags = re.IGNORECASE
# Base ID rule
ID = r'[a-zA-Z_][a-zA-Z0-9_]*'