Rule idea: warn about unsafe usage of $ in regular expressions #15017

Open
bustbr opened this issue Dec 16, 2024 · 2 comments
Labels
needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule

Comments

bustbr commented Dec 16, 2024

In Python's implementation of re the $ does not only match the end of the line, but also before any line break, even without multiline mode.
There's an article by the OpenSSF about this issue that suggests using \Z instead of $, or preferring the fullmatch function.

An example of the issue:

>>> import re
>>> re.search(r'cat$', 'cat\n')
<re.Match object; span=(0, 3), match='cat'>

Using \Z fixes this:

>>> re.search(r'cat\Z', 'cat\n')  # no match
>>> re.search(r'cat\Z', 'cat')
<re.Match object; span=(0, 3), match='cat'>

Or using fullmatch:

>>> re.fullmatch('cat', 'cat\n')  # no match
>>> re.fullmatch('cat', 'cat')
<re.Match object; span=(0, 3), match='cat'>
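
To make the practical risk concrete, here is a minimal sketch of an input validator written three ways. The function names and the username pattern are hypothetical and chosen only for illustration; they do not come from the OpenSSF article or from Ruff.

```python
import re

# Hypothetical validators, for illustration only.

def is_valid_dollar(name: str) -> bool:
    # `$` also matches just before a single trailing "\n".
    return re.search(r'^[a-z]+$', name) is not None

def is_valid_z(name: str) -> bool:
    # `\Z` matches only at the very end of the string.
    return re.search(r'^[a-z]+\Z', name) is not None

def is_valid_fullmatch(name: str) -> bool:
    # fullmatch requires the pattern to consume the whole string.
    return re.fullmatch(r'[a-z]+', name) is not None

for candidate in ('admin', 'admin\n'):
    print(
        repr(candidate),
        is_valid_dollar(candidate),
        is_valid_z(candidate),
        is_valid_fullmatch(candidate),
    )
```

All three accept 'admin', but only the `$` variant accepts 'admin\n', so a value with a trailing newline would pass that check unnoticed.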

Should Ruff warn about using $ with search?

@AlexWaygood AlexWaygood added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Dec 16, 2024
Contributor

InSyncWithFoo commented Dec 17, 2024

> In Python's implementation of re the $ does not only match the end of the line, but also before any line break, even without multiline mode.

This is incorrect. In non-multiline mode, $ matches either the end of the entire input or the position right before the trailing newline (only \n, not \r\n). It is similar to PCRE's \Z, whereas Python's \Z resembles PCRE's \z:

ANCHORS AND SIMPLE ASSERTIONS

  \b          word boundary
  \B          not a word boundary
  ^           start of subject
                also after an internal newline in multiline mode
                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
  \A          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \Z          end of subject
                also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject

>>> import re
>>> [*re.finditer(r'$', 'foo\nbar\r\n')]
[<re.Match object; span=(8, 8), match=''>, <re.Match object; span=(9, 9), match=''>]
>>> [*re.finditer(r'\Z', 'foo\nbar\r\n')]
[<re.Match object; span=(9, 9), match=''>]
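
One way to restate the rule above is that non-multiline `$` behaves like the lookahead `(?=\n?\Z)`: the end of the string, optionally preceded by exactly one "\n". The sketch below checks that reading on a few sample strings; the helper name `positions` and the constant `DOLLAR_EQUIVALENT` are made up for this illustration, and the equivalence is only verified for the samples listed.

```python
import re

def positions(pattern: str, text: str) -> list[int]:
    """Return the start offset of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]

# Assumed equivalent of non-multiline `$`: end of string, optionally
# preceded by exactly one "\n".
DOLLAR_EQUIVALENT = r'(?=\n?\Z)'

samples = ['foo', 'foo\n', 'foo\n\n', 'foo\nbar\r\n', 'foo\r']
for text in samples:
    # Fails loudly if the two patterns ever disagree on a sample.
    assert positions(r'$', text) == positions(DOLLAR_EQUIVALENT, text)
    print(repr(text), positions(r'$', text), positions(r'\Z', text))
```

For every sample, `\Z` yields a single position (the end of the string), while `$` yields an extra position whenever the string ends in "\n".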

Author

bustbr commented Dec 17, 2024

> > In Python's implementation of re the $ does not only match the end of the line, but also before any line break, even without multiline mode.
>
> This is incorrect. In non-multiline mode, $ matches either the end of the entire input or the position right before the trailing newline (only \n, not \r\n). It is similar to PCRE's \Z, whereas Python's \Z resembles PCRE's \z:

Thank you for the clarification.
Based on your example, here's a series of calls that highlights the specific behavior of $:

>>> re.search(r'$', 'foo\nbar')
<re.Match object; span=(7, 7), match=''>
>>> re.search(r'$', 'foo\nbar\n')
<re.Match object; span=(7, 7), match=''>    # still matches at pos 7, before the final newline
>>> re.search(r'$', 'foo\nbar\n\n')
<re.Match object; span=(8, 8), match=''>    # matches at pos 8, before the last of the two newlines

>>> re.search(r'\Z', 'foo\nbar')
<re.Match object; span=(7, 7), match=''>
>>> re.search(r'\Z', 'foo\nbar\n')
<re.Match object; span=(8, 8), match=''>
>>> re.search(r'\Z', 'foo\nbar\n\n')
<re.Match object; span=(9, 9), match=''>
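
For discussion purposes, the check proposed in this issue could be prototyped roughly as follows. This is a hypothetical Python sketch built on the standard `ast` module, not Ruff's implementation (Ruff is written in Rust) and not a complete rule: it only looks at `re.search` and `re.match` calls with a literal pattern that ends in an unescaped `$`, and it skips any call that passes flags, since `re.MULTILINE` may be intended there. The names `ends_with_unescaped_dollar`, `find_trailing_dollar`, and `SAMPLE` are invented for this example.

```python
import ast

# Hypothetical prototype of the proposed check; NOT Ruff's implementation.
CHECKED_FUNCS = {'search', 'match'}

def ends_with_unescaped_dollar(pattern: str) -> bool:
    """True if `pattern` ends in `$` preceded by an even number of backslashes."""
    if not pattern.endswith('$'):
        return False
    body = pattern[:-1]
    backslashes = len(body) - len(body.rstrip('\\'))
    return backslashes % 2 == 0

def find_trailing_dollar(source: str) -> list[int]:
    """Return sorted line numbers of flagged `re.search` / `re.match` calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == 're'
                and func.attr in CHECKED_FUNCS):
            continue
        # Skip calls that pass flags (third positional or `flags=` keyword).
        if len(node.args) > 2 or any(kw.arg == 'flags' for kw in node.keywords):
            continue
        if not node.args:
            continue
        pattern = node.args[0]
        if (isinstance(pattern, ast.Constant)
                and isinstance(pattern.value, str)
                and ends_with_unescaped_dollar(pattern.value)):
            hits.append(node.lineno)
    return sorted(hits)

# Sample source to scan; it is parsed, never executed.
SAMPLE = r"""
import re
re.search(r'cat$', s)
re.search(r'cat\Z', s)
re.match(r'price\$', s)
re.search(r'^line$', s, re.MULTILINE)
"""

print(find_trailing_dollar(SAMPLE))
```

On this sample only the `re.search(r'cat$', s)` call on line 3 is reported. A real rule would presumably also need to handle `$` elsewhere in the pattern, inline flags such as `(?m)`, compiled patterns, and aliased imports, none of which this sketch attempts.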


3 participants