Rule idea: warn about unsafe usage of `$` in regular expressions #15017

bustbr · 2024-12-16T10:09:09Z

In Python's implementation of re the $ does not only match the end of the line, but also before any line break, even without multiline mode.
There's an article by the OpenSSF about this issue that suggests using \Z instead, or preferring the fullmatch function.

An example of the issue:

>>> import re
>>> re.search(r'cat$', 'cat\n')
<re.Match object; span=(0, 3), match='cat'>

Using \Z fixes this:

>>> re.search(r'cat\Z', 'cat\n')  # no match
>>> re.search(r'cat\Z', 'cat')
<re.Match object; span=(0, 3), match='cat'>

Or using fullmatch:

>>> re.fullmatch('cat', 'cat\n')  # no match
>>> re.fullmatch('cat', 'cat')
<re.Match object; span=(0, 3), match='cat'>
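
To illustrate why this matters for input validation, here is a minimal sketch. The two validator functions are hypothetical names made up for this example and do not come from any library.

```python
import re


# Hypothetical validators for illustration only.
def is_valid_name_dollar(value: str) -> bool:
    """Anchored with `$`: also accepts a single trailing newline."""
    return re.search(r"^[a-z]+$", value) is not None


def is_valid_name_fullmatch(value: str) -> bool:
    """Uses fullmatch: the entire string, newline included, must match."""
    return re.fullmatch(r"[a-z]+", value) is not None


print(is_valid_name_dollar("cat\n"))     # True: the trailing newline slips through
print(is_valid_name_fullmatch("cat\n"))  # False
print(is_valid_name_fullmatch("cat"))    # True
```

The `$` version lets one trailing `\n` pass validation, which is the case a warning would be meant to catch.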

Should Ruff warn about using $ with search?


InSyncWithFoo · 2024-12-17T09:35:16Z

> In Python's implementation of re the $ does not only match the end of the line, but also before any line break, even without multiline mode.

This is incorrect. In non-multiline mode, $ matches either the end of the entire input or the position right before the trailing newline (only \n, not \r\n). It is similar to PCRE's \Z, whereas Python's \Z resembles PCRE's \z:

ANCHORS AND SIMPLE ASSERTIONS

  \b          word boundary
  \B          not a word boundary
  ^           start of subject
                also after an internal newline in multiline mode
                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
  \A          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \Z          end of subject
                also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject

>>> import re
>>> [*re.finditer(r'$', 'foo\nbar\r\n')]
[<re.Match object; span=(8, 8), match=''>, <re.Match object; span=(9, 9), match=''>]
>>> [*re.finditer(r'\Z', 'foo\nbar\r\n')]
[<re.Match object; span=(9, 9), match=''>]
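
Put differently, in default mode `$` should behave like the lookahead `(?=\n?\Z)`. The sketch below compares the two and adds multiline mode for contrast; `positions` is a helper name made up for this example.

```python
import re


def positions(pattern: str, text: str, flags: int = 0) -> list[int]:
    """Return the start offset of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text, flags)]


text = "foo\nbar\r\n"

default_dollar = positions(r"$", text)
lookahead = positions(r"(?=\n?\Z)", text)
multiline_dollar = positions(r"$", text, re.MULTILINE)

print(default_dollar)    # [8, 9]
print(lookahead)         # [8, 9]
print(multiline_dollar)  # [3, 8, 9]
```

Only multiline mode adds the match before the internal newline at offset 3.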

bustbr · 2024-12-17T10:13:08Z

> > In Python's implementation of re the $ does not only match the end of the line, but also before any line break, even without multiline mode.

> This is incorrect. In non-multiline mode, $ matches either the end of the entire input or the position right before the trailing newline (only \n, not \r\n). It is similar to PCRE's \Z, whereas Python's \Z resembles PCRE's \z:

Thank you for the clarification.
Based on your example, here's a series of calls that highlight the specific behavior of $:

>>> re.search(r'$', 'foo\nbar')
<re.Match object; span=(7, 7), match=''>
>>> re.search(r'$', 'foo\nbar\n')
<re.Match object; span=(7, 7), match=''>    # still matches at pos 7, before the final newline
>>> re.search(r'$', 'foo\nbar\n\n')
<re.Match object; span=(8, 8), match=''>    # matches at pos 8, before the final newline

>>> re.search(r'\Z', 'foo\nbar')
<re.Match object; span=(7, 7), match=''>
>>> re.search(r'\Z', 'foo\nbar\n')
<re.Match object; span=(8, 8), match=''>
>>> re.search(r'\Z', 'foo\nbar\n\n')
<re.Match object; span=(9, 9), match=''>
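
As a rough idea of what such a check could look for, here is a sketch using Python's `ast` module. It is not how Ruff implements any rule, and the function names are hypothetical. It assumes the module is imported as `re`, only inspects string-literal patterns passed to `re.search` or `re.match`, and ignores flags such as `re.MULTILINE`.

```python
import ast


def ends_with_unescaped_dollar(pattern: str) -> bool:
    """True if the pattern ends in a `$` that is not escaped by a backslash."""
    if not pattern.endswith("$"):
        return False
    body = pattern[:-1]
    trailing_backslashes = len(body) - len(body.rstrip("\\"))
    # An even number of backslashes means the `$` itself is not escaped.
    return trailing_backslashes % 2 == 0


def find_dollar_anchors(source: str) -> list[int]:
    """Line numbers of re.search/re.match calls whose literal pattern ends in `$`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and node.func.attr in {"search", "match"}
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
            and ends_with_unescaped_dollar(node.args[0].value)
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = r'''
import re
re.search(r"cat$", s)
re.search(r"cat\Z", s)
re.match(r"price\$", s)
'''
print(find_dollar_anchors(sample))  # [3]
```

A real rule would also need to decide how to treat multiline mode, compiled patterns, and patterns built at runtime.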

AlexWaygood added the `rule` (Implementing or modifying a lint rule) and `needs-decision` (Awaiting a decision from a maintainer) labels on Dec 16, 2024

tdulcet mentioned this issue Dec 16, 2024

Extend RUF055 with more patterns #14738

Open
