[feature] Support multiline regex in text filtering #2857

MoshiMoshi0 · 2024-12-23T19:02:49Z

It seems that the Trigger/wait for text, Ignore lines containing, and Block change-detection while text matches options in the Text filtering section do not support multiline regex.

Narrowed it down to changedetection.io/changedetectionio/html_tools.py, lines 365 to 403 in 5dea5e1:

    
    def strip_ignore_text(content, wordlist, mode="content"):
        i = 0
        output = []
        ignore_text = []
        ignore_regex = []
        ignored_line_numbers = []

        for k in wordlist:
            # Is it a regex?
            res = re.search(PERL_STYLE_REGEX, k, re.IGNORECASE)
            if res:
                ignore_regex.append(re.compile(perl_style_slash_enclosed_regex_to_options(k)))
            else:
                ignore_text.append(k.strip())

        for line in content.splitlines(keepends=True):
            i += 1
            # Always ignore blank lines in this mode. (when this function gets called)
            got_match = False
            for l in ignore_text:
                if l.lower() in line.lower():
                    got_match = True

            if not got_match:
                for r in ignore_regex:
                    if r.search(line):
                        got_match = True

            if not got_match:
                # Not ignored, and should preserve "keepends"
                output.append(line)
            else:
                ignored_line_numbers.append(i)

        # Used for finding out what to highlight
        if mode == "line numbers":
            return ignored_line_numbers

        return ''.join(output)

The function iterates over the content line by line and matches each regex against each line separately:

    for line in content.splitlines(keepends=True):
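To make the limitation concrete, here is a small stand-alone illustration of that loop (the sample text and pattern are made up, not taken from a real watch): a pattern with a \n in the middle never matches any single line, but does match the full text.

```python
import re

content = "Price: 10\nIn stock\nShipping: free\n"
pattern = re.compile(r"Price: \d+\nIn stock")

# Line-by-line, as in strip_ignore_text(): each line holds at most one
# trailing "\n", so a "\n" in the middle of the pattern has nothing to match.
per_line_hits = [line for line in content.splitlines(keepends=True) if pattern.search(line)]

# Against the whole content, the same pattern matches across the line break.
whole_hit = pattern.search(content)

print(per_line_hits)               # []
print(repr(whole_hit.group(0)))    # 'Price: 10\nIn stock'
```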

The function could be reworked to use re.finditer/re.findall on the whole content instead.
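A minimal sketch of that rework, assuming the regex entries are already compiled re.Pattern objects (as ignore_regex holds) and leaving the plain-text, case-insensitive entries on the existing per-line path. It runs re.finditer over the whole content and maps each match back to the same 1-based line numbers the current code collects in ignored_line_numbers. Both function names are hypothetical, not part of changedetection.io.

```python
import bisect
import re


def ignored_line_numbers_whole_content(content, patterns):
    """Hypothetical helper: 1-based numbers of every line any match touches."""
    lines = content.splitlines(keepends=True)
    if not lines:
        return []

    # Character offset at which each line starts, so a match position can be
    # mapped back to a line number with a binary search.
    starts = []
    offset = 0
    for line in lines:
        starts.append(offset)
        offset += len(line)

    ignored = set()
    for pattern in patterns:
        for m in pattern.finditer(content):
            first = bisect.bisect_right(starts, m.start())
            # m.end() is exclusive, so look at the last matched character;
            # for a zero-width match fall back to m.start().
            last = bisect.bisect_right(starts, max(m.end() - 1, m.start()))
            ignored.update(range(first, last + 1))
    return sorted(ignored)


def strip_ignored_whole_content(content, patterns):
    """Hypothetical helper: drop touched lines, keeping line endings."""
    ignored = set(ignored_line_numbers_whole_content(content, patterns))
    return "".join(
        line
        for number, line in enumerate(content.splitlines(keepends=True), start=1)
        if number not in ignored
    )


sample = "header\nPrice: 10\nIn stock\nfooter\n"
spanning = re.compile(r"Price: \d+\nIn stock")
print(ignored_line_numbers_whole_content(sample, [spanning]))  # [2, 3]
print(repr(strip_ignored_whole_content(sample, [spanning])))   # 'header\nfooter\n'
```

In this sketch a line that a match only partly covers is still dropped whole, so the "line numbers" mode used for highlighting keeps working on whole lines.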


dgtlmoon · 2024-12-23T21:01:31Z

It COULD be reworked, but then it might break all existing filters. What are your thoughts on how to handle that?

MoshiMoshi0 · 2024-12-23T21:24:54Z

Unless I'm missing something, it would only break regex filters that have the s or m flag set (currently those flags have no effect) or regexes that match \n in the middle of the pattern (currently such a regex matches nothing). Everything else should behave the same.
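A small illustration of the current behaviour described here, using made-up sample text: each check runs on one line at a time, so neither the s flag nor a literal \n lets a pattern reach the next line, and the m flag gives the same result as no flag.

```python
import re

content = "alpha\nbeta\ngamma\n"
lines = content.splitlines(keepends=True)


def matches_any_line(pattern):
    """Mirror the current behaviour: search each line on its own."""
    return any(pattern.search(line) for line in lines)


# s flag: "." may now match "\n", but a single line ends in at most one "\n",
# so the flag cannot help a pattern reach the next line.
dot_all = re.compile(r"alpha.beta", re.S)
# An explicit "\n" in the middle of the pattern has the same problem.
newline_mid = re.compile(r"alpha\nbeta")
# m flag: per line, "^" already matches at the line start and "$" just before
# its trailing "\n", so adding the flag changes nothing.
with_m = re.compile(r"^beta$", re.M)
without_m = re.compile(r"^beta$")

print(matches_any_line(dot_all), bool(dot_all.search(content)))          # False True
print(matches_any_line(newline_mid), bool(newline_mid.search(content)))  # False True
print(matches_any_line(with_m), matches_any_line(without_m))             # True True
```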

Another option is to match on the whole content only when the s or m flag is set, and otherwise use the current implementation.
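One way that could look, as a hypothetical sketch rather than a proposed patch: patterns with re.DOTALL or re.MULTILINE set (inline, e.g. (?s), or via flags=) are searched against the whole content, and everything else keeps the per-line loop. The helper names are made up for illustration.

```python
import bisect
import re


def uses_multiline_flags(pattern):
    # Flags given inline, e.g. "(?s)", also show up in pattern.flags.
    return bool(pattern.flags & (re.DOTALL | re.MULTILINE))


def ignored_line_numbers(content, patterns):
    """Hypothetical hybrid: whole-content search only for s/m patterns."""
    lines = content.splitlines(keepends=True)
    if not lines:
        return []
    starts, offset = [], 0
    for line in lines:
        starts.append(offset)
        offset += len(line)

    ignored = set()
    for pattern in patterns:
        if uses_multiline_flags(pattern):
            # Opt-in path: matches may span several lines; map spans to lines.
            for m in pattern.finditer(content):
                first = bisect.bisect_right(starts, m.start())
                last = bisect.bisect_right(starts, max(m.end() - 1, m.start()))
                ignored.update(range(first, last + 1))
        else:
            # Existing path: per-line search, so current filters behave as before.
            for number, line in enumerate(lines, start=1):
                if pattern.search(line):
                    ignored.add(number)
    return sorted(ignored)


content = "Price: 10\nIn stock\nfooter\n"
spanning = re.compile(r"(?s)Price: \d+.In stock")
anchored = re.compile(r"^footer$")
print(ignored_line_numbers(content, [spanning, anchored]))  # [1, 2, 3]
# Without re.M, "^" only matches at the very start of the whole text:
print(bool(anchored.search(content)))  # False
```

A possible reason to prefer this over switching every filter to whole-content matching: on the whole text without re.M, ^ and $ only anchor to the start and end of the entire content, so an existing filter like ^footer$ would silently stop matching.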

dgtlmoon · 2024-12-24T10:39:25Z

> Other option is to match on the whole content only when the s or m flag is set, otherwise use the current implementation.

Yes! That's what I'm thinking. Any downsides?

MoshiMoshi0 added the enhancement (New feature or request) label on Dec 23, 2024.

