Fix error handling for first empty filter response #3516

ReggX · 2025-10-17T21:44:26Z

Recent refactor (Commit 3b9d19d, issue #3475) introduced a regression in helper function apply_include_filters() due to indentation level being higher than previous code. This regression led to filters throwing an exception if first filter in list has no match even if later filters would have matched.

This PR aims to fix this regression by bringing the indentation level back in line with the for loop instead of inside the loop.

Original code with lower indentation level:

changedetection.io/changedetectionio/processors/text_json_diff/processor.py

Lines 171 to 190 in f7dfc9b

    
           for filter_rule in include_filters_rule: 
        
               # For HTML/XML we offer xpath as an option, just start a regular xPath "/.." 
        
               if filter_rule[0] == '/' or filter_rule.startswith('xpath:'): 
        
                   html_content += html_tools.xpath_filter(xpath_filter=filter_rule.replace('xpath:', ''), 
        
                                                           html_content=self.fetcher.content, 
        
                                                           append_pretty_line_formatting=not watch.is_source_type_url, 
        
                                                           is_rss=stream_content_type.is_rss) 
        
               elif filter_rule.startswith('xpath1:'): 
        
                   html_content += html_tools.xpath1_filter(xpath_filter=filter_rule.replace('xpath1:', ''), 
        
                                                            html_content=self.fetcher.content, 
        
                                                            append_pretty_line_formatting=not watch.is_source_type_url, 
        
                                                            is_rss=stream_content_type.is_rss) 
        
               else: 
        
                   html_content += html_tools.include_filters(include_filters=filter_rule, 
        
                                                              html_content=self.fetcher.content, 
        
                                                              append_pretty_line_formatting=not watch.is_source_type_url) 
        
           if not html_content.strip(): 
        
               raise FilterNotFoundInResponse(msg=include_filters_rule, screenshot=self.fetcher.screenshot, xpath_data=self.fetcher.xpath_data)

Refactored code with higher indentation level:

changedetection.io/changedetectionio/processors/text_json_diff/processor.py

Lines 277 to 317 in 3b9d19d

    
           for filter_rule in self.filter_config.include_filters: 
        
               # XPath filters 
        
               if filter_rule[0] == '/' or filter_rule.startswith('xpath:'): 
        
                   filtered_content += html_tools.xpath_filter( 
        
                       xpath_filter=filter_rule.replace('xpath:', ''), 
        
                       html_content=content, 
        
                       append_pretty_line_formatting=not self.watch.is_source_type_url, 
        
                       is_rss=stream_content_type.is_rss 
        
                   ) 
        
               # XPath1 filters (first match only) 
        
               elif filter_rule.startswith('xpath1:'): 
        
                   filtered_content += html_tools.xpath1_filter( 
        
                       xpath_filter=filter_rule.replace('xpath1:', ''), 
        
                       html_content=content, 
        
                       append_pretty_line_formatting=not self.watch.is_source_type_url, 
        
                       is_rss=stream_content_type.is_rss 
        
                   ) 
        
               # JSON filters 
        
               elif any(filter_rule.startswith(prefix) for prefix in json_filter_prefixes): 
        
                   filtered_content += html_tools.extract_json_as_string( 
        
                       content=content, 
        
                       json_filter=filter_rule 
        
                   ) 
        
               # CSS selectors, default fallback 
        
               else: 
        
                   filtered_content += html_tools.include_filters( 
        
                       include_filters=filter_rule, 
        
                       html_content=content, 
        
                       append_pretty_line_formatting=not self.watch.is_source_type_url 
        
                   ) 
        
               # Raise error if filter returned nothing 
        
               if not filtered_content.strip(): 
        
                   raise FilterNotFoundInResponse( 
        
                       msg=self.filter_config.include_filters, 
        
                       screenshot=self.fetcher.screenshot, 
        
                       xpath_data=self.fetcher.xpath_data 
        
                   )

Recent refactor (Commit 3b9d19d) introduced a regression in helper function apply_include_filters() due to indentation level being higher than previous code. This regression led to filters throwing an exception if first filter in list has no match even if later filters would have matched.

dgtlmoon · 2025-10-18T05:54:00Z

ah thank yoU! any chance of a test?

ReggX · 2025-10-18T12:05:10Z

Not quite familiar with your testing methodology.

I assume it would be similar to test_filter_exist_changes, but with a non-hitting filter before .ticket-available

changedetection.io/changedetectionio/tests/test_filter_exist_changes.py

Line 96 in cb02ccc

"include_filters": '.ticket-available',

Alternatively, if you want a unit test instead, I'm not quite sure how to setup ContentProcessor for the test (arguments aren't typed and I have found no example in existing tests)

dgtlmoon · 2025-10-18T20:49:24Z

yeah just read the current tests, i absolutely cant merge this in without a test or a mod to an existing test that tests for it

you could even update one of the existing simple "include filter" rules and put a filter that does not exist at the top

.somemissingfilter\n.the-tested-earlier-filter

dgtlmoon · 2025-10-18T20:49:41Z

only bugs now that we find are 100% where there was not a test

Updated test to include a non-matching selector.

ReggX · 2025-10-18T21:22:51Z

Modified an existing test, but can't test locally on my Windows machine since pytest throws AttributeError: Can't pickle local object 'LiveServer.start.<locals>.worker'

But since it is an existing test, the CI workflow should cover it anyways.

changedetectionio/tests/test_filter_exist_changes.py

dgtlmoon · 2025-10-19T19:28:07Z

lets see how we go, i fixed the '' to "" which should work with \n

dgtlmoon · 2025-10-19T20:28:14Z

Looks good! thanks

Add test for non-matching first filter

c102600

Updated test to include a non-matching selector.

dgtlmoon reviewed Oct 19, 2025

View reviewed changes

changedetectionio/tests/test_filter_exist_changes.py Outdated Show resolved Hide resolved

Update changedetectionio/tests/test_filter_exist_changes.py

d02d326

dgtlmoon merged commit 58b5586 into dgtlmoon:master Oct 19, 2025
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix error handling for first empty filter response #3516

Fix error handling for first empty filter response #3516

ReggX commented Oct 17, 2025

Uh oh!

dgtlmoon commented Oct 18, 2025

Uh oh!

ReggX commented Oct 18, 2025

Uh oh!

dgtlmoon commented Oct 18, 2025

Uh oh!

dgtlmoon commented Oct 18, 2025

Uh oh!

ReggX commented Oct 18, 2025

Uh oh!

Uh oh!

dgtlmoon commented Oct 19, 2025

Uh oh!

Uh oh!

dgtlmoon commented Oct 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	for filter_rule in include_filters_rule:
	# For HTML/XML we offer xpath as an option, just start a regular xPath "/.."
	if filter_rule[0] == '/' or filter_rule.startswith('xpath:'):
	html_content += html_tools.xpath_filter(xpath_filter=filter_rule.replace('xpath:', ''),
	html_content=self.fetcher.content,
	append_pretty_line_formatting=not watch.is_source_type_url,
	is_rss=stream_content_type.is_rss)

	elif filter_rule.startswith('xpath1:'):
	html_content += html_tools.xpath1_filter(xpath_filter=filter_rule.replace('xpath1:', ''),
	html_content=self.fetcher.content,
	append_pretty_line_formatting=not watch.is_source_type_url,
	is_rss=stream_content_type.is_rss)
	else:
	html_content += html_tools.include_filters(include_filters=filter_rule,
	html_content=self.fetcher.content,
	append_pretty_line_formatting=not watch.is_source_type_url)

	if not html_content.strip():
	raise FilterNotFoundInResponse(msg=include_filters_rule, screenshot=self.fetcher.screenshot, xpath_data=self.fetcher.xpath_data)

	for filter_rule in self.filter_config.include_filters:
	# XPath filters
	if filter_rule[0] == '/' or filter_rule.startswith('xpath:'):
	filtered_content += html_tools.xpath_filter(
	xpath_filter=filter_rule.replace('xpath:', ''),
	html_content=content,
	append_pretty_line_formatting=not self.watch.is_source_type_url,
	is_rss=stream_content_type.is_rss
	)

	# XPath1 filters (first match only)
	elif filter_rule.startswith('xpath1:'):
	filtered_content += html_tools.xpath1_filter(
	xpath_filter=filter_rule.replace('xpath1:', ''),
	html_content=content,
	append_pretty_line_formatting=not self.watch.is_source_type_url,
	is_rss=stream_content_type.is_rss
	)

	# JSON filters
	elif any(filter_rule.startswith(prefix) for prefix in json_filter_prefixes):
	filtered_content += html_tools.extract_json_as_string(
	content=content,
	json_filter=filter_rule
	)

	# CSS selectors, default fallback
	else:
	filtered_content += html_tools.include_filters(
	include_filters=filter_rule,
	html_content=content,
	append_pretty_line_formatting=not self.watch.is_source_type_url
	)

	# Raise error if filter returned nothing
	if not filtered_content.strip():
	raise FilterNotFoundInResponse(
	msg=self.filter_config.include_filters,
	screenshot=self.fetcher.screenshot,
	xpath_data=self.fetcher.xpath_data
	)

Uh oh!

Fix error handling for first empty filter response #3516

Fix error handling for first empty filter response #3516

Conversation

ReggX commented Oct 17, 2025

Uh oh!

dgtlmoon commented Oct 18, 2025

Uh oh!

ReggX commented Oct 18, 2025

Uh oh!

dgtlmoon commented Oct 18, 2025

Uh oh!

dgtlmoon commented Oct 18, 2025

Uh oh!

ReggX commented Oct 18, 2025

Uh oh!

Uh oh!

dgtlmoon commented Oct 19, 2025

Uh oh!

Uh oh!

dgtlmoon commented Oct 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants