Skip to content

Conversation

ReggX
Copy link
Contributor

@ReggX ReggX commented Oct 17, 2025

Recent refactor (Commit 3b9d19d, issue #3475) introduced a regression in helper function apply_include_filters() due to indentation level being higher than previous code. This regression led to filters throwing an exception if first filter in list has no match even if later filters would have matched.

This PR aims to fix this regression by bringing the indentation level back in line with the for loop instead of inside the loop.

Original code with lower indentation level:

for filter_rule in include_filters_rule:
# For HTML/XML we offer xpath as an option, just start a regular xPath "/.."
if filter_rule[0] == '/' or filter_rule.startswith('xpath:'):
html_content += html_tools.xpath_filter(xpath_filter=filter_rule.replace('xpath:', ''),
html_content=self.fetcher.content,
append_pretty_line_formatting=not watch.is_source_type_url,
is_rss=stream_content_type.is_rss)
elif filter_rule.startswith('xpath1:'):
html_content += html_tools.xpath1_filter(xpath_filter=filter_rule.replace('xpath1:', ''),
html_content=self.fetcher.content,
append_pretty_line_formatting=not watch.is_source_type_url,
is_rss=stream_content_type.is_rss)
else:
html_content += html_tools.include_filters(include_filters=filter_rule,
html_content=self.fetcher.content,
append_pretty_line_formatting=not watch.is_source_type_url)
if not html_content.strip():
raise FilterNotFoundInResponse(msg=include_filters_rule, screenshot=self.fetcher.screenshot, xpath_data=self.fetcher.xpath_data)

Refactored code with higher indentation level:

for filter_rule in self.filter_config.include_filters:
# XPath filters
if filter_rule[0] == '/' or filter_rule.startswith('xpath:'):
filtered_content += html_tools.xpath_filter(
xpath_filter=filter_rule.replace('xpath:', ''),
html_content=content,
append_pretty_line_formatting=not self.watch.is_source_type_url,
is_rss=stream_content_type.is_rss
)
# XPath1 filters (first match only)
elif filter_rule.startswith('xpath1:'):
filtered_content += html_tools.xpath1_filter(
xpath_filter=filter_rule.replace('xpath1:', ''),
html_content=content,
append_pretty_line_formatting=not self.watch.is_source_type_url,
is_rss=stream_content_type.is_rss
)
# JSON filters
elif any(filter_rule.startswith(prefix) for prefix in json_filter_prefixes):
filtered_content += html_tools.extract_json_as_string(
content=content,
json_filter=filter_rule
)
# CSS selectors, default fallback
else:
filtered_content += html_tools.include_filters(
include_filters=filter_rule,
html_content=content,
append_pretty_line_formatting=not self.watch.is_source_type_url
)
# Raise error if filter returned nothing
if not filtered_content.strip():
raise FilterNotFoundInResponse(
msg=self.filter_config.include_filters,
screenshot=self.fetcher.screenshot,
xpath_data=self.fetcher.xpath_data
)

Recent refactor (Commit 3b9d19d) introduced a regression in helper function apply_include_filters() due to indentation level being higher than previous code.
This regression led to filters throwing an exception if first filter in list has no match even if later filters would have matched.
@dgtlmoon
Copy link
Owner

ah thank yoU! any chance of a test?

@ReggX
Copy link
Contributor Author

ReggX commented Oct 18, 2025

Not quite familiar with your testing methodology.

I assume it would be similar to test_filter_exist_changes, but with a non-hitting filter before .ticket-available

"include_filters": '.ticket-available',

Alternatively, if you want a unit test instead, I'm not quite sure how to setup ContentProcessor for the test (arguments aren't typed and I have found no example in existing tests)

@dgtlmoon
Copy link
Owner

yeah just read the current tests, i absolutely cant merge this in without a test or a mod to an existing test that tests for it

you could even update one of the existing simple "include filter" rules and put a filter that does not exist at the top

.somemissingfilter\n.the-tested-earlier-filter

@dgtlmoon
Copy link
Owner

only bugs now that we find are 100% where there was not a test

Updated test to include a non-matching selector.
@ReggX
Copy link
Contributor Author

ReggX commented Oct 18, 2025

Modified an existing test, but can't test locally on my Windows machine since pytest throws AttributeError: Can't pickle local object 'LiveServer.start.<locals>.worker'

But since it is an existing test, the CI workflow should cover it anyways.

@dgtlmoon
Copy link
Owner

lets see how we go, i fixed the '' to "" which should work with \n

@dgtlmoon dgtlmoon merged commit 58b5586 into dgtlmoon:master Oct 19, 2025
5 checks passed
@dgtlmoon
Copy link
Owner

Looks good! thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants