
Detect failing scrapers when no results are found #1447

Open
@grossir


This is possible for a subset of scrapers, which should be identified on a case-by-case basis: those that scrape a "most recent" page, and so will always return at least one result. If we get 0 results from one of these, we should send a `logger.error()` that will go to Sentry.

Examples of scrapers in the subset

This should be as easy as:

- setting a default class attribute `AbstractSite.should_have_results = False`
- overriding `should_have_results = True` on the proper scrapers
- putting a check in `parse()` that sends a `logger.error()` when the conditions are met: `len(self.cases) == 0 and self.should_have_results` (see the sketch after the current `parse()` below)

For reference, the current `AbstractSite.parse()`:

```python
def parse(self):
    if not self.downloader_executed:
        # Run the downloader if it hasn't been run already
        self.html = self._download()

        # Process the available html (optional)
        self._process_html()

    # Set the attribute to the return value from _get_foo()
    # e.g., this does self.case_names = _get_case_names()
    for attr in self._all_attrs:
        self.__setattr__(attr, getattr(self, f"_get_{attr}")())

    self._clean_attributes()
    if "case_name_shorts" in self._all_attrs:
        # This needs to be done *after* _clean_attributes() has been run.
        # The current architecture means this gets run twice. Once when we
        # iterate over _all_attrs, and again here. It's pretty cheap though.
        self.case_name_shorts = self._get_case_name_shorts()
    self._post_parse()
    self._check_sanity()
    self._date_sort()
    self._make_hash()
    return self
```
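
Here is a minimal sketch of how the check could be wired in, assuming the `should_have_results` attribute name from the steps above. The concrete scraper class name is hypothetical, and the exact placement of the check inside `parse()` is an assumption:

```python
import logging

logger = logging.getLogger(__name__)


class AbstractSite:
    # Default: returning zero results is not necessarily an error
    should_have_results = False

    def __init__(self):
        self.cases = []
        # Placeholder; in juriscraper court_id is derived from the module path
        self.court_id = self.__class__.__name__

    def parse(self):
        # ... download, _process_html, attribute collection, sanity checks ...
        if self.should_have_results and len(self.cases) == 0:
            # logger.error() calls are forwarded to Sentry
            logger.error(
                "%s: got 0 results, but this scraper targets a most-recent "
                "page and should always return at least one",
                self.court_id,
            )
        return self


class ExampleMostRecentScraper(AbstractSite):
    # Scrapes a "most recent" page, so an empty result set signals a failure
    should_have_results = True
```

With this in place, only scrapers that opt in via `should_have_results = True` would trigger the Sentry alert; all other scrapers keep the current behavior.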
