feat: add error handling for scrapers with expected results #1449

Open
wants to merge 8 commits into
base: main
from

Conversation

Luis-manzur
Contributor

This pull request introduces error handling for scrapers that are expected to return results but fail to do so. The changes include updates to the CHANGES.md file to document the new feature, as well as modifications to the AbstractSite class in juriscraper to implement the functionality.

Documentation Updates:

  • CHANGES.md: Added a note under "Features" about the new error handling for scrapers with expected results.

Code Enhancements:

  • juriscraper/AbstractSite.py:
    • Added a new should_have_results attribute in the __init__ method to indicate whether a scraper is expected to return results.
    • Updated the _check_sanity method to log an error if should_have_results is True and no results are returned, while maintaining a warning for cases where results are not required.
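The behavior described in the two bullets above can be sketched as follows. This is a minimal illustration, not the code in the PR: `AbstractSiteSketch`, the `cases` list, the log messages, and the returned level name are assumptions made so the example is self-contained. Only the `should_have_results` attribute and the `_check_sanity` method name come from the description.

```python
import logging

logger = logging.getLogger("should_have_results_sketch")


class AbstractSiteSketch:
    """Simplified stand-in for juriscraper's AbstractSite (hypothetical)."""

    def __init__(self):
        # Hypothetical container for the parsed results of one scrape.
        self.cases = []
        # Off by default; scrapers that should always find results opt in.
        self.should_have_results = False

    def _check_sanity(self):
        """Report an empty scrape and return the log level used, or None.

        The return value exists only to make the sketch easy to check.
        """
        if self.cases:
            return None
        name = type(self).__name__
        if self.should_have_results:
            # An empty result here likely means the scraper is broken.
            logger.error("%s: returned no results, but results were expected", name)
            return "error"
        # An empty result is plausible for this scraper, so only warn.
        logger.warning("%s: returned no results", name)
        return "warning"
```

With this shape, existing scrapers keep their current warning unless they explicitly set the attribute.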

Contributor

@grossir grossir left a comment

For this to be useful, you will have to go through the scrapers one by one, identify those that should have results, and set their should_have_results attribute to True.
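As a rough illustration of what opting in might look like for a single scraper, assuming the attribute is set in the subclass constructor (the two classes below are stand-ins, not actual juriscraper files):

```python
class AbstractSite:
    """Stand-in base class for illustration; the real one lives in juriscraper."""

    def __init__(self):
        # Default from the PR description: results are not required.
        self.should_have_results = False


class Site(AbstractSite):
    """Hypothetical scraper for a court page that always lists opinions."""

    def __init__(self):
        super().__init__()
        # Opt in: an empty scrape of this court should be treated as an error.
        self.should_have_results = True
```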

@Luis-manzur
Contributor Author

> For this to be useful you will have to go through the scrapers one by one and identify those that "should_have_results" and set that attribute to true

I was wondering if we could do separate issues to not overload this PR.

@Luis-manzur Luis-manzur moved this to PRs to Review in Case Law Sprint Jun 17, 2025
@flooie
Contributor

flooie commented Jun 17, 2025

I think I agree with @Luis-manzur that adding this field should be a separate PR.

@grossir
Contributor

grossir commented Jun 17, 2025

I'd prefer all the changes to be together for the reasons below, but feel free to approve it and merge it as is

  • the PR is not cluttered as it is right now, it's a few lines in a single file
  • changes don't really affect anything without changing the relevant scraper files, so there is not much to review here
  • you will need to open, link to the issue (or another one), and review another PR instead of doing it now, which is more clerical work

…etect failing scrapers when no results are found
…n-no-results-are-found' into 1447-detect-failing-scrapers-when-no-results-are-found
@flooie
Contributor

flooie commented Jun 23, 2025

How did you find which ones to update?

@flooie flooie assigned Luis-manzur and unassigned flooie Jul 2, 2025
@flooie
Contributor

flooie commented Jul 2, 2025

@Luis-manzur can you resolve conflicts and respond to my question?

@Luis-manzur
Contributor Author

To identify the sites that needed this update, I read through the code of each one to check that there was no filtering before or after the first request, and confirmed this by visiting the court page. I also left out sites whose court page doesn't need any filtering but that clear the opinion list each month/year.

…-are-found

# Conflicts:
#	CHANGES.md
#	juriscraper/opinions/united_states/state/tenn.py
@Luis-manzur Luis-manzur assigned flooie and unassigned Luis-manzur Jul 2, 2025
Labels
None yet
Projects
Status: PRs to Review
Development

Successfully merging this pull request may close these issues.

Detect failing scrapers when no results are found
3 participants