feat: add error handling for scrapers with expected results #1449
base: main
For this to be useful, you will have to go through the scrapers one by one, identify those that should always have results, and set `should_have_results` to `True` on each of them.
I was wondering if we could handle that in a separate issue so as not to overload this PR.
I think I agree with @Luis-manzur that adding this field should be a separate PR.
I'd prefer all the changes to be together for the reasons below, but feel free to approve it and merge it as is.
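Opting a scraper in, as described in the first comment above, would amount to setting the flag in that scraper's constructor. The following is a minimal sketch, assuming the base class defaults `should_have_results` to `False`; `AbstractSiteStub` is a hypothetical stand-in for juriscraper's `AbstractSite`, and the URL is a placeholder.

```python
class AbstractSiteStub:
    """Hypothetical stand-in for juriscraper's AbstractSite, reduced to the
    single attribute this PR introduces. Not the real class."""

    def __init__(self, *args, **kwargs):
        # Assumed default: a scraper is NOT expected to always return results.
        self.should_have_results = False


class Site(AbstractSiteStub):
    """Example court scraper whose landing page always lists opinions
    (no date or search filtering before or after the first request)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.url = "https://example.com/opinions"  # placeholder URL
        # Opt in: an empty result set for this court signals a broken scraper.
        self.should_have_results = True


site = Site()
print(site.should_have_results)
```

Scrapers that are not updated keep the default, so only the courts that were reviewed one by one change behavior.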
How did you find which ones to update?
@Luis-manzur, can you resolve the conflicts and respond to my question?
To identify the sites that needed this update, I read through the code of each scraper to check that there was no filtering before or after the first request, and then confirmed it by visiting the court's page. I also left out sites whose court page doesn't need any filtering but which clear the opinion list every month or year.
This pull request introduces error handling for scrapers that are expected to return results but fail to do so. The changes include updates to the `CHANGES.md` file to document the new feature, as well as modifications to the `AbstractSite` class in `juriscraper` to implement the functionality.

Documentation Updates:
- `CHANGES.md`: Added a note under "Features" about the new error handling for scrapers with expected results.

Code Enhancements:
- `juriscraper/AbstractSite.py`:
  - Added a `should_have_results` attribute in the `__init__` method to indicate whether a scraper is expected to return results.
  - Updated the `_check_sanity` method to log an error if `should_have_results` is `True` and no results are returned, while maintaining a warning for cases where results are not required.
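The check described above can be sketched as follows. This is a minimal, hypothetical model, not the actual `AbstractSite` code: the class name `SiteSketch`, its constructor parameters, and the exact log messages are assumptions. When no case names were parsed, it logs at ERROR level if `should_have_results` is set and at WARNING level otherwise.

```python
import logging

logger = logging.getLogger(__name__)


class SiteSketch:
    """Hypothetical, heavily simplified model of the zero-results check.

    Not juriscraper's real AbstractSite; it keeps only the fields needed here.
    """

    def __init__(self, court_id: str, should_have_results: bool = False):
        self.court_id = court_id
        # Whether an empty result set should be treated as a failure.
        self.should_have_results = should_have_results
        # In a real scraper this list is filled by the parsing step.
        self.case_names = []

    def _check_sanity(self) -> None:
        if len(self.case_names) > 0:
            return  # results were found; nothing to report here
        if self.should_have_results:
            # Results were expected but none came back: likely a broken scraper.
            logger.error(
                "%s: Returned with zero items, but should have results.",
                self.court_id,
            )
        else:
            # An empty page is plausible for this court, so only warn.
            logger.warning("%s: Returned with zero items.", self.court_id)
```

In this sketch the two cases stay distinguishable by log level: an empty result from an opted-in scraper surfaces as an error, while ordinary empty pages remain warnings.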