937 alaska and alaskactapp missing opinions #1476
base: main
if not self.test_mode_enabled():
    # If there is no link in the first column, find it inside the case page
    url = self.retrieve_pdf_from_alternate_page(row)
    if not url:
        continue
continue
This code doesn't work. The continue on line 63 means the secondary fetch is still performed, but its result is never used.
Also, I think we need a placeholder value for the test page here.
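One way the two points above could be addressed is sketched below. This is a standalone illustration, not the scraper's actual code: `resolve_urls`, `fetch_alternate`, the row dictionaries, and `PLACEHOLDER_URL` are all hypothetical names introduced here. The idea is that the secondary fetch runs only when the row has no link, its result is actually used, and test mode gets a placeholder instead of a network call.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical placeholder value; the real scraper may prefer something else.
PLACEHOLDER_URL = "https://example.com/placeholder.pdf"


def resolve_urls(
    rows: Iterable[dict],
    fetch_alternate: Callable[[dict], Optional[str]],
    test_mode: bool,
) -> Iterator[str]:
    """Yield one URL per row, falling back to the secondary fetch if needed."""
    for row in rows:
        url = row.get("url")
        if not url:
            if test_mode:
                # Tests should not hit the network, so use a placeholder.
                url = PLACEHOLDER_URL
            else:
                # Only fetch the case page when the table has no link.
                url = fetch_alternate(row)
            if not url:
                # Neither source produced a link, so skip this row.
                continue
        yield url
```

The only `continue` sits inside the "no URL found" branch, so a successful secondary fetch falls through to the `yield` rather than being discarded.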
By default, this scraper scrapes the entire page, which is way too much. We should limit it to the last 30 days, and the backscraper can handle anything prior.
It would be no wonder if this is failing, since we hit hundreds of secondary pages every time this crawler runs. Let's tighten this up; otherwise I think we will get blocked soon.
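A rough sketch of the 30-day limit, assuming each row already carries a parsed filing date. The names `within_window` and `rows_needing_fetch` and the `"date"` key are hypothetical and only illustrate filtering rows before any secondary page is requested.

```python
from datetime import date, timedelta


def within_window(filed: date, today: date, days: int = 30) -> bool:
    """Return True if `filed` falls within the last `days` days, inclusive."""
    return today - timedelta(days=days) <= filed <= today


def rows_needing_fetch(rows: list, today: date, days: int = 30) -> list:
    """Keep only recent rows so older ones never trigger a secondary request."""
    return [row for row in rows if within_window(row["date"], today, days)]
```

Filtering first keeps the number of secondary requests proportional to recent filings rather than to the full table; older rows are left to the backscraper.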
It looks like we just skip all the ones with citations, so they never get added in the current version. So I'm not going to disable it.
Also, I think we have two other pages that should be scraped as well. https://appellate-records.courts.alaska.gov/CMSPublic/Home/AppellateOpinions lists three pages: memorandum opinions (I think), published orders, and the slip opinions that we currently collect.
I'll create a new issue for this.
…ing URLs and date range checks
Everything should be done except the disposition.
This pull request enhances the functionality of the alaska scraper by adding a fallback mechanism to retrieve PDF download URLs from an alternate case page when they are not available in the main table. Additionally, the CHANGES.md file has been updated to document this improvement.