fix(ala): by fetching detailed publication data from new API endpoint #1759
for more information, see https://pre-commit.ci
…hanged-significantly' into 1758-the-ala-api-structure-has-changed-significantly # Conflicts: # juriscraper/opinions/united_states/state/ala.py
Take another look at this. Your data doesn't look right to me.
flooie
left a comment
Look at your output. It's not quite right.
{
  "publicationItemUUID":"93C503F6-7E90-40CF-9C0F-FA9C20A8A036",
  "docketEntryUUID":"73CC07D7-54C4-4BBA-9973-C1F4F5E5E3A3",
  "caseInstanceUUID":"F46D02FF-367A-46FF-
This file doesn't match what I expect. Can you update the JSON to match the new API endpoint?
I updated all three example files, but I'm only using the JSON from the second API call for testing.
flooie
left a comment
a few more things
def _download(self, request_dict=None):
    """Download the publication list and then fetch detailed publication data.
    The initial API returns a list of publications, but we need to fetch
    the detailed publication endpoint to get full case information.
    """
    if self.test_mode_enabled():
        return super()._download(request_dict)

    # First, get the list of publications
    html = super()._download(request_dict)

    # Get the publicationUUID from the initial response
    releases = html["_embedded"]["results"]
    publication_uuid = releases[0].get("publicationUUID")

    # Processes only the first result to scrape the most recent data.
    item = self.json["_embedded"]["results"][0]
    # Fetch detailed publication data
    self.url = f"{self.base_url}/courts/{self.court_str}/cms/publication/{publication_uuid}"
    return super()._download(request_dict)
maybe something like this
def _download(self, request_dict=None):
    """Download the publication list and then fetch detailed publication data.
    The initial API returns a list of publications, but we need to fetch
    the detailed publication endpoint to get full case information.
    """
    if self.test_mode_enabled():
        return super()._download(request_dict)
    resp = super()._download(request_dict)
    releases = resp["_embedded"]["results"]
    publication_uuid = releases[0].get("publicationUUID")
    self.url = f"{self.base_url}/courts/{self.court_str}/cms/publication/{publication_uuid}"
    self.json = super()._download(request_dict)
and drop the item = self.html since this is json
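For anyone following along, here is a minimal standalone sketch of the two-step fetch being discussed: read the list response, take the `publicationUUID` of the first result, then request the detail endpoint. It is not the scraper's actual code. `BASE_URL`, `fetch_latest_publication`, the `fetch_json` callable, and the canned responses are all hypothetical stand-ins so the flow can run without network access; only the `_embedded`/`results`/`publicationUUID` keys and the URL shape come from the diff above.

```python
from typing import Any, Callable, Dict

# Hypothetical base URL, used only for this illustration.
BASE_URL = "https://example.invalid/api"

JsonFetcher = Callable[[str], Dict[str, Any]]


def fetch_latest_publication(
    fetch_json: JsonFetcher, court_str: str, list_url: str
) -> Dict[str, Any]:
    """Fetch the publication list, then the detail record for its first entry."""
    listing = fetch_json(list_url)
    releases = listing["_embedded"]["results"]
    if not releases:
        # Nothing published: return an empty payload instead of raising IndexError.
        return {}
    publication_uuid = releases[0].get("publicationUUID")
    if not publication_uuid:
        raise ValueError("first result has no publicationUUID")
    detail_url = (
        f"{BASE_URL}/courts/{court_str}/cms/publication/{publication_uuid}"
    )
    return fetch_json(detail_url)


# Canned responses standing in for the two API calls (made-up data).
LIST_URL = f"{BASE_URL}/courts/demo/cms/publications"
FAKE_RESPONSES = {
    LIST_URL: {"_embedded": {"results": [{"publicationUUID": "abc-123"}]}},
    f"{BASE_URL}/courts/demo/cms/publication/abc-123": {
        "publicationUUID": "abc-123",
        "detail": True,
    },
}

detail = fetch_latest_publication(FAKE_RESPONSES.__getitem__, "demo", LIST_URL)
print(detail)
```

Passing the fetcher in as a callable keeps the sketch testable with canned JSON, which is roughly the role the example files play in test mode.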
I feel like this isn't resolved.
        return super()._download(request_dict)

    # First, get the list of publications
    html = super()._download(request_dict)
Maybe this should be called resp and not html, since it's not HTML.
    r"\((?:Appeal from ([^:]+):\s*([^)]+)|[^;]+;\s*([^:]+Appeals):\s*([^)]+))\)",
    name,
)
if match:
    lower_court = match.group("lower_court").strip()
    lower_court_number = match.group("lower_court_number").strip()
    # Remove the parenthetical from the name
    name = name[: match.start()].rstrip()
    # Groups 1,2 for "Appeal from"; groups 3,4 for Ex parte format
    lower_court = (match.group(1) or match.group(3) or "").strip()
    lower_court_number = (
        match.group(2) or match.group(4) or ""
Why did you strip out the group names?
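One possible way to keep named groups while still handling both formats, offered as a sketch rather than the fix the PR should adopt: Python's `re` rejects the same group name in two branches of an alternation, so the court name gets a distinct name per branch (`lower_court` and the hypothetical `lower_court_alt`), while the docket number, which follows a colon in both branches, shares a single `lower_court_number` group. The helper name `split_lower_court` and the sample case names are made up for illustration.

```python
import re

# Both branches end with ": <number>)", so that tail is shared.
# "lower_court_alt" is a hypothetical name for the Ex parte branch.
LOWER_COURT_RE = re.compile(
    r"\((?:Appeal from\s+(?P<lower_court>[^:]+)"
    r"|[^;]+;\s*(?P<lower_court_alt>[^:]+Appeals))"
    r":\s*(?P<lower_court_number>[^)]+)\)"
)


def split_lower_court(name: str):
    """Return (clean_name, lower_court, lower_court_number) for a case title."""
    match = LOWER_COURT_RE.search(name)
    if not match:
        return name, "", ""
    lower_court = (
        match.group("lower_court") or match.group("lower_court_alt") or ""
    ).strip()
    lower_court_number = match.group("lower_court_number").strip()
    # Remove the parenthetical from the name.
    return name[: match.start()].rstrip(), lower_court, lower_court_number


# Made-up titles in the two formats the regex targets.
appeal = split_lower_court(
    "Smith v. Jones (Appeal from Jefferson Circuit Court: CV-20-123)"
)
ex_parte = split_lower_court(
    "Ex parte Doe (Montgomery Circuit Court: CV-21-1; "
    "Court of Civil Appeals: CL-2023-0001)"
)
print(appeal)
print(ex_parte)
```

The trade-off is one extra group name in exchange for not having to remember which numbered group belongs to which format.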
name = re.sub(
    r"\s*PETITION FOR WRIT OF .+?(?=\(|$)", "", name
).strip()
name = re.sub(r"\s*\(In re:\s*.+?\)", "", name).strip()
This makes me think we are collecting things we shouldn't.
We don't want to remove "In re" from case names, but I don't think we want to be collecting petitions for writ of anything.
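If the intent is to skip those entries instead of rewriting their names, a filter along these lines might work. This is only a sketch under that assumption: `is_writ_petition`, `keep_titles`, and the sample titles are hypothetical, and the real scraper may need to decide on a different field than the title text.

```python
import re
from typing import List

# Matches the phrase the current re.sub strips from names.
WRIT_PETITION_RE = re.compile(r"\bPETITION FOR WRIT OF\b", re.IGNORECASE)


def is_writ_petition(title: str) -> bool:
    """True when the title looks like a petition for a writ."""
    return bool(WRIT_PETITION_RE.search(title))


def keep_titles(titles: List[str]) -> List[str]:
    """Drop writ petitions; leave every other title untouched, "In re" included."""
    return [title for title in titles if not is_writ_petition(title)]


# Made-up titles for illustration only.
sample_titles = [
    "Ex parte Doe PETITION FOR WRIT OF MANDAMUS (In re: Doe v. Roe)",
    "Smith v. Jones",
    "Ex parte Roe (In re: Roe v. Doe)",
]
kept = keep_titles(sample_titles)
print(kept)
```

Filtering before any name cleanup means the "In re" parenthetical never has to be touched for the entries that are kept.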
…hanged-significantly' into 1758-the-ala-api-structure-has-changed-significantly
for more information, see https://pre-commit.ci
Fixes #1758