fix(sd): update extract_from_text #1293

grossir · 2025-01-10T01:19:04Z

Solves #1292. Now parsing: disposition, docket_number and judges.
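The sketch below roughly illustrates how two of these fields could be pulled from the opinion text. It assumes the PDF text contains a header line of the form `#<docket>-<disposition>-<author initials>`, for example `#30123-aff in pt & vacate-SRJ`. Both the layout and the sample are assumptions and are not taken from sd.py. `HEADER_RE` and `extract_header_fields` are hypothetical names. The nested `Docket` / `OpinionCluster` keys are meant to mirror the dict shape juriscraper's `extract_from_text` methods generally return. Judge parsing is left out.

```python
import re
from typing import Any, Dict

# Assumed header layout: "#<docket>-<disposition abbreviation>-<author initials>".
# The greedy disposition group backtracks to the last "-INITIALS" on the line,
# so abbreviations containing commas or "&" are captured intact.
HEADER_RE = re.compile(r"#(?P<docket>\d+)-(?P<disposition>.+)-(?P<initials>[A-Z]+)\b")


def extract_header_fields(text: str) -> Dict[str, Any]:
    """Hypothetical helper: pull docket number and raw disposition from the header."""
    match = HEADER_RE.search(text)
    if not match:
        return {}
    return {
        "Docket": {"docket_number": match.group("docket")},
        "OpinionCluster": {"disposition": match.group("disposition").strip()},
    }
```

The raw abbreviation (`aff in pt & vacate`) would still need to be mapped to a readable disposition, which is what the mapping discussed in review is for.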

flooie · 2025-01-16T17:02:44Z

juriscraper/opinions/united_states/state/sd.py

+        "aff in pt, vacate, & rem in pt": "Affirm in part, vacate and remand in part",
+        "aff in pt & vacate": "Affirm and vacate",  # https://www.courtlistener.com/opinion/9502826/state-v-scott/pdf/


We may need to fine-tune these dispositions.

I can see this one:
"aff in pt & vacate": "Affirm and vacate"
should be
"aff in pt & vacate": "Affirm in part and vacate"

Any others?

I would suggest using the past tense throughout: "Dismiss" should become "Dismissed" to match "Affirmed", and likewise "Reversed and Remanded", etc.

Also, "aff in pt & vacate" should be "Affirmed in Part and Vacated".
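The past-tense convention suggested here can be sketched as a lookup table plus a small normalizer. `DISPOSITIONS` and `normalize_disposition` are hypothetical names, and this is not the mapper that was merged. Only the last two keys come from the diff above. The first three are assumed abbreviations that illustrate "Affirmed", "Dismissed" and "Reversed and Remanded".

```python
# Hypothetical mapping from abbreviated dispositions to normalized,
# past-tense labels. Only the last two keys come from the diff under
# review; "aff", "dismiss" and "rev & rem" are assumed abbreviations.
DISPOSITIONS = {
    "aff": "Affirmed",
    "dismiss": "Dismissed",
    "rev & rem": "Reversed and Remanded",
    "aff in pt & vacate": "Affirmed in Part and Vacated",
    "aff in pt, vacate, & rem in pt": "Affirmed in Part, Vacated and Remanded in Part",
}


def normalize_disposition(raw: str) -> str:
    """Return the past-tense label for `raw`, or `raw` itself if unmapped."""
    # Lowercase and collapse internal whitespace so minor formatting
    # differences in the extracted text still hit the lookup table.
    key = " ".join(raw.lower().split())
    return DISPOSITIONS.get(key, raw)
```

Falling back to the raw string for unknown keys keeps unmapped abbreviations visible instead of silently dropping them.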

flooie

The current regex pattern for citations is too restrictive. It seems to require a citation for processing, causing cases without one to be skipped. This should be loosened to ensure that all relevant data is captured, even when a citation is missing.
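The thread does not show the actual citation pattern in sd.py, so the following is only an illustration of the loosening. Wrapping the citation in an optional non-capturing group lets a line without one still match, and the citation then comes back as `None`. `CASE_RE`, `parse_case_line` and the "Name, 2024 S.D. 12" line shape are assumptions. The citation format follows South Dakota's public-domain style.

```python
import re
from typing import Dict, Optional

# Hypothetical line shape: a case name, optionally followed by a South Dakota
# public-domain citation such as "2024 S.D. 12". Wrapping the citation in
# (?: ... )? keeps lines without one from being rejected.
CASE_RE = re.compile(
    r"^(?P<name>.+?)"
    r"(?:,\s*(?P<citation>\d{4}\s+S\.\s?D\.\s+\d+))?"
    r"\s*$"
)


def parse_case_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the name and (possibly missing) citation, or None if unparseable."""
    match = CASE_RE.match(line.strip())
    if not match:
        return None
    return {"name": match.group("name"), "citation": match.group("citation")}
```

With this pattern, a line such as "State v. Scott" yields a record with `citation` set to `None` instead of being skipped.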

The backscraper and extract_from_text method only work with PDFs from 2005 onward, because that is when the court transitioned to a new format. Before 2005, extraction fails because the text patterns are different.
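One way to respect the 2005 cutoff is to gate text extraction on the filing date. In this sketch, `extract_if_supported` and `NEW_FORMAT_START` are hypothetical, and the actual parser is passed in as a callable. January 1, 2005 is an assumed boundary, because the thread only says "2005 onward".

```python
from datetime import date
from typing import Any, Callable, Dict

# Assumed cutover: the thread only says "2005 onward", so Jan 1, 2005 is a guess.
NEW_FORMAT_START = date(2005, 1, 1)


def extract_if_supported(
    scraped_text: str,
    date_filed: date,
    parser: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run `parser` only on opinions filed on or after the PDF format change."""
    if date_filed < NEW_FORMAT_START:
        # Pre-2005 documents use different text patterns; return nothing
        # rather than risk extracting wrong values.
        return {}
    return parser(scraped_text)
```

Returning an empty dict for older documents leaves existing metadata untouched rather than overwriting it with bad matches.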

Should add HTML cleanup code as well.
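The following is a minimal sketch of HTML cleanup that uses only the standard library. It drops `<script>` and `<style>` contents, keeps visible text, and collapses whitespace. `clean_html` is a hypothetical helper, not an existing juriscraper function. A real scraper would more likely use an HTML library such as lxml.

```python
from html.parser import HTMLParser
from typing import List


class _VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(content: str) -> str:
    """Hypothetical helper: strip markup and collapse whitespace."""
    parser = _VisibleTextParser()
    parser.feed(content)
    parser.close()
    # Joining without separators keeps inline markup like <b>wor</b>ld intact,
    # but adjacent block elements (<p>A</p><p>B</p>) will run together.
    return " ".join("".join(parser.parts).split())
```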

The regular backscraper is likely to fail starting around 2009-2010 because of overly strict regex constraints. Adjusting the pattern to accommodate format variations would improve reliability.

Overall, I liked what you did.

flooie · 2025-01-23T16:08:32Z

@grossir this is still with you, right?

grossir · 2025-01-23T16:13:33Z

@flooie yes, I still have to get back to this

grossir · 2025-04-08T17:08:47Z

I didn't implement logic for the HTML pages before 2005, since we already have that data and it would complicate the scraper logic for little gain.

flooie

Lots of work, thanks @grossir

fix(sd): update extract_from_text

466718d

Solves #1292 Now parsing: disposition, docket_number and judges

grossir requested a review from flooie January 10, 2025 01:19

grossir self-assigned this Jan 10, 2025

Merge branch 'main' into fix_sd_extract_from_text

b560953

flooie reviewed Jan 16, 2025

flooie requested changes Jan 16, 2025

Merge branch 'main' into fix_sd_extract_from_text

43d1dd1

flooie moved this from Buffer Zone to PRs to Review in Case Law Sprint Mar 24, 2025

Merge branch 'main' into fix_sd_extract_from_text

d3aecc3

grossir force-pushed the fix_sd_extract_from_text branch from 5712f7b to d680a1a April 8, 2025 16:49

fix(sd): update judge and disposition mappers

e94da40

grossir force-pushed the fix_sd_extract_from_text branch from d680a1a to e94da40 April 8, 2025 16:57

grossir requested a review from flooie April 8, 2025 17:05

grossir assigned flooie and unassigned grossir Apr 8, 2025

Merge branch 'main' into fix_sd_extract_from_text

28b599d

flooie approved these changes Apr 18, 2025

flooie merged commit 9f022ce into main Apr 18, 2025
13 checks passed

flooie deleted the fix_sd_extract_from_text branch April 18, 2025 14:51

github-project-automation bot moved this from PRs to Review to Done in Case Law Sprint Apr 18, 2025

3 participants