feat(OpinionSite): return "lower_court_id" field #1434

grossir · 2025-06-11T19:56:52Z

Solves #1432

This new field will go into "Docket.appeal_from_id".

Also, make the `tex` scraper return "lower_court_id".

flooie · 2025-06-16T14:14:56Z

I think we need to fix the Texas scraper with respect to IDs before we move this forward. Perhaps we should make this a draft.

flooie · 2025-06-16T16:17:28Z

juriscraper/opinions/united_states/state/tex.py

@@ -162,16 +163,21 @@ def parse_lower_court_info(title: str) -> tuple[str, str]:
        if match := re.search(texapp_regex, title):
            lower_court = match.group("lower_court")
            lower_court_number = title[match.end() :].split(",")[0]


The docstring on this function is not correct.

Also, instead of not returning a BODA id, I created one in courts-db: `txboda`.

flooie · 2025-06-16T16:37:13Z

@grossir I find this


    @staticmethod
    def parse_lower_court_info(title: str) -> tuple[str, str]:
        """Parses lower court information from the title string

        :param title string
        :return lower_court, lower_court_number
        """

        # format when appeal comes from texapp. Example:
        # ' from Harris County; 1st Court of Appeals District (01-22-00182-CV, 699 SW3d 20, 03-23-23)'
        texapp_regex = r" from (?P<lower_court>.*)\s*\("

        # Examples:
        #  "(U.S. Fifth Circuit 23-10804)"
        #  "(U.S. 5th Circuit 19-51012)"
        # "(BODA Cause No. 67623)"
        other_courts_regex = r"\((?P<lower_court>(BODA|U\.S\. (Fif|5)th Circuit))\s(?P<lower_number>(Cause No. )?[\d-]+)\)$"

        if match := re.search(texapp_regex, title):
            lower_court = match.group("lower_court")
            lower_court_number = title[match.end() :].split(",")[0]
            return lower_court, lower_court_number, "texapp"
        elif match := re.search(other_courts_regex, title):
            lower_court = match.group("lower_court")
            lower_court_number = match.group("lower_number")

            if lower_court == "BODA":
                lower_court = "Board of Disciplinary Appeals"
                lower_court_id = ""
            else:
                # if this is not a BODA match, then it can only be a
                # Fifth Circuit match. Update this if the regex above changes
                lower_court_id = "ca5"

            return lower_court, lower_court_number, lower_court_id
        return "", "", ""

to be problematic. Can we change it back to return just the lower court number, and extract the remaining data in `extract_from_text`?


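    def extract_from_text(self, scraped_text: str) -> dict:
        """Extract the lower court name and id from the scraped text

        :param scraped_text: text extracted from the opinion document
        :return: metadata dict; "Docket" stays empty if no court is recognized
        """
        metadata = {"Docket": {}}
        court_id = ""
        # re.split always returns at least one element, so check that a
        # second section exists instead of testing for an empty list
        sections = re.split(r"═{15,}", scraped_text)
        if len(sections) < 2:
            return metadata
        lower_court = sections[1].replace("On Petition for Review from the", "").strip()
        if lower_court.startswith("Court of Appeals"):
            court_id = "texapp"
        elif lower_court.startswith("Board of Disciplinary Appeals"):
            court_id = "txboda"
        elif lower_court.startswith("United States Court of Appeals for the Fifth Circuit"):
            court_id = "ca5"
        if court_id != "":
            metadata['Docket']['lower_court_str'] = lower_court
            metadata['Docket']['lower_court_id'] = court_id
        return metadata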

I think I like the way the court names are written here; they match and look much nicer to me.
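
For reference, here is a standalone sketch of the title parser quoted above, with the return annotation and docstring matching the three values it actually returns, and `txboda` used as the BODA id mentioned earlier in this review. The regexes are copied from the quoted snippet; the module-level constant names are only for illustration, and this is not necessarily the code that was committed.

```python
import re

# Regexes copied from the snippet quoted in this thread.
TEXAPP_REGEX = r" from (?P<lower_court>.*)\s*\("
OTHER_COURTS_REGEX = (
    r"\((?P<lower_court>(BODA|U\.S\. (Fif|5)th Circuit))"
    r"\s(?P<lower_number>(Cause No. )?[\d-]+)\)$"
)


def parse_lower_court_info(title: str) -> tuple[str, str, str]:
    """Parse lower court information from the title string.

    :param title: title string from the HTML source
    :return: (lower_court, lower_court_number, lower_court_id);
        three empty strings when nothing matches
    """
    if match := re.search(TEXAPP_REGEX, title):
        lower_court = match.group("lower_court")
        # The docket number is the first comma-separated item after "("
        lower_court_number = title[match.end():].split(",")[0]
        return lower_court, lower_court_number, "texapp"

    if match := re.search(OTHER_COURTS_REGEX, title):
        lower_court_number = match.group("lower_number")
        if match.group("lower_court") == "BODA":
            # Assumes the "txboda" id created in courts-db
            return "Board of Disciplinary Appeals", lower_court_number, "txboda"
        # Any other match can only be the Fifth Circuit
        return match.group("lower_court"), lower_court_number, "ca5"

    return "", "", ""
```

Note that the greedy `.*` in the first regex keeps the space before the opening parenthesis, so callers may want to strip the returned court name.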

grossir · 2025-06-17T15:38:57Z

@flooie

I was checking the PDFs, and I would keep the data from the HTML source because it has:

- lower complexity: for example, in the PDF the separator is not always "On Petition for Review from the". I have also found "On Certified Question from the", and there may be other variations to account for
- more information: the "lower_court_str" also mentions the county the case is coming from, not only the district

About the formatting being prettier or more standard in the PDF: when we implement the frontend we will just use the "appeal_from_id", which links to a Court object that has the standard court name. So I don't think a standard name should matter too much for "lower_court_str" / "appeal_from_str".
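
To illustrate the complexity point, this is a rough sketch of what the PDF-based extraction would need in order to cover both separators mentioned in this thread. The function name `lower_court_from_pdf_text`, the `PREFIX_REGEX` pattern, and the `COURT_IDS` table are hypothetical; only the two prefixes and the three court ids come from this conversation, and other prefixes may exist.

```python
import re

# The two prefixes mentioned in this thread; there may be other variations.
PREFIX_REGEX = re.compile(
    r"^On (Petition for Review|Certified Question) from the\s+"
)

# Court-name prefix -> court id, following the extract_from_text proposal.
COURT_IDS = {
    "Court of Appeals": "texapp",
    "Board of Disciplinary Appeals": "txboda",
    "United States Court of Appeals for the Fifth Circuit": "ca5",
}


def lower_court_from_pdf_text(scraped_text: str) -> dict:
    """Hypothetical helper: find the lower court in the PDF header text."""
    metadata = {"Docket": {}}
    sections = re.split(r"═{15,}", scraped_text)
    if len(sections) < 2:
        # No separator line found, so there is no header section to inspect
        return metadata

    # Drop whichever known prefix introduces the court name
    lower_court = PREFIX_REGEX.sub("", sections[1].strip())
    for name, court_id in COURT_IDS.items():
        if lower_court.startswith(name):
            metadata["Docket"]["lower_court_str"] = lower_court
            metadata["Docket"]["lower_court_id"] = court_id
            break
    return metadata
```

Every new prefix found in the PDFs would mean another alternative in `PREFIX_REGEX`, which is the maintenance cost the HTML source avoids.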

flooie · 2025-06-17T16:36:21Z

@grossir I think the HTML is providing a non-standard name for the court, and I much prefer the format from the PDF.

Let me take a look at a bigger sample.

grossir added this to Case Law Sprint Jun 11, 2025

grossir moved this to PRs to Review in Case Law Sprint Jun 11, 2025

feat(OpinionSite): return "lower_court_id" field

716ec9d

Solves #1432 This new field will go into "Docket.appeal_from_id" Also, make `tex` scraper return "lower_court_id"

grossir force-pushed the 1432-opinion-site-return-lower-court-id branch from abe7443 to 716ec9d Compare June 11, 2025 19:58

grossir assigned flooie Jun 11, 2025

Merge branch 'main' into 1432-opinion-site-return-lower-court-id

8dcc162

grossir mentioned this pull request Jun 12, 2025

Implement scraper for "Texas 15th Court of Appeals" texapp_15 #1436

Open

flooie moved this from PRs to Review to Waiting on Feedback in Case Law Sprint Jun 16, 2025

flooie assigned grossir and unassigned flooie Jun 16, 2025

grossir mentioned this pull request Jun 16, 2025

Fix Texas scrapers ids #1444

Open

grossir moved this from Waiting on Feedback to Blocked in Case Law Sprint Jun 16, 2025

flooie reviewed Jun 16, 2025

Merge branch 'main' into 1432-opinion-site-return-lower-court-id

b527381

grossir moved this from Blocked to Waiting on Feedback in Case Law Sprint Jun 17, 2025

grossir assigned flooie and unassigned grossir Jun 17, 2025

fix(parse_lower_court_info): add "txboda" court id value

dc6f744
