feat(lactapp_2): new scraper for Louisiana Court of Appeals Second Circuit #1299

Merged
Changes from 4 commits
29 commits
b56a955
scraper for lousiana 2nd circuit
giancohs Jan 11, 2025
75ce2b6
scraper for lousiana 2nd circuit
giancohs Jan 11, 2025
6fda2fc
fixing lousiana 2nd circuit backscrape
giancohs Jan 11, 2025
57abc9b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2025
3c1ce77
address PR comments: map court abbreviations and refactor code
giancohs Jan 13, 2025
aade79e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 13, 2025
d2989e8
update court names mapping and simplify scraper code
giancohs Jan 13, 2025
1c8ba33
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 13, 2025
3d85f9c
delete unused imports
giancohs Jan 13, 2025
680fb16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 13, 2025
e07ae7b
Add comment with general info about the scraper
giancohs Apr 10, 2025
9c38cf6
Refactor/fix backscraper
giancohs Apr 10, 2025
5889503
Add extract_from_text to get court name from pdf
giancohs Apr 10, 2025
ffc687a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 10, 2025
42ffdef
Add lactapp_2 compare test
giancohs Apr 11, 2025
143c227
Add example test in ScraperExtractFromTextTest
giancohs Apr 11, 2025
2e58de2
Changelog update about new feature/scraper lactapp_2: Lousiana Court …
giancohs Apr 11, 2025
0111cf7
Merge branch 'main' into 1296-implement-2nd-circuit-lousiana
giancohs Apr 11, 2025
7dc0b88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 11, 2025
76dabe2
restore original .gitignore
giancohs Apr 17, 2025
a24315f
code cleanup for unused imports and unnecessary/redundant lines, add …
giancohs Apr 20, 2025
583e178
add test output json file for lactapp_2
giancohs Apr 20, 2025
160b4f8
Update lactapp_2 extract_from_text test for judges
giancohs Apr 20, 2025
7f805f3
Merge branch 'main' into 1296-implement-2nd-circuit-lousiana
giancohs Apr 20, 2025
effeca3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 20, 2025
a94195f
Remove _download method
giancohs Apr 21, 2025
cc7e768
remove unnecessary condition and print
giancohs Apr 21, 2025
d26e537
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 21, 2025
8d42470
Merge branch 'main' into 1296-implement-2nd-circuit-lousiana
flooie May 2, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -205,7 +205,9 @@ tests/fixtures/cassettes/
### Other ###
# File created by Mac OS X
.DS_Store
# Devcontainer folder
.devcontainer/

# Swap files
*.swp
*~
*~
1 change: 1 addition & 0 deletions juriscraper/opinions/united_states/state/__init__.py
@@ -61,6 +61,7 @@
"kyctapp",
"la",
"lactapp_1",
"lactapp_2",
"lactapp_5",
"mass",
"massappct",
92 changes: 92 additions & 0 deletions juriscraper/opinions/united_states/state/lactapp_2.py
@@ -0,0 +1,92 @@
from datetime import date, datetime

from juriscraper.AbstractSite import logger
from juriscraper.lib.date_utils import unique_year_month
from juriscraper.lib.html_utils import (
    get_row_column_links,
    get_row_column_text,
)
from juriscraper.OpinionSiteLinear import OpinionSiteLinear


class Site(OpinionSiteLinear):
    first_opinion_date = datetime(2019, 7, 17)
    days_interval = 28  # Monthly interval

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.court_id = self.__module__
        self.base_url = "https://www.la2nd.org/opinions/"
        self.year = datetime.now().year
        self.url = f"{self.base_url}?opinion_year={self.year}"
        self.cases = []
        self.status = "Published"
Contributor

Status is not always "Published", so you can't just assign it. Two opinions in 2024 are not published.

Thankfully, you can read the status text from the row with get_row_column_text(row, 7) and derive the status from it, something like this:

        status_str = get_row_column_text(row, 7)
        status = "Published" if "Published" in status_str else "Unpublished"
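
The check suggested above can be tried in isolation. This is a minimal sketch, assuming the status cell text has already been extracted as a plain string; `status_from_cell` is a hypothetical helper name and is not part of juriscraper or this PR.

```python
def status_from_cell(status_str: str) -> str:
    """Map the text of the status column to a status value.

    Hypothetical helper: it mirrors the case-sensitive substring test
    from the review comment. "Unpublished" does not contain "Published"
    with a capital P, so it falls through to the default.
    """
    return "Published" if "Published" in status_str else "Unpublished"


# Sample cell values are made up for illustration.
for cell in ("Published", "Unpublished", ""):
    print(repr(cell), "->", status_from_cell(cell))
```

Because the test is a case-sensitive substring match, a cell reading something like "Not Published" would still be classified as published; whether the site ever emits such a value is not something this sketch can confirm.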

Contributor

This line should be removed now that you've updated it.

Contributor Author

Oops, my bad. I cleaned that up in my last push.

        self.target_date = None
        self.make_backscrape_iterable(kwargs)

    def _download(self):
        html = super()._download()
        if html is not None:
            tables = html.cssselect("table#datatable")
            if not tables or not tables[0].cssselect("tbody tr"):
                self.year -= 1
                self.url = f"{self.base_url}?opinion_year={self.year}"
                return self._download()
        return html

    def _process_html(self):
        if self.html is None:
            return

        tables = self.html.cssselect("table#datatable")
        if tables and tables[0].cssselect("tbody tr"):
            logger.info(f"Processing cases for year: {self.year}")
            for row in tables[0].cssselect("tbody tr"):
                case_date = datetime.strptime(
                    get_row_column_text(row, 1), "%m/%d/%Y"
                ).date()

                # Skip if before first opinion date
                if case_date < self.first_opinion_date.date():
                    continue

                # Only apply date filtering during backscrape
                if (
                    hasattr(self, "back_scrape_iterable")
                    and self.back_scrape_iterable
                ):
                    if self.target_date:
                        target_month = self.target_date.month
                        target_year = self.target_date.year
                        if (
                            case_date.year != target_year
                            or case_date.month != target_month
                        ):
                            continue

                self.cases.append(
                    {
                        "date": get_row_column_text(row, 1),
                        "docket": get_row_column_text(row, 2),
                        "name": get_row_column_text(row, 3),
                        "author": get_row_column_text(row, 4),
                        "disposition": get_row_column_text(row, 5),
                        "lower_court": get_row_column_text(row, 6),
                        "summary": get_row_column_text(row, 7),
                        "url": get_row_column_links(row, 8),
                    }
                )

    def _download_backwards(self, target_date: date) -> None:
        logger.info(f"Backscraping for date: {target_date}")
        self.target_date = target_date
        self.year = target_date.year
        self.url = f"{self.base_url}?opinion_year={self.year}"
        self.html = self._download()
        self._process_html()

    def make_backscrape_iterable(self, kwargs):
        super().make_backscrape_iterable(kwargs)
        self.back_scrape_iterable = unique_year_month(
            self.back_scrape_iterable
        )
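
During a backscrape, `_process_html` keeps only rows whose date falls in the target year and month. The comparison can be illustrated on its own with the standard library. This is a rough sketch, not the scraper's code: `in_target_month` and the sample dates are hypothetical, and only the `%m/%d/%Y` date format is taken from the diff.

```python
from datetime import date, datetime


def in_target_month(date_text: str, target: date) -> bool:
    """Return True when an MM/DD/YYYY string falls in target's year and month.

    Hypothetical helper illustrating the year/month comparison; it is not
    part of juriscraper.
    """
    case_date = datetime.strptime(date_text, "%m/%d/%Y").date()
    return (case_date.year, case_date.month) == (target.year, target.month)


# Made-up row dates; only the first one is in January 2024.
rows = ["01/15/2024", "02/03/2024", "01/31/2023"]
kept = [d for d in rows if in_target_month(d, date(2024, 1, 1))]
print(kept)
```

Comparing the `(year, month)` tuples avoids the nested `!=` / `or` condition while expressing the same filter.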