Improve NY Trial Court Scraping Logic to Handle Backdated Opinions #1361

NY trial courts often post their opinions well after the official publish date. This is a problem for our current scraping logic, which stops fetching once it encounters a set number of already-seen opinions.

### What is happening

Opinions on the page are sorted chronologically by publish date.
Our scraper stops once it sees ~5–10 opinions it has already indexed (to avoid redundant requests and be a respectful scraper).
When a newly posted opinion is backdated (e.g., posted in April but dated February), it can fall far enough down the list that it’s missed entirely.
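
To make the failure mode concrete, here is a minimal sketch of the early-stop behavior described above. It is not the actual scraper code: the function name `scrape_until_seen`, the `max_seen` parameter, the newest-first ordering, the opinion IDs, and the dates are all assumptions made up for illustration.

```python
from datetime import date


def scrape_until_seen(listing, seen_ids, max_seen=5):
    """Walk the listing in page order and stop once max_seen
    already-indexed opinions have been encountered.

    listing: iterable of (opinion_id, publish_date), assumed newest first.
    """
    new_ids = []
    seen_count = 0
    for opinion_id, _publish_date in listing:
        if opinion_id in seen_ids:
            seen_count += 1
            if seen_count >= max_seen:
                break  # early stop: assume everything further down is known
        else:
            new_ids.append(opinion_id)
    return new_ids


# Hypothetical page contents, sorted by publish date (newest first).
listing = (
    [(f"apr-{d}", date(2025, 4, d)) for d in (3, 2, 1)]
    + [(f"mar-{d}", date(2025, 3, d)) for d in (30, 28, 25, 20, 15, 10)]
    # Posted in April but dated February, so it sorts far down the list.
    + [("feb-backdated", date(2025, 2, 14))]
    + [(f"feb-{d}", date(2025, 2, d)) for d in (10, 5)]
)

# Everything from March and February was indexed earlier, except the backdated one.
seen_ids = {
    oid for oid, _ in listing
    if oid.startswith(("mar-", "feb-")) and oid != "feb-backdated"
}

found = scrape_until_seen(listing, seen_ids, max_seen=5)
print(found)
```

In this made-up listing the loop stops inside the March block, so `feb-backdated` is never reached even though it has never been indexed.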

### What can we do?

Add logic that keeps fetching past the seen-opinion threshold when the page is still serving newly posted content (e.g., by comparing publication dates against the scrape timestamp).
Perhaps we should add a flag here in juriscraper to run a full crawl all the time? Or we could limit how much is identified on each run. If we pulled back less, perhaps it wouldn't be an issue.
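
One possible reading of these options, sketched under stated assumptions: the early stop only applies once the scraper is both past the seen-opinion threshold and past a date-based lookback window, and a `full_crawl` flag disables the early stop entirely. The function name, the `lookback_days` and `full_crawl` parameters, and the sample data are hypothetical and are not existing juriscraper options.

```python
from datetime import date, timedelta


def scrape_with_lookback(listing, seen_ids, today, max_seen=5,
                         lookback_days=90, full_crawl=False):
    """Collect unseen opinions, stopping early only when BOTH hold:
    max_seen already-indexed opinions were encountered, and the current
    opinion's publish date is older than the lookback window.

    listing: iterable of (opinion_id, publish_date), assumed newest first.
    """
    cutoff = today - timedelta(days=lookback_days)
    new_ids = []
    seen_count = 0
    for opinion_id, publish_date in listing:
        if opinion_id in seen_ids:
            seen_count += 1
        else:
            new_ids.append(opinion_id)
        past_threshold = seen_count >= max_seen
        past_window = publish_date < cutoff
        if not full_crawl and past_threshold and past_window:
            break
    return new_ids


# Hypothetical page contents, newest first; only two entries are unseen.
page = [
    ("apr-1", date(2025, 4, 1)),
    ("mar-30", date(2025, 3, 30)),
    ("mar-25", date(2025, 3, 25)),
    ("mar-20", date(2025, 3, 20)),
    ("mar-15", date(2025, 3, 15)),
    ("mar-10", date(2025, 3, 10)),
    ("feb-backdated", date(2025, 2, 14)),  # posted in April, dated February
    ("feb-5", date(2025, 2, 5)),
    ("jan-5", date(2025, 1, 5)),
    ("dec-20", date(2024, 12, 20)),
]
already_indexed = {oid for oid, _ in page} - {"apr-1", "feb-backdated"}

picked_up = scrape_with_lookback(
    page, already_indexed, today=date(2025, 4, 10), lookback_days=90
)
print(picked_up)
```

The trade-off is more requests per run: every page inside the lookback window is fetched even when nothing on it is new, so the window size would need to balance coverage of backdated opinions against being a respectful scraper.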

@grossir thoughts?  


