New Source Collector: "research" ML model to generate batches #299

## Context

I have been talking recently with folks who use `chatgpt o3 mini high` to **find actual URLs on the internet** (not generate them) that contain sources for their mathematical research papers. While I'm still suspicious of closed-source LLMs (and LLMs in general), these models are subsidized and cheap right now and will soon get too expensive. Shouldn't we be taking advantage while we can?

I think part of what they are solving for is that Googling is awful, which is also what our labeling pipeline accounts for. These models will probably generate much less junk than our other collection methods, and human labelers will still protect us from whatever junk gets through.

## Requirements

- [ ] Create a source collector based on a `research` ML model: not one that does research, but one that finds sources.
  - [ ] Try `o3 mini high` first.
  - [ ] Other models may be tried in the future but would work the same way. Can we make them operate from the same `collector` using options, or should we make a new collector for each model?
- [ ] The collector should accept a `prompt` and give the user some guidance about what a prompt might look like; it will probably look different from a Google search.
- [ ] The collector should generate URLs like any other collector.
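A minimal Python sketch of one way to answer the options question: a single collector that takes a `prompt` plus a `model` option and dispatches to whichever model function is registered under that name. Everything here is hypothetical (`ResearchCollector`, `ResearchCollectorOptions`, `ModelFn`, `extract_urls`, and the `"o3-mini-high"` key). The actual model call is left as a plain function, so no particular vendor API is assumed. Because the goal is finding real URLs rather than generating them, the collector only extracts and de-duplicates http(s) URLs from the model's reply. Whether those URLs resolve or are relevant is still left to the labeling pipeline.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlparse

# Hypothetical: any function that sends a prompt to a model and returns its raw text reply.
ModelFn = Callable[[str], str]

# Rough http(s) URL matcher; stops at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"""https?://[^\s<>"')\]]+""")


@dataclass
class ResearchCollectorOptions:
    prompt: str
    model: str = "o3-mini-high"  # hypothetical key; selects a registered ModelFn
    max_urls: int = 50


def extract_urls(text: str) -> List[str]:
    """Return http(s) URLs found in free-form model output, de-duplicated, in order."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop sentence punctuation glued to the URL
        if urlparse(url).netloc and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


class ResearchCollector:
    def __init__(self, models: Dict[str, ModelFn]):
        # One collector, many models: each model is just an entry in this mapping.
        self.models = models

    def run(self, options: ResearchCollectorOptions) -> List[str]:
        if options.model not in self.models:
            raise ValueError(f"unsupported model: {options.model}")
        reply = self.models[options.model](options.prompt)
        return extract_urls(reply)[: options.max_urls]


if __name__ == "__main__":
    # Stand-in for a real model call, for illustration only.
    def fake_model(prompt: str) -> str:
        return "See https://example.com/a. Also (https://example.org/b)."

    collector = ResearchCollector({"o3-mini-high": fake_model})
    print(collector.run(ResearchCollectorOptions(prompt="example prompt")))
    # prints ['https://example.com/a', 'https://example.org/b']
```

With this shape, trying another model would mean registering another function rather than writing a new collector. A separate collector might still be simpler if a future model needs very different options.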

## Thoughts

If this works well, we might consider using LLMs to sort more aggressively on `relevancy`, making the human part of labeling more fun and less subjective.
