Herbarium processor

Formerly known as the Lichen Digital Analysis & Data Delivery sYstem

Summary

Given an image of a specimen label, this library uses computer vision and AI to return that label's data in a structured format. For example:

Input

Output

        {
          "Catalog #": "245081",
          "Occurrence ID": null,
          "Taxon": "Acarospora strigata (Nyl.) Jatta",
          "Family": null,
          "Determiner": null,
          "Date Determined": null,
          "Collector": "H. E. HASSE",
          "Number": "1327",
          "Date": null,
          "Verbatim Date": null,
          "Locality": "Palm Springs (Type locality) Riverside Co. Cal",
          "Latitude/Longitude": null,
          "Elevation": null,
          "Verbatim Elevation": null,
          "Habitat": null
        }
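Because the output uses a fixed set of keys, a consumer may want to normalize each record before storing it. The following is a minimal sketch, not part of this library: the field names are copied from the example above, while `EXPECTED_FIELDS` and `normalize_record` are hypothetical names introduced for illustration.

```python
import json

# Field names copied from the example output above.
EXPECTED_FIELDS = [
    "Catalog #", "Occurrence ID", "Taxon", "Family", "Determiner",
    "Date Determined", "Collector", "Number", "Date", "Verbatim Date",
    "Locality", "Latitude/Longitude", "Elevation", "Verbatim Elevation",
    "Habitat",
]


def normalize_record(raw_json: str) -> dict:
    """Parse a JSON record, drop unknown keys, and fill missing fields with None.

    Hypothetical helper: the library itself may handle this differently.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {field: data.get(field) for field in EXPECTED_FIELDS}


record = normalize_record(
    '{"Catalog #": "245081", "Collector": "H. E. HASSE", "Extra": 1}'
)
print(record["Catalog #"])  # 245081
print(record["Habitat"])    # None
```

Filling absent fields with `None` mirrors the `null` values in the example output, so downstream code can rely on every key being present.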

Quick setup

1. `git clone` the repo.
2. Obtain a Google API key.
3. In the cloned repo, add a `.env` file with the following content:
   ```
   GOOGLE_API_KEY=your_key_here
   ```
4. Run `pip install -r requirements.txt` in the terminal.
5. Run the Jupyter notebook. The first cell sets the image to process; to test a different image, update the value of that variable.
6. Check out the JSON response at the end of the notebook, or in the `tmp/` directory.
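For reference, here is one way the key from the `.env` file could be read using only the standard library. This is a sketch under assumptions: the notebook may load the key differently (for example with a dotenv package), and `load_env_file` and `get_google_api_key` are hypothetical helper names, not functions from this repo.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_google_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to the .env file")
    return key
```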

How it works

This project is still under development. Given an image of a specimen label, it:

1. Uses an image-to-text (OCR) service to extract the text from the label. **Note**: calling the OCR service is not yet implemented; the JSON responses from the Google Cloud Vision API are hardcoded in /json/ to shim this step.
2. Trims the OCR response down to a much smaller payload before passing it to the model (currently Gemini 2.5 Pro).
3. Drafts the system instructions. TODO: fine-tune this prompt if needed.
4. Asks the AI agent to fill out the herbarium fields.
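The cleanup and prompt-drafting steps above could look roughly like the sketch below. It assumes each hardcoded file holds a single Vision-style response object with a `textAnnotations` list, where the first entry carries the full detected text and later entries carry individual words. The names `slim_ocr_response` and `build_prompt`, and the prompt wording, are hypothetical and are not taken from the notebook; the call to the model is left out.

```python
import json

# Field names copied from the example output in the Summary.
FIELDS = [
    "Catalog #", "Occurrence ID", "Taxon", "Family", "Determiner",
    "Date Determined", "Collector", "Number", "Date", "Verbatim Date",
    "Locality", "Latitude/Longitude", "Elevation", "Verbatim Elevation",
    "Habitat",
]


def slim_ocr_response(response: dict) -> dict:
    """Reduce a Vision-style response to the full text plus per-word boxes."""
    annotations = response.get("textAnnotations", [])
    # Assumption: the first annotation holds the whole detected text.
    full_text = annotations[0]["description"] if annotations else ""
    words = []
    for ann in annotations[1:]:
        vertices = ann.get("boundingPoly", {}).get("vertices", [])
        # A coordinate may be missing from a vertex; treat it as 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        box = [min(xs), min(ys), max(xs), max(ys)] if vertices else None
        words.append({"text": ann["description"], "box": box})
    return {"text": full_text, "words": words}


def build_prompt(slim: dict) -> str:
    """Draft instructions asking the model to fill out the herbarium fields."""
    field_list = "\n".join(f"- {name}" for name in FIELDS)
    return (
        "You are transcribing a herbarium specimen label.\n"
        "Return a JSON object with exactly these keys, using null when a "
        "value is not on the label:\n"
        f"{field_list}\n\n"
        "OCR output:\n"
        f"{json.dumps(slim, ensure_ascii=False)}"
    )


def _box(x0: int, y0: int, x1: int, y1: int) -> dict:
    """Build a four-vertex boundingPoly for the sample data."""
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return {"vertices": [{"x": x, "y": y} for x, y in corners]}


sample = {
    "textAnnotations": [
        {"description": "H. E. HASSE\n1327", "boundingPoly": _box(10, 12, 90, 40)},
        {"description": "HASSE", "boundingPoly": _box(40, 12, 90, 24)},
        {"description": "1327", "boundingPoly": _box(10, 28, 44, 40)},
    ]
}
slim = slim_ocr_response(sample)
prompt = build_prompt(slim)
```

Collapsing each four-vertex polygon to a `[x_min, y_min, x_max, y_max]` box is one way to shrink the payload while keeping enough layout information for the model to tell label regions apart.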

