This repo implements a document pre-processing pipeline for receipts. The pipeline is currently under development. It assumes the receipts are PDF or image files (JPG, PNG).
The API is hosted at https://api.unstructured.io.
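As a small illustration of the input assumption above, the following sketch checks whether a file looks like a supported receipt before it is sent to the pipeline. The helper name `is_supported_receipt` and the suffix set are assumptions made for this example and are not part of the repo; `.jpeg` is included as a common alias for JPG.

```python
from pathlib import Path

# Suffixes assumed to match the formats named above (PDF, JPG, PNG).
# ".jpeg" is added as a common alias for JPG; this set is an assumption.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png"}


def is_supported_receipt(path: str) -> bool:
    """Return True if the file suffix is one of the assumed supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

This only inspects the file name, not the file contents, so it is a quick pre-check and not a validation of the file itself.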
- Using `pyenv` to manage virtualenvs is recommended.
  - Mac install instructions:
    - `brew install pyenv-virtualenv`
    - `pyenv install 3.8.15`
  - Create a virtualenv to work in and activate it, e.g. for one named `receipts`:
    - `pyenv virtualenv 3.8.15 receipts`
    - `pyenv activate receipts`
- Run `make install`.
- Start a local Jupyter notebook server with `make run-jupyter`, OR just start the FastAPI app locally with `make run-web-app`.
After the API starts, you can extract the elements of receipt files with the command:
curl -X 'POST' \
'http://localhost:8000/receipts/v0.1.0/receipts' \
-F 'files=@<your_receipt_file>' \
| jq -C . | less -R
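
The same request can be sketched in Python using only the standard library. This is a minimal example, not code from the repo: the endpoint URL and the `files` form field are taken from the curl command above, while the helper names `build_multipart` and `post_receipt` are hypothetical, and the response is assumed to be JSON because the curl example pipes it to `jq`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint copied from the curl example; adjust if your local server differs.
LOCAL_URL = "http://localhost:8000/receipts/v0.1.0/receipts"


def build_multipart(field: str, filename: str, data: bytes, boundary: str = ""):
    """Encode one file as a multipart/form-data body (hypothetical helper).

    Returns the request body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_receipt(path: str, url: str = LOCAL_URL):
    """POST a receipt file to the API and return the decoded JSON response."""
    file = Path(path)
    body, content_type = build_multipart("files", file.name, file.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # Assumes the API answers with a JSON document.
        return json.load(response)
```

With the local server running, `post_receipt("<your_receipt_file>")` should return the parsed response, which can be pretty-printed with `json.dumps(result, indent=2)`.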
You can generate the FastAPI APIs from your pipeline notebooks by running `make generate-api`.
See our security policy for information on how to report security vulnerabilities.
Hugging Face Spaces offer a simple way to host ML demo apps, models and datasets directly on our organization’s profile. This allows us to showcase our projects and work collaboratively with other people in the ML ecosystem. Visit our space here!
Section | Description |
---|---|
Company Website | Unstructured.io product and company info |
Fine-tuned Models and Data | CORD Consolidated Receipt dataset and Donut model |