GitHub - Unstructured-IO/pipeline-document-layout: Pipeline for layout extraction

Pre-Processing Pipeline for Layout Detection

This repository contains the pre-processing pipeline for document layout extraction. The API is hosted at https://api.unstructured.io.

Developer Quick Start

Using pyenv to manage virtualenvs is recommended.
- Mac install instructions. See here for more detailed instructions.
  - brew install pyenv-virtualenv
  - pyenv install 3.8.15
- Linux instructions are available here.
- Create a virtualenv to work in and activate it, e.g. for one named document_layout:
  
  pyenv virtualenv 3.8.15 document_layout
  pyenv activate document_layout
Run make install
Run pip install 'git+https://github.com/facebookresearch/[email protected]#egg=detectron2'
Start a local Jupyter notebook server with make run-jupyter, or just start the FastAPI app locally with make run-web-app

Extracting the layout from a document

For example:

curl -X 'POST' \
  'http://localhost:8000/document-layout/v1.0.0/layout' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@sample-docs/example.png' -F 'model_type=yolox' | jq -C . | less -R

Here, files is the file to process and model_type can be 'default' (or blank) or 'yolox'. You can also set force_ocr to 'auto' to try text extraction from your file, or to 'true', in which case OCR will be used.
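The same request can be sketched in Python using only the standard library. This is a minimal illustration, not code from this repository: the helper names encode_multipart and extract_layout are hypothetical, and the URL, field names, and parameter values are taken from the curl example and description above. It assumes the web app is already running locally.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint copied from the curl example; adjust it if your local server differs.
LAYOUT_URL = "http://localhost:8000/document-layout/v1.0.0/layout"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data.

    Hypothetical helper. Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_layout(path, model_type="yolox", force_ocr=None, url=LAYOUT_URL):
    """POST one document to the layout endpoint and return the parsed JSON.

    Hypothetical helper; force_ocr is only sent when it is given.
    """
    fields = {"model_type": model_type}
    if force_ocr is not None:
        fields["force_ocr"] = force_ocr
    file_path = Path(path)
    body, content_type = encode_multipart(
        fields, "files", file_path.name, file_path.read_bytes()
    )
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, extract_layout("sample-docs/example.png", force_ocr="auto") would send the same fields as the curl command plus force_ocr. The multipart body is built by hand only to keep the sketch dependency-free; an HTTP client library would work equally well.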

Generating Python files from the pipeline notebooks

You can generate the FastAPI APIs from your pipeline notebooks by running make generate-api.

Security Policy

See our security policy for information on how to report security vulnerabilities.

Learn more

Section	Description
Unstructured Community Github	Information about Unstructured.io community projects
Unstructured Github	Unstructured.io open source repositories
Company Website	Unstructured.io product and company info

