
install.sh is for Debian Linux.
Prerequisites: apt and Python above 3.8

The project includes 3 main parts:
PDF Text Extractor - extracts text from a PDF
Image Extractor from PDF - extracts images from a PDF and saves them to a folder
Text Visualizer - visualizes the text to show what the computer recognizes

On Debian Linux, run:

sudo bash install.sh

Steps:

Install tesseract-ocr and libtesseract-dev using your OS package manager
Create a virtual environment: python3 -m venv venv
Activate it: source venv/bin/activate
Install all required libraries: pip install -r requirments.txt
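
To confirm the setup steps took effect, a small check like the one below may help. It is only a sketch and not part of the repository: the function name check_prerequisites is hypothetical, "above 3.8" is assumed to mean 3.8 or newer, and it relies on the tesseract-ocr package putting a `tesseract` executable on the PATH.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 8)):
    """Return a list of problems found; an empty list means everything looks fine."""
    problems = []

    # Assumption: "above 3.8" is read as "3.8 or newer".
    if sys.version_info[:2] < min_version:
        wanted = ".".join(str(n) for n in min_version)
        problems.append(f"Python {wanted} or newer is required")

    # The tesseract-ocr package provides the `tesseract` executable.
    if shutil.which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")

    # Inside a venv, sys.prefix differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment is active")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

Run it with python3 after activating the virtual environment; no output means none of the three checks failed.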

Depending on your workflow, use main.py if you want a graphical interface, or maincli.py if you want to use command-line arguments.

For mainCLI.py, you can use either syntax:
python3 main.py PDFfile
or
python3 main.py PDFfile -o outputFileName
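
The two call forms above (a required PDF path plus an optional -o output name) could be parsed roughly as follows. This is an illustrative sketch with argparse, not the repository's actual code: the names build_parser, pdf_file and the long option --output are assumptions, and what the real script does when -o is omitted is not stated in this README.

```python
import argparse


def build_parser():
    """Build a parser for: PDFfile [-o outputFileName] (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Extract text from a PDF (illustrative sketch)"
    )
    # Required positional argument: the PDF to process.
    parser.add_argument("pdf_file", help="path to the input PDF")
    # Optional output name; --output is an assumed long form of -o.
    parser.add_argument(
        "-o", "--output", default=None, help="name of the output file"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("input:", args.pdf_file)
    print("output:", args.output)
```

With this sketch, omitting -o leaves args.output as None, so the caller decides on a default name.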

For visualizer.py, the syntax is:
python3 visualizer.py PDFfile

NLPatVCU/PDFtoTextExtractor

Folders and files

Latest commit

History

Repository files navigation

Sudo bash install.sh

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages