Xtractor

Xtractor lets you easily extract tables from a PDF. It is essentially a wrapper around the very versatile pdfplumber module, for users who want to do this from the command line.
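For orientation, here is a minimal sketch of the kind of pdfplumber usage a wrapper like this might build on. It is not taken from xtractor.py: the function names `extract_tables` and `write_tables_to_csv`, and the choice to merge all tables into one file, are assumptions made for illustration. Only `pdfplumber.open`, `pdf.pages`, and `page.extract_tables()` are actual pdfplumber calls.

```python
import csv
from pathlib import Path


def extract_tables(pdf_path):
    """Return every table found in the PDF; each table is a list of rows."""
    # pdfplumber is a third-party package; it is imported here so that the
    # CSV helper below can be used without it installed.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables for the page,
            # where each table is a list of rows of cell values.
            tables.extend(page.extract_tables())
    return tables


def write_tables_to_csv(tables, csv_path):
    """Write all rows of all tables to one CSV file and return the row count.

    Hypothetical helper: merging every table into a single file is an
    assumption, not necessarily what xtractor.py does.
    """
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    row_count = 0
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for table in tables:
            for row in table:
                # pdfplumber represents empty cells as None; write them as "".
                writer.writerow(["" if cell is None else cell for cell in row])
                row_count += 1
    return row_count
```

Under these assumptions, `write_tables_to_csv(extract_tables("some.pdf"), "out/tables.csv")` would produce a single CSV, where both file names are placeholders.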

Installing dependencies

git clone <repo>
cd xtractor
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

The commands above clone the repository, create and activate a virtual environment, and install the dependencies.

Running the program

To run the program with the virtual environment activated:

python3 xtractor.py "data/bhs_buidlings.pdf" --dfr

While the program is working, you'll see a progress bar in the terminal. When it finishes, it writes a .csv file to the out folder.

Voilà! You now have a .csv of data to do with as you please.
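As one example of what to do next, the output can be read back with Python's standard csv module. This is a sketch only: `load_csv` is a hypothetical helper, and it assumes the file is UTF-8 encoded and that its first row holds the column headers, which may not be true for every extracted table.

```python
import csv
from pathlib import Path


def load_csv(csv_path):
    """Read a CSV file and return (header, rows).

    Assumes the first row is the header; an empty file yields two empty lists.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

Call it with the path of whichever file appears in the out folder; every cell comes back as a string, so convert numeric columns yourself if you need them.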



benjamingoodheart/xtractor
