Our *fast* text-extraction library for extracting text from annual reports and similar documents.

Mittal-Analytics/fast-pdf-extract

Latest commit: f7de56e · Feb 21, 2025


fast-pdf-extract

A Rust-backed PDF text-extraction library for Python.

Features

  • Detect and remove headers and footers
  • Clean bilingual PDFs
  • Mark headings in bold (basic markdown)
  • High accuracy
  • Performance

Development

# install the development dependencies
uv sync --only-dev

# run tests
python -m unittest

# build a release wheel and publish it
maturin build --release
maturin publish
