This application classifies uploaded documents (PDFs, DOC/DOCX, or images) as:
- Estácio Exam
- Uniasselvi Exam
- Other
It also detects checkmarks with a YOLO model, draws them with green bounding boxes, and reports statistics such as detection count and average confidence.
- File Conversion: DOC/DOCX to PDF, PDF to images.
- Document Classification: Uses a LeViT model (ONNX format) to identify the type of document.
- Checkmark Detection: YOLO model identifies checkmarks with confidence thresholding and non-maximum suppression.
- Streamlit Interface: Clean UI with sidebar model selection, image previews, and detection stats.
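The confidence thresholding, non-maximum suppression, and detection statistics mentioned above can be illustrated with a minimal pure-Python sketch. The function names, the `(x1, y1, x2, y2)` box format, and the default thresholds (0.25 and 0.45) are assumptions for illustration, not values taken from `app.py`. YOLO libraries typically perform these steps internally, so the app itself may not contain code like this.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: List[Box],
    scores: List[float],
    conf_threshold: float = 0.25,  # hypothetical default
    iou_threshold: float = 0.45,   # hypothetical default
) -> List[Tuple[float, Box]]:
    """Drop low-confidence boxes, then apply greedy non-maximum suppression."""
    candidates = sorted(
        ((s, b) for b, s in zip(boxes, scores) if s >= conf_threshold),
        key=lambda item: item[0],
        reverse=True,
    )
    kept: List[Tuple[float, Box]] = []
    for score, box in candidates:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


def detection_stats(kept: List[Tuple[float, Box]]) -> Dict[str, float]:
    """Count and average confidence of the surviving detections."""
    count = len(kept)
    avg = sum(score for score, _ in kept) / count if count else 0.0
    return {"count": count, "avg_confidence": avg}
```

For example, two heavily overlapping boxes collapse into the higher-scoring one, and a box scoring below the confidence threshold is discarded before suppression runs.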
- Python 3.10+
- Poppler (for pdf2image)
- LibreOffice (for DOC/DOCX conversion)
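The following sketch shows one way these two tools might be used for the conversion steps. It assumes LibreOffice is reachable as `soffice` and that Poppler provides `pdftoppm` on the `PATH`. The helper names (`missing_tools`, `libreoffice_pdf_command`, `docx_to_pdf`, `pdf_to_images`) and the 200 DPI default are hypothetical, and `app.py` may do this differently.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def missing_tools(tools: Sequence[str] = ("soffice", "pdftoppm")) -> List[str]:
    """Return the external tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def libreoffice_pdf_command(doc_path: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that converts DOC/DOCX to PDF."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", out_dir,
        doc_path,
    ]


def docx_to_pdf(doc_path: str, out_dir: str) -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    subprocess.run(libreoffice_pdf_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".pdf")


def pdf_to_images(pdf_path: str, dpi: int = 200):
    """Render each PDF page to a PIL image using pdf2image (needs Poppler)."""
    from pdf2image import convert_from_path  # imported lazily on purpose
    return convert_from_path(pdf_path, dpi=dpi)
```

Calling `missing_tools()` before starting the app gives an early, readable error if either system package is absent.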
Install dependencies (after cloning the repository):
sudo apt update
sudo apt install poppler-utils libreoffice -y
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Start the app locally:
streamlit run app.py --server.fileWatcherType none
Access the app at http://localhost:8501.
- Launch an EC2 instance with Ubuntu.
- Clone this repository and navigate into it.
- Install dependencies as described above.
- Start the Streamlit app with:
streamlit run app.py --server.headless true --server.port 8501 --server.enableCORS false
- Open your browser to http://<your-ec2-public-ip>:8501.
.
├── app.py
├── requirements.txt
└── models/
    ├── levit-384-exams-classification/
    │   ├── onnx/levit384_estacio-multiclass.onnx
    │   ├── preprocessor_config.json
    │   └── metrics/metrics.png
    └── yolo_detection_checkmark-final/
        └── best.pt
Apache License
Developed by Lucas Meireles