Closed
Description
🚀 The feature, motivation and pitch
Description:
The current OCR pipeline cannot accurately detect checkboxes or classify their states in scanned or photographed document forms. This degrades data extraction quality wherever forms use checkboxes to capture user selections.
Expected Functionality:
OCR should:
- Detect all checkbox elements in the form
- Classify checkbox states:
  - Empty: Rectangular border with an empty interior
  - Checked: Contains a ✓ or a “v” shape
  - Crossed: Contains an “x” or diagonal line(s)
  - Filled: Darkened or shaded interior
  - Partial: Unclear or incomplete mark
- Associate checkbox rows/columns with corresponding labels
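To make the requested states concrete, here is a minimal sketch of how a detected checkbox crop could be mapped to the five states above. It is only an illustration, not how the model works or should work: it assumes the crop is already binarized (1 = ink, 0 = background) and includes the box border, and every name (`CheckboxState`, `classify_checkbox`, `blank_box`) and threshold is hypothetical and would need tuning on real scans.

```python
from enum import Enum


class CheckboxState(Enum):
    EMPTY = "empty"
    CHECKED = "checked"
    CROSSED = "crossed"
    FILLED = "filled"
    PARTIAL = "partial"


def blank_box(size):
    """Build a size x size binary grid containing only a 1-pixel border (test helper)."""
    grid = [[0] * size for _ in range(size)]
    for i in range(size):
        grid[0][i] = grid[size - 1][i] = 1
        grid[i][0] = grid[i][size - 1] = 1
    return grid


def classify_checkbox(crop, border=1, empty_max=0.03, partial_max=0.08,
                      filled_min=0.60, diagonal_min=0.70):
    """Classify a binarized checkbox crop (2D list, 1 = ink) by simple ink heuristics.

    All thresholds are illustrative assumptions, not values from the issue.
    """
    # Ignore the box border so only the mark inside is measured.
    interior = [row[border:len(row) - border] for row in crop[border:len(crop) - border]]
    if not interior or not interior[0]:
        raise ValueError("crop is too small for the given border width")

    rows, cols = len(interior), len(interior[0])
    ink_ratio = sum(sum(row) for row in interior) / (rows * cols)

    if ink_ratio <= empty_max:
        return CheckboxState.EMPTY
    if ink_ratio >= filled_min:
        return CheckboxState.FILLED
    if ink_ratio < partial_max:
        # Some ink, but too little to call a deliberate mark.
        return CheckboxState.PARTIAL

    # Measure how much of each diagonal is inked; an "x" or slash covers most of one.
    main_hits = anti_hits = 0
    for r in range(rows):
        c = round(r * (cols - 1) / (rows - 1)) if rows > 1 else 0
        main_hits += interior[r][c]
        anti_hits += interior[r][cols - 1 - c]
    if max(main_hits, anti_hits) / rows >= diagonal_min:
        return CheckboxState.CROSSED

    # Substantial mark that is neither a fill nor a diagonal: treat as a check.
    return CheckboxState.CHECKED
```

A heuristic like this would only cover the classification step; detecting the boxes and associating them with labels would still need layout information that a text-only OCR output does not carry.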
Steps taken:
- Tried prompt engineering around layout inference and checkbox keyword detection, without success.
- The OCR output contains text only; graphic elements are ignored entirely.
Sample file tried
Activity
aman-17 commented on Jul 10, 2025
Hey @likhithkumar98, thanks for sharing this. We’ll work on all these issues and release a better model soon.