ocrAI 🤖

🆕 Enhanced OCR + AI Mode: The AI-corrected text is now embedded directly into the PDF, preserving the original layout and positioning within the document.

📄 New Output Format in Full AI OCR Mode: The output is now a Markdown-formatted .txt file, structured with titles, subtitles, and page markers. This file can be converted directly within the app to a clean, paginated PDF, making it ideal for tasks like translating books while preserving formatting and pagination.

ocrAI is a simple web app that combines Optical Character Recognition (OCR) and Artificial Intelligence (AI) to process and translate documents. It offers an intuitive interface with real-time feedback and emoji progress indicators.

Key Features

File Management 📤
- Upload PDF or image files easily.
- Files are saved with unique names to avoid overwrites.
- Delete all files with one click (removes them from both uploads and outputs).
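
One way to save uploads under unique names is to append a short random token to the original file name. The sketch below is only an illustration: the helper `unique_filename` and the 8-character token are assumptions, not necessarily what the ocrAI backend does.

```python
import uuid
from pathlib import Path


def unique_filename(original_name: str) -> str:
    """Return a name that keeps the original stem and extension but adds a random token.

    Hypothetical helper: the naming scheme used by ocrAI itself may differ.
    """
    path = Path(original_name)
    # Path.stem / Path.suffix look only at the final path component,
    # so directory parts such as "../" are dropped.
    stem, suffix = path.stem, path.suffix.lower()
    token = uuid.uuid4().hex[:8]  # assumption: 8 hex characters are enough here
    return f"{stem}_{token}{suffix}"
```

Calling `unique_filename("scan.PDF")` twice yields two different names that both start with `scan_` and end with `.pdf`, so a second upload of the same file does not overwrite the first.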
OCR Processing Modes 🔍
- OCR (Tesseract Only):
  Extracts text with Tesseract and embeds it into the PDF using OCRmyPDF. The TXT file contains the raw OCR output.
- OCR + AI (Tesseract + AI):
  Uses Tesseract to extract text, then sends it to Gemini AI for correction and formatting. The corrected text is embedded directly into the resulting PDF, and a TXT file is also generated.
- AI (Full AI OCR):
  Uses Gemini AI to process the document page by page and generates a TXT file in Markdown format (with page markers and structure). This TXT file can be converted to a PDF with the TXT to PDF tool in the app.
- All modes show real-time progress with emojis and run in the background.
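
As a rough picture of the Full AI OCR output, the per-page text can be joined into one TXT file with a marker line before each page. The marker format `[Page N]` and the function `join_pages` are assumptions made for this sketch; the markers ocrAI actually writes may look different.

```python
def join_pages(pages: list[str], marker: str = "[Page {n}]") -> str:
    """Join per-page Markdown text into one document with a marker before each page.

    The "[Page N]" marker is a hypothetical format chosen for illustration.
    """
    blocks = []
    for n, text in enumerate(pages, start=1):
        # Marker line, blank line, then the page text without stray whitespace.
        blocks.append(f"{marker.format(n=n)}\n\n{text.strip()}")
    return "\n\n".join(blocks) + "\n"
```

For example, `join_pages(["# Title\nHello", "Second page"])` produces a file that starts with `[Page 1]`, followed by the first page, then `[Page 2]` and the second page.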
TXT to PDF 📝
- Convert any TXT file (especially markdown from AI mode) to a clean, paginated PDF directly from the app.
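
Producing a paginated PDF from such a TXT file first requires splitting the text back into pages. A minimal sketch, assuming marker lines of the form `[Page N]` (a hypothetical format; adjust the pattern to whatever markers your file contains):

```python
import re

# Assumed marker format: a line consisting only of "[Page <number>]".
PAGE_MARKER = re.compile(r"^\[Page (\d+)\][ \t]*$", re.MULTILINE)


def split_pages(text: str) -> list[tuple[int, str]]:
    """Split a marked-up TXT document into (page_number, page_text) pairs."""
    matches = list(PAGE_MARKER.finditer(text))
    if not matches:
        # No markers found: treat the whole file as a single page.
        return [(1, text.strip())]
    pages = []
    for i, match in enumerate(matches):
        # A page runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages.append((int(match.group(1)), text[match.end():end].strip()))
    return pages
```

Each returned pair can then be rendered onto its own PDF page by whichever PDF library the converter uses.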
Translation 🌐
- Translate PDF or TXT documents page by page.
- Progress updates are displayed, and a TXT file with the final translation (including page markers) is generated.
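
The page-by-page flow with progress updates can be sketched as a plain loop. Here `translate` stands in for the call to the AI model and `on_progress` for whatever reports status to the UI; both are placeholders for this example, not names from the ocrAI code.

```python
from typing import Callable, Optional


def translate_pages(
    pages: list[str],
    translate: Callable[[str], str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[str]:
    """Translate pages one at a time, reporting progress after each page."""
    total = len(pages)
    translated = []
    for n, page in enumerate(pages, start=1):
        translated.append(translate(page))  # placeholder for the real model call
        if on_progress is not None:
            on_progress(n, total)  # e.g. "page 3 of 12 done"
    return translated
```

Processing one page per request keeps each prompt small and makes it easy to keep the page markers aligned between the source and the translation.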
Configuration ⚙️
- Manage and add new Gemini AI models and languages.
- Update or add custom prompts for OCR, correction, and translation.
- Download or upload the complete configuration (prompts and models).
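
Downloading and uploading the configuration amounts to a serialize/validate round trip. The sketch below assumes a JSON file with top-level `prompts` and `models` keys; both the file format and the key names are guesses for illustration, so check an actual exported file before relying on them.

```python
import json

# Assumed top-level keys; the real export may use other names or more sections.
REQUIRED_KEYS = ("prompts", "models")


def export_config(config: dict) -> str:
    """Serialize the configuration to a JSON string suitable for download."""
    return json.dumps(config, indent=2, ensure_ascii=False, sort_keys=True)


def import_config(payload: str) -> dict:
    """Parse an uploaded configuration and check that the expected sections exist."""
    config = json.loads(payload)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"configuration is missing sections: {', '.join(missing)}")
    return config
```

Validating on upload means a truncated or unrelated file is rejected before it can replace working prompts and models.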

How to Use the Application

Upload and Process Files:
- Go to the OCR tab.
- Select your file (PDF or image).
- Choose one of the processing modes:
  - OCR (Tesseract Only)
  - OCR + AI (Tesseract + AI for correction and AI-embedded PDF)
  - AI (Full AI OCR, generates markdown TXT)
- For AI modes, select the desired prompt.
- Click "Upload and process" and watch the real-time progress.
- If you used AI mode, you can convert the resulting TXT to PDF using the TXT to PDF tab.
Translate Documents:
- Go to the Translation tab.
- Upload a new file or select one from the list of processed files.
- Choose the target language and translation prompt.
- Click Translate and observe the progress as each page is processed.
- The result is saved in a TXT file with page markers.
View Processed Files:
- Go to the Processed Files tab.
- Download or delete files (with confirmation prompts).
Note: The Processed Files list is ordered from most recent to oldest (top = newest).
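
A newest-first listing like this can be produced by sorting files on their modification time. This is a sketch under that assumption; `list_newest_first` is a hypothetical helper and the app may order files by another timestamp.

```python
from pathlib import Path


def list_newest_first(folder: str) -> list[Path]:
    """Return the regular files in `folder`, most recently modified first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```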
Configure the Application:
- Go to the Configurations tab.
- Add, edit, or delete custom prompts.
- Manage Gemini models: add new models or delete existing ones.
- Configure languages and download or upload the complete configuration.
Convert TXT to PDF:
- Go to the TXT to PDF tab.
- Select a TXT file (for example, one generated by AI mode).
- Click Convert to PDF to get a clean, paginated PDF.

How to Run ocrAI

Prerequisites

- Docker
- Docker Compose

Build and Run

docker-compose up --build

Then, open your browser at http://localhost:5015 to start using ocrAI.
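
The containers can take a moment to start. If you script the setup, a small helper can poll the address above until it responds. This is an optional sketch: `wait_for_app`, its timeout values, and the `opener` parameter (there only so the function can be tested without a running server) are not part of ocrAI.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url="http://localhost:5015", timeout=60.0, interval=2.0,
                 opener=urllib.request.urlopen):
    """Poll `url` until it answers or `timeout` seconds pass. Returns True if reachable."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Any successful HTTP response counts as "the app is up".
            with opener(url, timeout=5):
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # not ready yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Run it after `docker-compose up --build` has been started in another terminal; it returns `False` if nothing answers within the timeout.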

Technologies Used
- Frontend: React, Axios
- Backend: Flask, Python
- OCR: Tesseract, pdf2image, OCRmyPDF
- AI: Gemini API
- Containerization: Docker, Docker Compose
