ocrAI is an app that combines OCR and Artificial Intelligence to process and translate documents. It embeds the AI-recognized text directly into the original PDF, enabling text selection and searchability. It can also generate a translated version of the PDF.

Drakonis96/ocrai

ocrAI 🤖


🆕 Enhanced OCR + AI Mode: The AI-corrected text is now embedded directly into the PDF, preserving the original layout and positioning within the document.

📄 New Output Format in Full AI OCR Mode: The output is now a Markdown-formatted .txt file, structured with titles, subtitles, and page markers. This file can be converted directly within the app to a clean, paginated PDF, making it ideal for tasks like translating books while preserving formatting and pagination.


ocrAI is a simple web app that combines Optical Character Recognition (OCR) and Artificial Intelligence (AI) to process and translate documents. It offers an intuitive interface with real-time feedback and emoji progress indicators.

Key Features

  • File Management 📤

    • Upload PDF or image files easily.
    • Files are saved with unique names to avoid overwrites.
    • Delete all files with one click (removes them from both uploads and outputs).
  • OCR Processing Modes 🔍

    • OCR (Tesseract Only):
      Extracts text with Tesseract and embeds it into the PDF using OCRmyPDF. The TXT file contains the raw OCR output.
    • OCR + AI (Tesseract + AI):
      Uses Tesseract to extract text, then sends it to Gemini AI for correction and formatting. The corrected text is embedded directly into the resulting PDF, and a TXT file is also generated.
    • AI (Full AI OCR):
      Uses Gemini AI to process the document page by page. Generates a TXT file in Markdown format (with page markers and structure). This TXT file can be converted to a PDF using the TXT to PDF tool in the app.
    • All modes show real-time progress with emojis and run in the background.
  • TXT to PDF 📝

    • Convert any TXT file (especially the Markdown output of AI mode) to a clean, paginated PDF directly from the app.
  • Translation 🌐

    • Translate PDF or TXT documents page by page.
    • Progress updates are displayed, and a TXT file with the final translation (including page markers) is generated.
  • Configuration ⚙️

    • Manage and add new Gemini AI models and languages.
    • Update or add custom prompts for OCR, correction, and translation.
    • Download or upload the complete configuration (prompts and models).
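
As a rough illustration of the File Management and OCR (Tesseract Only) features above, the sketch below shows one way an upload could be given a unique name and then handed to the OCRmyPDF command-line tool. It is not taken from the ocrAI source: the helper names `unique_name` and `ocrmypdf_command`, the `_ocr` output suffix, and the `uploads`/`outputs` paths are assumptions made for this example. The `-l` and `--sidecar` options are standard OCRmyPDF flags; `--sidecar` writes the recognized text to a separate TXT file.

```python
import uuid
from pathlib import Path


def unique_name(original: str) -> str:
    """Return a file name that is very unlikely to collide with an earlier upload.

    Hypothetical helper: the naming scheme ocrAI actually uses may differ.
    """
    # Path(...).name drops any directory components included in the name.
    safe = Path(Path(original).name)
    # A short random suffix keeps the original stem readable.
    return f"{safe.stem}_{uuid.uuid4().hex[:8]}{safe.suffix.lower()}"


def ocrmypdf_command(src: Path, out_dir: Path, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF command producing a searchable PDF and a raw-text TXT.

    The "_ocr" suffix and output layout are assumptions for this sketch.
    """
    pdf_out = out_dir / f"{src.stem}_ocr.pdf"
    txt_out = out_dir / f"{src.stem}_ocr.txt"
    return [
        "ocrmypdf",
        "-l", language,              # Tesseract language code
        "--sidecar", str(txt_out),   # also write the recognized text to a TXT file
        str(src),
        str(pdf_out),
    ]


if __name__ == "__main__":
    stored = Path("uploads") / unique_name("scan.PDF")
    print(ocrmypdf_command(stored, Path("outputs")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where OCRmyPDF and Tesseract are installed. Building the command as a list avoids shell quoting problems with file names that contain spaces.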

How to Use the Application

  1. Upload and Process Files:

    • Go to the OCR tab.
    • Select your file (PDF or image).
    • Choose one of the processing modes:
      • OCR (Tesseract Only)
      • OCR + AI (Tesseract + AI for correction, with the corrected text embedded in the PDF)
      • AI (Full AI OCR, generates a Markdown TXT file)
    • For AI modes, select the desired prompt.
    • Click Upload and process and watch the real-time progress.
    • If you used AI mode, you can convert the resulting TXT to PDF using the TXT to PDF tab.
  2. Translate Documents:

    • Go to the Translation tab.
    • Upload a new file or select one from the list of processed files.
    • Choose the target language and translation prompt.
    • Click Translate and observe the progress as each page is processed.
    • The result is saved in a TXT file with page markers.
  3. View Processed Files:

    • Go to the Processed Files tab.
    • Download or delete files (with confirmation prompts).

    Note: The Processed Files list is ordered from most recent to oldest (top = newest).

  4. Configure the Application:

    • Go to the Configurations tab.
    • Add, edit, or delete custom prompts.
    • Manage Gemini models: add new models or delete existing ones.
    • Configure languages and download or upload the complete configuration.
  5. Convert TXT to PDF:

    • Go to the TXT to PDF tab.
    • Select a TXT file (for example, one generated by AI mode).
    • Click Convert to PDF to get a clean, paginated PDF.
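
The page markers mentioned in the translation and TXT to PDF steps are what allow a TXT file to be processed one page at a time. The sketch below shows the general idea under a loud assumption: the README does not specify the marker format, so the `--- Page N ---` line used here is hypothetical, as are the function names. The `translate` argument stands in for whatever performs the translation (in ocrAI, a call to Gemini); a plain string function is enough to try the flow.

```python
import re
from typing import Callable

# Hypothetical marker format; the files ocrAI writes may use a different one.
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---$", re.MULTILINE)


def split_pages(text: str) -> list[tuple[int, str]]:
    """Split marker-delimited text into (page_number, page_text) pairs."""
    parts = PAGE_MARKER.split(text)
    # With one capture group, re.split returns
    # [text_before_first_marker, number, body, number, body, ...].
    # Anything before the first marker is ignored in this sketch.
    pages = []
    for i in range(1, len(parts) - 1, 2):
        pages.append((int(parts[i]), parts[i + 1].strip()))
    return pages


def translate_pages(text: str, translate: Callable[[str], str]) -> str:
    """Translate each page separately and re-emit the same page markers."""
    out = []
    for number, body in split_pages(text):
        out.append(f"--- Page {number} ---\n{translate(body)}")
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "--- Page 1 ---\n# Title\nHello\n\n--- Page 2 ---\nWorld\n"
    # str.upper is a stand-in for a real translation call.
    print(translate_pages(sample, str.upper))
```

Keeping the markers in the output means the translated TXT can later be paginated the same way as the source.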

How to Run ocrAI

Prerequisites

  • Docker
  • Docker Compose

Build and Run

docker-compose up --build

Then, open your browser at http://localhost:5015 to start using ocrAI.
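
The first build can take a while, so the page may not respond immediately. As an optional convenience, a small script along these lines could poll the address until the app answers. This is only a sketch using the Python standard library; the function names, retry count, and delay are arbitrary choices for this example and are not part of ocrAI.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    ready = wait_until_ready("http://localhost:5015")
    print("ocrAI is up" if ready else "ocrAI did not respond in time")
```

The `probe` parameter exists so the retry logic can be exercised without a running container.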


Technologies Used

  • Frontend: React, Axios
  • Backend: Flask, Python
  • OCR: Tesseract, pdf2image, OCRmyPDF
  • AI: Gemini API
  • Containerization: Docker, Docker Compose
