Question: Building a persistent service container based on OCRmyPDF #1581

@juanmarques

Hi OCRmyPDF team,

First, thanks for this excellent tool! I'm looking to build a persistent Docker service based on OCRmyPDF and have some questions.

My Use Case
I want to create a persistent (not ephemeral) Docker service that:

- Runs as a long-lived container with a REST API
- Handles multiple concurrent OCR requests with queuing
- Will be published as open source on Docker Hub and GitHub
- Is built with Quarkus (Java) and calls OCRmyPDF as the OCR engine

My Questions

1. Fork vs. Extension: Should I:

   - Fork the repository and modify it?
   - Build on top of your Docker image (FROM jbarlow83/ocrmypdf-alpine)?
   - Build a separate service that calls OCRmyPDF via Docker/CLI?

2. Production web service: Your documentation mentions that the included webservice.py is for demo/dev purposes only. Are there specific concerns or recommendations for building a production-grade service?

3. Attribution: What's the appropriate way to credit OCRmyPDF in my derivative work?

My Plan
Build a production-ready Quarkus-based service that calls OCRmyPDF for the actual OCR processing and adds:

- A RESTful HTTP API on a production-grade server
- A job queue for concurrent request handling
- Health checks and monitoring
- Proper error handling and logging
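
To make the plan concrete, here is a minimal sketch of the queueing part as I currently picture it. It assumes the `ocrmypdf` executable is on the container's PATH and is invoked as `ocrmypdf input.pdf output.pdf`. The class name `OcrJobQueue` and its methods are placeholders of mine, not part of OCRmyPDF or Quarkus, and a real service would still need timeouts, temp-file cleanup, and error mapping.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical wrapper: queues OCR jobs and runs each one through the ocrmypdf CLI. */
final class OcrJobQueue implements AutoCloseable {

    // A fixed pool caps the number of concurrent OCR processes;
    // additional jobs wait in the executor's internal queue.
    private final ExecutorService workers;

    // Name or path of the CLI executable (assumed to be "ocrmypdf" in the container).
    private final String executable;

    OcrJobQueue(int maxConcurrentJobs, String executable) {
        this.workers = Executors.newFixedThreadPool(maxConcurrentJobs);
        this.executable = executable;
    }

    /** Builds the command line: <executable> <input.pdf> <output.pdf>. */
    List<String> buildCommand(Path input, Path output) {
        return List.of(executable, input.toString(), output.toString());
    }

    /** Queues one job; the Future yields the process exit code (0 means success). */
    Future<Integer> submit(Path input, Path output) {
        Callable<Integer> job = () -> run(buildCommand(input, output));
        return workers.submit(job);
    }

    private static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)  // forward CLI output to the service log
                .start();
        return process.waitFor();
    }

    /** Stops accepting new jobs; already queued jobs still run to completion. */
    @Override
    public void close() {
        workers.shutdown();
    }
}
```

A REST endpoint would then call something like `queue.submit(inputPath, outputPath)` and report the job as failed when the returned exit code is non-zero. Keeping OCRmyPDF behind a process boundary like this corresponds to the third option in my first question.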

The wrapper code would be fully open-source under a compatible license (MPL-2.0 or AGPL-3.0).
Is this approach acceptable? Any guidance would be appreciated!
Thanks!
