Description
Hi OCRmyPDF team,
First, thanks for this excellent tool! I'm looking to build a persistent Docker service based on OCRmyPDF and have some questions.
My Use Case
I want to create a persistent (not ephemeral) Docker service that:
- Runs as a long-lived container with a REST API
- Handles multiple concurrent OCR requests with queuing
- Will be published as open source on Docker Hub and GitHub
- Is built with Quarkus (Java) and calls OCRmyPDF as the OCR engine
My Questions
1. Fork vs. extension: Should I:
   - Fork the repository and modify it?
   - Build on top of your Docker image (FROM jbarlow83/ocrmypdf-alpine)?
   - Build a separate service that calls OCRmyPDF via Docker/CLI?
2. Production web service: Your documentation mentions that the included webservice.py is for demo/dev purposes only. Are there specific concerns or recommendations for building a production-grade service?
3. Attribution: What's the appropriate way to credit OCRmyPDF in my derivative work?
My Plan
Build a production-ready Quarkus-based service that provides:
- A RESTful HTTP API on a production-grade server
- A job queue for concurrent request handling
- Health checks and monitoring
- Proper error handling and logging
- Delegation of the actual OCR processing to OCRmyPDF
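To make the plan concrete, here is a rough sketch of the queue-plus-CLI part in plain Java (no Quarkus wiring yet). It is only an illustration under some assumptions: the class name `OcrJobQueue` and its methods are hypothetical, the `ocrmypdf` executable is assumed to be on the container's PATH, and only the basic `ocrmypdf <input> <output>` invocation is used.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: a bounded job queue that shells out to the ocrmypdf CLI. */
public class OcrJobQueue {
    private final ThreadPoolExecutor executor;

    public OcrJobQueue(int workers, int maxQueued) {
        // Fixed number of workers. Jobs beyond maxQueued are rejected with a
        // RejectedExecutionException instead of piling up without bound.
        this.executor = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxQueued));
    }

    /** Builds the basic invocation: ocrmypdf <input> <output>. */
    public static List<String> buildCommand(Path input, Path output) {
        return List.of("ocrmypdf", input.toString(), output.toString());
    }

    /** Queues one job; the Future yields the process exit code (0 means success). */
    public Future<Integer> submit(Path input, Path output) {
        Callable<Integer> job = () -> run(buildCommand(input, output));
        return executor.submit(job);
    }

    /** Runs the command and blocks until the process exits. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward OCRmyPDF's output to the service's own streams
                .start();
        return process.waitFor();
    }

    /** Stops accepting new jobs; already queued jobs still run. */
    public void shutdown() {
        executor.shutdown();
    }
}
```

A REST endpoint would call `submit(...)` and map a rejected job to an HTTP "busy" response; the worker count would probably need tuning, since a single OCRmyPDF run may already use several CPU cores.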
The wrapper code would be fully open-source under a compatible license (MPL-2.0 or AGPL-3.0).
Is this approach acceptable? Any guidance would be appreciated!
Thanks!