One yml docker automation #1524

CaipaRinha · 2025-05-05T16:05:53Z

Hello everyone, I am trying to get every PDF OCR'ed with the help of my NAS and OCRmyPDF. Everything runs except for PDFs that already contain text because they were created digitally rather than scanned. To handle those I added the "skip text" option, which I think should solve the issue, but with it added the setup no longer works. The intended workflow is to drop every PDF into a consumption folder, have it processed, and then keep working with the resulting PDF/A with an OCR layer, so that every PDF I use is a PDF/A with OCR. This little gem of a program seems tailor-made for that, so I tried to set up my yml file accordingly. Why a yml file and not an additional script? Because I am not that tech-savvy and would not know how else to set up the Docker program on my NAS to do what I intend.

Here is what I tried; it works if I leave the "skip_text" argument out:

```yaml
version: "3.3"
services:
  ocrmypdf:
    restart: unless-stopped
    container_name: ocrmypdf
    image: jbarlow83/ocrmypdf:latest
    volumes:
      - "/volume1/docker/ocrmypdf/scan:/input"
      - "/volume1/docker/ocrmypdf/pfd:/output"
      - "/volume1/docker/ocrmypdf/orig:/processed"
    environment:
      - OCR_ON_SUCCESS_ARCHIVE=1
      - OCR_DESKEW=1
      - 'OCR_JSON_SETTINGS={"skip_text": true, "l": "deu", "optimize": "1"}'
      - OCR_OUTPUT_DIRECTORY_YEAR_MONTH=0
    user: "1000:1000"
    entrypoint: python3
    command: watcher.py
```

My main problem was finding out which keys are allowed in OCR_JSON_SETTINGS. I found a reference to skip_text, documented as a boolean, here: https://ocrmypdf.readthedocs.io/en/latest/apiref.html#ocrmypdf.api.ocr , but it does not seem to work and I can't figure out why.
If I should have posted this elsewhere, please tell me where; I couldn't find anywhere else.
Thanks for any and all help in advance.
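Since the OCR_JSON_SETTINGS value is JSON written inside a YAML string, quoting mistakes are easy to make. A minimal sketch, using only the Python standard library, that checks such a value before it goes into the compose file. `check_json_settings` is a hypothetical helper, not part of OCRmyPDF; it checks only JSON syntax and value types, not whether OCRmyPDF accepts a given key.

```python
import json


def check_json_settings(raw: str) -> list[str]:
    """Return a list of problems found in an OCR_JSON_SETTINGS value.

    Hypothetical helper, not part of OCRmyPDF: it checks JSON syntax and
    value types only, not whether OCRmyPDF accepts each option name.
    """
    try:
        settings = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Common cause: single quotes inside the JSON instead of double quotes
        return [f"not valid JSON: {exc}"]
    if not isinstance(settings, dict):
        return ['must be a JSON object such as {"skip-text": true}']
    problems = []
    for key, value in settings.items():
        # "true" in quotes is a string; an on/off option needs a bare true/false
        if isinstance(value, str) and value.lower() in ("true", "false"):
            problems.append(f"{key!r}: write {value.lower()} without quotes")
    return problems


if __name__ == "__main__":
    # Paste the part after "OCR_JSON_SETTINGS=" from the compose file here
    raw = '{"skip-text": true, "l": "deu", "optimize": "1"}'
    print(check_json_settings(raw) or "looks OK")
```

An empty list (printed as "looks OK") only means the value is well-formed JSON; option names still have to match what OCRmyPDF expects.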

edit: added additional information / changed phrasing

CaipaRinha · 2025-05-16T12:09:12Z

Tinkered with it again today and found the solution: I used an underscore where it should have been a hyphen (skip-text instead of skip_text), duh. I can recommend the configuration below to any Synology users (it is what I tested with):

```yaml
version: "3.3"
services:
  ocrmypdf:
    restart: unless-stopped
    container_name: ocrmypdf
    image: jbarlow83/ocrmypdf:latest
    volumes:
      - "/volume1/docker/ocrmypdf/scan:/input"
      - "/volume1/docker/ocrmypdf/pfd:/output"
      - "/volume1/docker/ocrmypdf/orig:/processed"
    environment:
      - OCR_ON_SUCCESS_ARCHIVE=1
      - OCR_DESKEW=1
      - 'OCR_JSON_SETTINGS={"skip-text": true, "l": "deu", "optimize": "1"}'
      - OCR_OUTPUT_DIRECTORY_YEAR_MONTH=0
    user: "1000:1000"
    entrypoint: python3
    command: watcher.py
```

This takes any(?) PDF from the "scan" folder, applies OCR if necessary, writes a PDF/A to the folder mounted as /output ("pfd" in the paths above), and saves the original to "orig".
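The working key, skip-text, matches the spelling of OCRmyPDF's command-line option --skip-text, so one way to sanity-check option names is to try the same settings with the ocrmypdf command line on a single file first. A minimal sketch that turns an OCR_JSON_SETTINGS string into the equivalent command-line arguments. `to_cli_args` is a hypothetical helper, and it assumes every key is an ocrmypdf option name: a single letter such as l becomes -l, anything longer such as skip-text becomes --skip-text.

```python
import json
import shlex


def to_cli_args(raw: str) -> list[str]:
    """Turn an OCR_JSON_SETTINGS string into ocrmypdf command-line arguments.

    Hypothetical helper for manual testing; it assumes each key is an
    ocrmypdf option name (short like "l" or long like "skip-text").
    """
    args = []
    for key, value in json.loads(raw).items():
        # Long options on the command line are spelled with hyphens
        name = key.replace("_", "-")
        flag = f"-{name}" if len(name) == 1 else f"--{name}"
        if value is True:
            args.append(flag)  # on/off switch, e.g. --skip-text
        elif value is False or value is None:
            continue  # switch left off: no argument at all
        else:
            args.extend([flag, str(value)])  # option with a value, e.g. -l deu
    return args


if __name__ == "__main__":
    raw = '{"skip-text": true, "l": "deu", "optimize": "1"}'
    args = to_cli_args(raw) + ["input.pdf", "output.pdf"]
    # prints: ocrmypdf --skip-text -l deu --optimize 1 input.pdf output.pdf
    print("ocrmypdf " + shlex.join(args))
```

Running the printed command on one test PDF, and checking any unfamiliar option against `ocrmypdf --help`, is a quick way to confirm the settings before putting them into the compose file.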

A general hint for Synology users: the GID to put in `user:` might not be the number after "gid=", but instead the last number shown when you run `id` over SSH. If your app does not seem to work, try that.
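Running `id` over SSH is the simplest way to see these numbers. If python3 happens to be available on the NAS (an assumption; it depends on the system), the sketch below prints one candidate `user: "UID:GID"` line per group, primary group first, so each can be tried in turn. `user_candidates` and `group_name` are hypothetical helpers; the code uses only the standard `os` and `grp` modules, which exist on Unix-like systems only.

```python
import grp
import os


def group_name(gid: int) -> str:
    """Look up a group name, falling back to '?' if the system has none."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return "?"


def user_candidates() -> list[str]:
    """Return possible values for the compose file's user: "UID:GID" line.

    The primary GID (what `id` shows after gid=) comes first, followed by
    the supplementary groups (the groups=... list) in ascending order.
    """
    uid = os.getuid()
    primary = os.getgid()
    others = sorted(set(os.getgroups()) - {primary})
    return [f"{uid}:{g}" for g in [primary, *others]]


if __name__ == "__main__":
    for candidate in user_candidates():
        gid = int(candidate.split(":")[1])
        print(f'user: "{candidate}"    # group {group_name(gid)}')
```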
