One yml docker automation #1524
Replies: 1 comment
Tinkered again today and found the solution: I had used an underscore where it should have been a minus (hyphen), duh. I can recommend the configuration below for any Synology users (it is what I tested with):
version: "3.3"
This will take any(?) PDF from the folder "scan", apply OCR if necessary, output a PDF/A to the folder "output", and save the original copy to "orig". A general additional hint for Synology users: the group ID you need might not be the number after gid= but instead the last number you see when you run id over SSH. Try that and see if it helps if your app does not seem to work.
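To make the hint about IDs a bit more concrete, here is a rough sketch that pulls the numbers out of one line of `id` output and prints the two `user:` values worth trying. The sample line, the account name and the group names are hypothetical; the numbers on your NAS will differ, and the helper `parse_id_output` is only an illustration.

```python
import re


def parse_id_output(line: str) -> dict:
    """Extract the numeric uid, gid and group IDs from one line of `id` output."""
    uid = int(re.search(r"uid=(\d+)", line).group(1))
    gid = int(re.search(r"gid=(\d+)", line).group(1))
    groups_match = re.search(r"groups=(\S+)", line)
    groups = []
    if groups_match:
        # Each entry looks like "100(users)" or just "100"; keep the number.
        groups = [int(part.split("(")[0]) for part in groups_match.group(1).split(",")]
    return {"uid": uid, "gid": gid, "groups": groups}


# Hypothetical sample line, not taken from a real NAS.
sample = "uid=1026(scanuser) gid=100(users) groups=100(users),65536(docker)"
ids = parse_id_output(sample)

# First candidate: the number after gid=.
print(f'user: "{ids["uid"]}:{ids["gid"]}"')
# Second candidate: the last number in the groups list.
print(f'user: "{ids["uid"]}:{ids["groups"][-1]}"')
```

If the container cannot read or write the mounted folders with the first value, the second one is the next thing to try.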
Hello everyone, I am trying to get everything OCR'ed with the help of my NAS and OCRmyPDF. Everything runs except when a PDF already has a text layer because it was created digitally rather than scanned. The setup worked until I added the "skip text" rule, which I thought should solve that issue. The intended workflow is to feed every PDF into a consumption folder, have it processed, and then use the resulting PDF/A with OCR for continued work, basically ensuring that every PDF I use is a PDF/A with an OCR layer. This little gem of a program seems tailor-made for it, so I tried to set up my yml accordingly. Why with yml and not an additional script? Because I am not that tech savvy and would not know how else to set up the Docker program on my NAS to do what I intend.
Here is what I tried; it works if I leave the "skip_text" argument out:
version: "3.3"
services:
  ocrmypdf:
    restart: unless-stopped
    container_name: ocrmypdf
    image: jbarlow83/ocrmypdf:latest
    volumes:
      - "/volume1/docker/ocrmypdf/scan:/input"
      - "/volume1/docker/ocrmypdf/pfd:/output"
      - "/volume1/docker/ocrmypdf/orig:/processed"
    environment:
      - OCR_ON_SUCCESS_ARCHIVE=1
      - OCR_DESKEW=1
      - 'OCR_JSON_SETTINGS={"skip_text": true, "l": "deu", "optimize": "1"}'
      - OCR_OUTPUT_DIRECTORY_YEAR_MONTH=0
    user: "1000:1000"
    entrypoint: python3
    command: watcher.py
My main problem was finding the allowed JSON settings. I found a reference to skip_text, and that it is a boolean, here: https://ocrmypdf.readthedocs.io/en/latest/apiref.html#ocrmypdf.api.ocr , but it does not seem to work and I can't figure out why.
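For anyone debugging a similar settings string, a quick way to see what a JSON parser makes of it is sketched here. This assumes the watcher script decodes OCR_JSON_SETTINGS with a standard JSON parser before handing the keys on as options; that is an assumption, not something confirmed in this thread.

```python
import json

# The settings value from the compose file, exactly as written there.
raw = '{"skip_text": true, "l": "deu", "optimize": "1"}'

# Decode it the way a standard JSON parser would.
settings = json.loads(raw)

# Show each key with its decoded value and Python type.
for key, value in settings.items():
    print(f"{key!r}: {value!r} ({type(value).__name__})")
```

The unquoted true decodes to a boolean, while the quoted "1" stays a string, so quoting matters if an option expects a number.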
If I should have posted this elsewhere, please tell me where; I could not find a better place.
Thanks in advance for any and all help.
edit: added additional information / changed phrasing