Open
Description
I am trying to use pdfbox with this vanilla snippet:
import pdfbox

converter = pdfbox.PDFBox()
converter.extract_text(
    input_path=str(pdf.absolute()),
    output_path=str(txt.absolute()))
But the call never returns. I inspected the stack trace, and it hangs at this line:
I confirmed that a Java process is spawned:
➜ jps
5416 Jps
5385
329 <-- spawned process
But it is just stuck there.
Running the jar cached by python-pdfbox directly in the terminal works:
java -jar pdfbox-app-2.0.17.jar ExtractText '/Users/devcsrj/Projects/devcsrj/klerk/dist/17/SENATE/regular-1/journal-28.pdf' '/Users/devcsrj/Projects/devcsrj/klerk/dist/17/SENATE/regular-1/journal-28.txt'
So I am no longer sure what's going on. Thoughts?
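One way to narrow this down might be to launch the same command from Python with a timeout, independent of how python-pdfbox starts Java. The sketch below is only a diagnostic aid: `run_with_timeout` is a hypothetical helper name, and redirecting stdin to `DEVNULL` is an assumption about one possible cause (a child waiting on input), not a confirmed fix.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, stdout, stderr), or None on timeout."""
    try:
        done = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # assumption: rule out a child blocked on stdin
            stdout=subprocess.PIPE,     # capture output so it can be inspected
            stderr=subprocess.PIPE,
            universal_newlines=True,    # decode output as text (works on Python 3.7)
            timeout=timeout_s,          # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return done.returncode, done.stdout, done.stderr
```

Calling it with the list `["java", "-jar", "pdfbox-app-2.0.17.jar", "ExtractText", input_pdf, output_txt]` (the last two being placeholders for the real paths) would show whether the jar also hangs when spawned from Python. A `None` result points at the spawned process itself, while a normal return suggests the hang is in how the wrapper waits for it.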
Environment
Python
python-pdfbox = "==0.1.7"
python_version = "3.7"
Java
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-20190711112007.graal.jdk8u-src-tar-gz-b08)
OpenJDK 64-Bit GraalVM CE 19.2.0 (build 25.222-b08-jvmci-19.2-b02, mixed mode)
OS
macOS Mojave 10.14.4