PDF text extraction #471

lukestanley · 2024-05-15T09:04:49Z

We need a way to extract text from PDFs, using the directly accessible text where it exists and OCR where it does not.
PDF.js is probably a good way to get each page's text content, or the page's image content when OCR is needed.
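As a rough sketch of the PDF.js route (not a settled design), the loop below collects the text of every page and flags pages with no extractable text as OCR candidates. The interfaces are a minimal, assumed subset of what a PDF.js document and page expose (`numPages`, `getPage`, `getTextContent`), declared locally so the sketch stays self-contained. `extractPageTexts`, `PageText`, and `needsOcr` are hypothetical names for illustration.

```typescript
// Minimal structural types covering only the parts of PDF.js used here.
// They are assumptions for this sketch, not the library's full typings.
interface TextItemLike {
  str?: string; // marked-content items carry no string
  hasEOL?: boolean; // set when the item ends a line
}
interface PdfPageLike {
  getTextContent(): Promise<{ items: TextItemLike[] }>;
}
interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>; // page numbers are 1-based
}

// Hypothetical result shape: one entry per page.
interface PageText {
  pageNumber: number;
  text: string;
  needsOcr: boolean; // true when the page yielded no text at all
}

async function extractPageTexts(doc: PdfDocumentLike): Promise<PageText[]> {
  const pages: PageText[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    // Concatenate the text runs, inserting a newline where a run ends a line.
    const text = content.items
      .map((item) => (item.str ?? "") + (item.hasEOL ? "\n" : ""))
      .join("")
      .trim();
    pages.push({ pageNumber: n, text, needsOcr: text.length === 0 });
  }
  return pages;
}
```

In the app, `doc` would come from something like `await pdfjsLib.getDocument({ data: bytes }).promise`. Treating an empty page as "needs OCR" is a simplification: a scanned page with a poor or partial text layer would need a better heuristic, and word spacing may need tuning for some PDFs.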
We need to investigate how to provide PDFs from local files to the web app running in the browser, and how to store the extracted text locally in a way that works well with the app. We should also explore a good form for this, possibly a new block or an extension of an existing block.
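One possible shape for the local-file and local-storage parts, sketched under assumptions: the browser's `File` (from a file input) supplies the bytes via `arrayBuffer()`, and the extracted text is kept as JSON in a key-value store such as `localStorage`. The `FileLike` and `StorageLike` interfaces, the `pdf-text:` key prefix, and all function names are hypothetical and only illustrate the idea.

```typescript
// Structural stand-ins for the browser's File and Storage objects, so the
// sketch does not depend on DOM typings. Both are assumptions.
interface FileLike {
  name: string;
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical stored shape: the text of one page.
interface StoredPage {
  pageNumber: number;
  text: string;
}

// Hypothetical key prefix to keep PDF text apart from other stored data.
const KEY_PREFIX = "pdf-text:";

// Read a user-selected file into bytes that a PDF parser can consume.
async function readPdfBytes(file: FileLike): Promise<Uint8Array> {
  return new Uint8Array(await file.arrayBuffer());
}

// Persist extracted pages as JSON under a key derived from the file name.
function saveExtractedText(storage: StorageLike, fileName: string, pages: StoredPage[]): void {
  storage.setItem(KEY_PREFIX + fileName, JSON.stringify(pages));
}

// Load previously stored pages, or null if nothing was saved for this file.
function loadExtractedText(storage: StorageLike, fileName: string): StoredPage[] | null {
  const raw = storage.getItem(KEY_PREFIX + fileName);
  return raw === null ? null : (JSON.parse(raw) as StoredPage[]);
}
```

In the browser, a `File` from `<input type="file">` and the global `localStorage` both fit these interfaces. `localStorage` has a small per-origin quota, so IndexedDB may be a better fit for large documents; keying by file name alone would also collide for different files with the same name.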
Good next steps are to review the file input block and to work out how PDF.js should be loaded as a dependency.
