Docsplit.extract_text generates a String with a null byte #152

Hello,

First of all, thank you for the gem.

Second, I have a PDF that, when run through `Docsplit.extract_text`, produces a text file containing a null byte character. Shouldn't this be handled by `TextCleaner#clean`? Or do you think the issue lies in `pdftotext`/`tesseract`?
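Until it's clear whether `TextCleaner#clean` should handle this, a possible stopgap is to strip NUL bytes from the extracted text yourself. The sketch below is a hypothetical workaround, not part of Docsplit: `strip_null_bytes` and `strip_null_bytes_from_file` are illustrative names, and it assumes the NUL bytes carry no meaning for your application.

```ruby
# Hypothetical workaround (not Docsplit's TextCleaner): remove NUL ("\0")
# characters from text produced by Docsplit.extract_text.

# Returns a copy of +text+ with every NUL character removed.
def strip_null_bytes(text)
  text.delete("\0")
end

# Rewrites the file at +path+ without NUL bytes (only if any were found)
# and returns how many bytes were removed.
def strip_null_bytes_from_file(path)
  original = File.binread(path)          # read raw bytes, no transcoding
  cleaned  = strip_null_bytes(original)
  removed  = original.bytesize - cleaned.bytesize
  File.binwrite(path, cleaned) if removed > 0
  removed
end
```

You would run `strip_null_bytes_from_file` over each `.txt` file that `Docsplit.extract_text` writes to its output directory. Reading and writing in binary mode avoids encoding errors if the NUL byte sits next to otherwise malformed text.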

Unfortunately, the PDF I am using comes from a client, so I can't provide it. I also haven't been able to create one manually that reproduces the problem.
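Since the PDF can't be shared, one way to still give maintainers something to work with is to report where the NUL bytes occur in the extracted text and what surrounds them, redacting the context if needed. This is a hypothetical helper (`null_byte_report` is an illustrative name, not a Docsplit API); the `context` window of 10 bytes is an arbitrary choice.

```ruby
# Hypothetical diagnostic helper: lists each NUL byte's offset in an
# extracted .txt file with a few surrounding bytes, so the context can be
# inspected (and redacted) without sharing the source PDF.
def null_byte_report(path, context: 10)
  data = File.binread(path)  # raw bytes, so offsets are byte offsets

  # Byte offsets of every 0x00 byte.
  offsets = data.each_byte.with_index.select { |byte, _| byte.zero? }.map { |_, i| i }

  offsets.map do |i|
    from = [i - context, 0].max
    # inspect renders the NUL as "\x00" so it is visible in the output.
    { offset: i, context: data.byteslice(from, (i - from) + context + 1).inspect }
  end
end
```

Knowing whether the NUL bytes appear at page boundaries, inside specific words, or only with OCR enabled would help narrow the problem down to `pdftotext`, `tesseract`, or the cleaning step.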
