I ran your notebook 01_semi_structured_data.ipynb in Colab:
from typing import Any
from pydantic import BaseModel
from unstructured.partition.pdf import partition_pdf

raw_pdf_elements = partition_pdf(
    filename="statement_of_changes.pdf",
    extract_images_in_pdf=False,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path=".",
)
and got the following error:
WARNING:unstructured:This function will be deprecated in a future release and `unstructured` will simply use the DEFAULT_MODEL from `unstructured_inference.model.base` to set default model name
---------------------------------------------------------------------------
UnidentifiedImageError Traceback (most recent call last)
<ipython-input-10-c47946c825bc> in <cell line: 6>()
4 from unstructured.partition.pdf import partition_pdf
5
----> 6 raw_pdf_elements = partition_pdf(
7 filename="statement_of_changes.pdf",
8 extract_images_in_pdf=False,
10 frames
/usr/local/lib/python3.10/dist-packages/PIL/Image.py in open(fp, mode, formats)
3281 fp.seek(0)
3282 except (AttributeError, io.UnsupportedOperation):
-> 3283 fp = io.BytesIO(fp.read())
3284 exclusive_fp = True
3285
UnidentifiedImageError: cannot identify image file '/tmp/tmpt9l2pd51/88be9f82-5a19-4ec0-baa1-a029cf45dfc4-1.ppm'
I have no idea how to resolve it.
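One thing that might help narrow this down is a quick sanity check of the inputs. This is only a minimal sketch, and it assumes (not confirmed by the traceback) that the failing .ppm file comes from the PDF-to-image conversion step, which normally relies on Poppler's pdftoppm tool. The helper names looks_like_pdf and poppler_available are hypothetical, not part of unstructured or Pillow.

```python
import shutil
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists, is non-empty, and has a PDF header.

    The "%PDF-" marker is searched for in the first 1024 bytes, since
    some PDFs carry a few junk bytes before the header.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as fh:
        return b"%PDF-" in fh.read(1024)


def poppler_available() -> bool:
    """Return True if Poppler's pdftoppm executable is found on PATH."""
    return shutil.which("pdftoppm") is not None


# Same file name as in the failing cell above.
pdf_path = "statement_of_changes.pdf"
print("looks like a PDF:", looks_like_pdf(pdf_path))
print("pdftoppm on PATH:", poppler_available())
```

If the first check prints False, the uploaded file may be truncated or may not be a PDF at all (for example, an HTML error page saved under a .pdf name). If the second prints False, Poppler may be missing from the Colab runtime. Neither result proves the cause of the UnidentifiedImageError; they only rule out two common suspects.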