[Bug]: Certain PDF crashes RAG pipeline #98

@rjakomin

Description

Steps to reproduce

Hi,
whenever I copy the attached PDF file into the data folder monitored by the private RAG pipeline, the engine crashes (the whole app Docker container goes down) without any error message. It happens every time with this PDF document: https://www.dancilla.com/PDF/Dancilla_alle_Volkstaenze.pdf
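
For anyone who wants to repeat the step above in a scripted way, here is a minimal sketch. It assumes the PDF has already been downloaded locally and that the monitored folder is `data` relative to the working directory; the issue does not state the actual path, so adjust it. The helper name `copy_into_data_folder` is hypothetical and not part of the pipeline.

```python
import shutil
from pathlib import Path


def copy_into_data_folder(pdf_path: str, data_dir: str = "data") -> Path:
    """Copy a local PDF into the monitored folder and return the new path.

    `data_dir` is an assumption: point it at the folder the pipeline watches.
    """
    src = Path(pdf_path)
    if not src.is_file():
        raise FileNotFoundError(f"no such file: {src}")

    dest_dir = Path(data_dir)
    # Create the folder if it is missing so the copy itself cannot fail on that.
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / src.name
    # copy2 preserves timestamps, mirroring a manual file-manager copy.
    shutil.copy2(src, dest)
    return dest
```

Calling `copy_into_data_folder("Dancilla_alle_Volkstaenze.pdf")` while the container is running should trigger the same ingestion path as a manual copy.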

Relevant log output

2025-02-25 14:59:40 pathway_engine.connectors.monitoring INFO FileSystem(data): 1 entries (3530 minibatch(es)) have been sent to the engine
2025-02-25 15:00:32 root INFO {"_type": "request_payload", "session_id": "uuid-29d4de7b-6cd3-4f92-ab6e-5111029c3157", "payload": {}}
2025-02-25 15:00:37 root INFO {"_type": "request_payload", "session_id": "uuid-65828918-7085-4c99-8385-7a9c93320895", "payload": {}}
2025-02-25 15:00:44 pathway_engine.connectors.monitoring INFO FileSystem(data): 0 entries (1 minibatch(es)) have been sent to the engine
2025-02-25 15:00:44 pathway_engine.connectors.monitoring INFO PythonReader: 2 entries (87119 minibatch(es)) have been sent to the engine
2025-02-25 15:01:02 pathway_engine.connectors.monitoring INFO PythonReader: 0 entries (5 minibatch(es)) have been sent to the engine
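
Since the log ends without an error, the container's recorded exit state may say more than the application output does. The sketch below reads it through `docker inspect`. It is only a triage aid under an assumption: a container that dies silently has sometimes been killed by the out-of-memory killer, but nothing in the log above confirms that this is the cause here. The container name passed in and the helper names `explain_exit_code`, `container_state`, and `report` are hypothetical.

```python
import json
import signal
import subprocess


def explain_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable hint."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # By shell convention, 128 + N means the process died from signal N.
        num = code - 128
        try:
            name = signal.Signals(num).name
        except ValueError:
            name = f"signal {num}"
        return f"killed by {name}"
    return f"exited with error code {code}"


def container_state(container: str) -> dict:
    """Return the State object that `docker inspect` reports for a container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", container],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def report(container: str) -> str:
    """Summarize whether the stopped container was OOM-killed and how it exited."""
    state = container_state(container)
    code = state.get("ExitCode", 0)
    return f"OOMKilled={state.get('OOMKilled')} ExitCode={code} ({explain_exit_code(code)})"
```

Running `report("<container-name>")` after the crash, before the container is removed, prints one line with both fields. If `OOMKilled` is true, raising the memory limit of the Docker Desktop VM would be a reasonable next thing to try.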

What did you expect to happen?

The newly copied PDF file should be processed and its content added to the vector database used by the private RAG pipeline.

Version

current

Docker Versions (if used)

27.4.0, build bde2b89 (running on Windows 11)

OS

Windows 11

Labels

bug