Describe the bug
If you assign the `meta` field of a `Document` after initialization, the document's `id` is not updated.
This is done, for example, in the PyPDFConverter.
Documents that share an ID despite having different metadata cause problems with document stores under the duplicate policy `OVERWRITE`: all of them are treated as the same document and overwrite each other.
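The behavior described above can be reproduced in isolation. The following is only a minimal sketch: `SketchDocument` is a hypothetical stand-in, not the library's `Document` class, and the hashing scheme and the plain `dict` used as a store are assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document; not the library's class."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    id: str = ""

    def __post_init__(self) -> None:
        # The id is derived once, from the values present at construction time.
        if not self.id:
            payload = f"{self.content}{sorted(self.meta.items())}"
            self.id = hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = SketchDocument(content="same text")
second = SketchDocument(content="same text")

# Metadata assigned after initialization does not feed back into the id.
first.meta = {"page": 1}
second.meta = {"page": 2}

ids_collide = first.id == second.id  # different metadata, identical ids

# A store keyed by id with overwrite semantics keeps only one of the two.
store: Dict[str, SketchDocument] = {}
for doc in (first, second):
    store[doc.id] = doc
```

In this sketch both documents end up under the same key, so the second silently replaces the first.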
Error message
Error that was thrown (if available)
Expected behavior
The ID should update itself when the metadata changes. The same applies to the other properties.
Additional context
Ideally, we find a solution in which the ID is updated automatically but can also be overridden manually.
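One possible shape for such a solution is an `id` that is computed on access unless a value has been set explicitly. This is a sketch under assumptions, not a proposal for the actual implementation: `AutoIdDocument`, its `_explicit_id` attribute, and the hashing scheme are all hypothetical.

```python
import hashlib
from typing import Any, Dict, Optional


class AutoIdDocument:
    """Hypothetical sketch: id derived on access unless explicitly overridden."""

    def __init__(
        self,
        content: str,
        meta: Optional[Dict[str, Any]] = None,
        id: Optional[str] = None,
    ) -> None:
        self.content = content
        self.meta = dict(meta or {})
        self._explicit_id = id  # None means "derive the id from the fields"

    @property
    def id(self) -> str:
        if self._explicit_id is not None:
            return self._explicit_id
        # Recomputed on every access, so later changes to content or meta count.
        payload = f"{self.content}{sorted(self.meta.items())}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    @id.setter
    def id(self, value: str) -> None:
        self._explicit_id = value


doc = AutoIdDocument(content="same text")
id_before = doc.id
doc.meta = {"page": 1}
id_after = doc.id  # differs from id_before because meta changed

doc.id = "manual-id"    # manual override
doc.meta = {"page": 2}  # no longer affects the id
```

The trade-off in this design is that a derived id can change after a document has already been written to a store, so anything holding the old id would no longer match.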
julian-risch added the P2 label (Medium priority, add to the next sprint if no P1 available) and removed the P0 label (Highest priority, add to the current sprint) on Jan 13, 2025
With #8698 and #8708 merged, the immediate issue has been addressed. Before closing this issue, we should check whether it makes sense to emit a warning when a document's attribute is updated, saying that the id is not re-created. I think it makes sense at least for content and metadata. I'm not sure about embedding.
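A warning of this kind could be sketched with `__setattr__` and the standard `warnings` module. The class name `WarningDocument`, the `_WATCHED` tuple, and the message text are hypothetical; only content and metadata are watched here, and embedding is left out because of the uncertainty mentioned above.

```python
import hashlib
import warnings
from typing import Any, Dict, Optional


class WarningDocument:
    """Hypothetical sketch: warn when a watched attribute is reassigned."""

    _WATCHED = ("content", "meta")

    def __init__(self, content: str, meta: Optional[Dict[str, Any]] = None) -> None:
        self.content = content
        self.meta = dict(meta or {})
        payload = f"{self.content}{sorted(self.meta.items())}"
        # Set last: its presence in __dict__ marks the end of initialization.
        self.id = hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __setattr__(self, name: str, value: Any) -> None:
        if name in self._WATCHED and "id" in self.__dict__:
            warnings.warn(
                f"'{name}' was updated after initialization; the id is not re-created.",
                UserWarning,
                stacklevel=2,
            )
        super().__setattr__(name, value)


doc = WarningDocument(content="text")
id_before = doc.id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    doc.meta = {"page": 1}       # watched: emits a warning
    doc.embedding = [0.1, 0.2]   # not watched in this sketch: silent

messages = [str(w.message) for w in caught]
```

Note that this approach only sees attribute assignment. An in-place change such as `doc.meta["page"] = 2` bypasses `__setattr__` and would not trigger the warning.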
julian-risch added the P3 label (Low priority, leave it in the backlog) and removed the P2 label (Medium priority, add to the next sprint if no P1 available) on Jan 20, 2025