
Commit 1085192

add docs with not-preserved components
Signed-off-by: Michele Dolfi <[email protected]>
1 parent 803e1e5 commit 1085192


1 file changed, +10 -1 lines changed


docling_core/utils/legacy.py

+10 -1
@@ -350,7 +350,16 @@ def _make_spans(cell: TableCell, table_item: TableItem):
 
 
 def legacy_to_docling_document(legacy_doc: DsDocument) -> DoclingDocument: # noqa: C901
-    """Convert a legacy document to DoclingDocument."""
+    """Convert a legacy document to DoclingDocument.
+
+    It is known that the following content will not be preserved in the transformation:
+    - name of labels (upper vs lower case)
+    - caption of figures are not in main-text anymore
+    - s3_data removed
+    - model metadata removed
+    - logs removed
+    - document hash cannot be preserved
+    """
 
     def _transform_prov(item: BaseCell) -> Optional[ProvenanceItem]:
         """Create a new provenance from a legacy item."""
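The new docstring lists content that the conversion is known to drop, so a round-trip test that compares a legacy document with one regenerated from the converted DoclingDocument has to ignore those differences. The following is a minimal sketch, not part of docling-core, assuming the legacy document is a plain dict with a `main-text` list whose items carry a `type` label. The top-level keys `s3_data`, `model`, `logs` and `document_hash` are hypothetical stand-ins for the lost content; the real legacy schema may name or nest them differently. The sketch also drops every `caption` item from `main-text`, which is coarser than the docstring's figure-only statement.

```python
import copy
from typing import Any

# Hypothetical top-level keys standing in for s3_data, model metadata,
# logs and the document hash; adjust to the actual legacy schema.
LOSSY_TOP_LEVEL_KEYS = {"s3_data", "model", "logs", "document_hash"}


def normalize_for_comparison(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a legacy-style dict with known-lossy content removed."""
    # Drop the top-level fields that the conversion does not preserve.
    out = {k: copy.deepcopy(v) for k, v in doc.items() if k not in LOSSY_TOP_LEVEL_KEYS}
    if "main-text" in out:
        main_text = []
        for item in out["main-text"]:
            # Label case is not preserved, so compare labels in lower case.
            label = str(item.get("type", "")).lower()
            # Assumption: captions appear as main-text items of type "caption";
            # after conversion they are no longer part of main-text.
            if label == "caption":
                continue
            main_text.append({**item, "type": label})
        out["main-text"] = main_text
    return out


def equivalent_modulo_known_losses(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """True if two legacy-style dicts differ only in the known-lossy content."""
    return normalize_for_comparison(a) == normalize_for_comparison(b)
```

For example, `{"main-text": [{"type": "Title", "text": "A"}, {"type": "Caption", "text": "Fig 1"}], "s3_data": {}}` compares equal to `{"main-text": [{"type": "title", "text": "A"}]}`, while a change in the actual text is still reported as a difference.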
